feat: enable document orientation detection for scanned PDFs
- Enable PP-StructureV3's use_doc_orientation_classify feature
- Detect rotation angle from doc_preprocessor_res.angle
- Swap page dimensions (width <-> height) for 90°/270° rotations
- Output PDF now correctly displays landscape-scanned content

Also includes:
- Archive completed openspec proposals
- Add simplify-frontend-ocr-config proposal (pending)
- Code cleanup and frontend simplification

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
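The dimension swap described in the commit message can be illustrated with a minimal sketch. The function name `oriented_page_size` is hypothetical and not taken from the commit; it assumes the detected angle is one of 0, 90, 180, or 270 degrees, and it leaves out how the angle is read from `doc_preprocessor_res`.

```python
from typing import Tuple


def oriented_page_size(width: float, height: float, angle: int) -> Tuple[float, float]:
    """Return the (width, height) to use for a page whose content is rotated by `angle`.

    Hypothetical helper: assumes `angle` is a multiple of 90 degrees.
    """
    normalized = angle % 360
    if normalized not in (0, 90, 180, 270):
        raise ValueError(f"Unsupported rotation angle: {angle}")
    if normalized in (90, 270):
        # A quarter turn changes portrait to landscape (or back), so swap the sides.
        return height, width
    # 0 and 180 degrees keep the same page shape.
    return width, height


if __name__ == "__main__":
    # A portrait page scanned sideways ends up landscape.
    print(oriented_page_size(595.0, 842.0, 90))
```

Only the quarter-turn cases swap; a 180° rotation flips the content but leaves the page shape unchanged, which matches the "90°/270°" wording above.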
25
openspec/changes/simplify-frontend-ocr-config/proposal.md
Normal file
@@ -0,0 +1,25 @@
# Change: Simplify frontend OCR configuration options

## Why
The OCR track now uses simple OCR mode, so the frontend's complex configuration options (table detection mode, OCR presets, advanced parameters, and so on) are no longer needed. These options add cognitive load for users and no longer affect the actual processing results.

## What Changes
- **BREAKING** Remove the frontend OCR processing preset selector (`OCRPresetSelector`)
- **BREAKING** Remove the frontend table detection configuration selector (`TableDetectionSelector`)
- **BREAKING** Remove the related frontend TypeScript type definitions (`OCRPreset`, `OCRConfig`, `TableDetectionConfig`, `TableParsingMode`, etc.)
- Keep layout model selection (`LayoutModelSelector`): `chinese | default | cdla`
- Keep image preprocessing configuration (`PreprocessingSettings`): auto/manual/disabled modes and their related parameters
- Simplify the backend API's `ProcessingOptions` by removing parameters that are no longer used

## Impact
- Affected specs: `ocr-processing`
- Affected code:
  - **Frontend files to delete**:
    - `frontend/src/components/OCRPresetSelector.tsx`
    - `frontend/src/components/TableDetectionSelector.tsx`
  - **Frontend files to modify**:
    - `frontend/src/types/apiV2.ts` - remove unused type definitions
    - `frontend/src/pages/ProcessingPage.tsx` - remove the commented-out related imports and logic
  - **Backend files to modify**:
    - `backend/app/schemas/task.py` - remove the `ocr_preset`, `ocr_config`, and `table_detection` fields from `ProcessingOptions`
    - `backend/app/routers/tasks.py` - clean up the corresponding parameter-handling logic
@@ -0,0 +1,127 @@
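# ocr-processing Specification Delta

## REMOVED Requirements

### Requirement: OCR Preset Selection
**Reason**: The OCR track now uses simple OCR mode, so the frontend no longer needs to offer complex preset configurations. The backend processes every request with default parameters.
**Migration**: Remove the frontend `OCRPresetSelector` component and its related type definitions. The backend automatically uses the best default configuration.

### Requirement: Table Detection Configuration
**Reason**: Table detection settings (bordered/borderless table toggles, region detection toggle) no longer need to be controlled from the frontend. The backend always uses its default table detection strategy.
**Migration**: Remove the frontend `TableDetectionSelector` component and the `TableDetectionConfig` type. The backend uses built-in defaults.

### Requirement: OCR Advanced Parameters
**Reason**: Advanced OCR parameters (such as `table_parsing_mode`, `layout_threshold`, and `enable_chart_recognition`) no longer need to be configured from the frontend.
**Migration**: Remove the frontend `OCRConfig` type and its related UI. The backend always uses the default parameters of simple OCR mode.
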
## MODIFIED Requirements

### Requirement: Layout Model Selection
The system SHALL allow users to select a layout detection model optimized for their document type, providing a simple choice between pre-configured models instead of manual parameter tuning.

#### Scenario: User selects Chinese document model
- **GIVEN** a user is processing Chinese business documents (forms, contracts, invoices)
- **WHEN** the user selects "Chinese Document Model" (PP-DocLayout-S)
- **THEN** the OCR engine SHALL use the PP-DocLayout-S layout detection model
- **AND** the model SHALL be optimized for 23 Chinese document element types
- **AND** table and form detection accuracy SHALL be improved over the default model

#### Scenario: User selects standard model for English documents
- **GIVEN** a user is processing English academic papers or reports
- **WHEN** the user selects "Standard Model" (PubLayNet-based)
- **THEN** the OCR engine SHALL use the default PubLayNet-based layout detection model
- **AND** the model SHALL be optimized for English document layouts

#### Scenario: User selects CDLA model for specialized Chinese layout
- **GIVEN** a user is processing Chinese documents with complex layouts
- **WHEN** the user selects "CDLA Model"
- **THEN** the OCR engine SHALL use the picodet_lcnet_x1_0_fgd_layout_cdla model
- **AND** the model SHALL provide specialized Chinese document layout analysis

#### Scenario: Layout model is sent via API request
- **GIVEN** a frontend application with model selection UI
- **WHEN** the user starts task processing with a selected model
- **THEN** the frontend SHALL send the model choice in the request body:
  ```json
  POST /api/v2/tasks/{task_id}/start
  {
    "use_dual_track": true,
    "force_track": "ocr",
    "language": "ch",
    "layout_model": "chinese"
  }
  ```
- **AND** the backend SHALL configure PP-StructureV3 with the corresponding model
- **AND** the frontend SHALL NOT send `ocr_preset`, `ocr_config`, or `table_detection` parameters

#### Scenario: Default model when not specified
- **GIVEN** an API request without `layout_model` parameter
- **WHEN** the task is started
- **THEN** the system SHALL use "chinese" (PP-DocLayout-S) as the default model
- **AND** processing SHALL work correctly without requiring model selection

#### Scenario: Invalid model name is rejected
- **GIVEN** a request with an invalid `layout_model` value
- **WHEN** the user sends `layout_model: "invalid_model"`
- **THEN** the API SHALL return 422 Validation Error
- **AND** provide a clear error message listing valid model options

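The default-model and invalid-model scenarios above can be sketched with the standard library alone. This is an illustration, not the project's code: `LayoutModel` and `resolve_layout_model` are hypothetical names, and a plain `ValueError` stands in for whatever the web framework turns into a 422 response.

```python
from enum import Enum
from typing import Optional


class LayoutModel(str, Enum):
    """The three layout_model values named in the spec."""

    CHINESE = "chinese"
    DEFAULT = "default"
    CDLA = "cdla"


def resolve_layout_model(value: Optional[str]) -> LayoutModel:
    """Apply the spec's default and reject unknown names (hypothetical helper)."""
    if value is None:
        # Scenario: default model when not specified.
        return LayoutModel.CHINESE
    try:
        return LayoutModel(value)
    except ValueError:
        # Scenario: invalid model name is rejected, listing the valid options.
        valid = ", ".join(model.value for model in LayoutModel)
        raise ValueError(
            f"Invalid layout_model {value!r}; valid options: {valid}"
        ) from None


if __name__ == "__main__":
    print(resolve_layout_model(None).value)
    print(resolve_layout_model("cdla").value)
```

Listing the valid options in the error message is what satisfies the "clear error message" clause; how the error maps onto an HTTP 422 is left to the API layer.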
### Requirement: Layout Model Selection UI
The frontend SHALL provide a simple, user-friendly interface for selecting layout detection models with clear descriptions of each option.

#### Scenario: Model options are displayed with descriptions
- **GIVEN** the model selection UI is displayed
- **WHEN** the user views the available options
- **THEN** the UI SHALL show the following options:
  - "Chinese Document Model (Recommended)" - for Chinese forms, contracts, invoices
  - "Standard Model" - for English academic papers, reports
  - "CDLA Model" - for specialized Chinese layout analysis
- **AND** each option SHALL have a brief description of its use case

#### Scenario: Chinese model is selected by default
- **GIVEN** the user opens the task processing interface
- **WHEN** the model selection is displayed
- **THEN** "Chinese Document Model" SHALL be pre-selected as the default
- **AND** the user MAY change the selection before starting processing

#### Scenario: Model selection is visible only for OCR track
- **GIVEN** a document processing interface
- **WHEN** the user selects processing track
- **THEN** layout model selection SHALL be shown ONLY when OCR track is selected or auto-detected
- **AND** SHALL be hidden for Direct track (which does not use PP-StructureV3)

#### Scenario: Simplified configuration options
- **GIVEN** the OCR track processing interface
- **WHEN** the user configures processing options
- **THEN** the UI SHALL only show:
  - Layout model selection (chinese/default/cdla)
  - Image preprocessing settings (auto/manual/disabled)
- **AND** SHALL NOT show:
  - OCR preset selection
  - Table detection configuration
  - Advanced OCR parameters

### Requirement: Simplified Processing Options API
The backend API SHALL accept a simplified `ProcessingOptions` schema without complex OCR configuration parameters.

#### Scenario: API accepts minimal configuration
- **GIVEN** a start task API request
- **WHEN** the request body contains:
  ```json
  {
    "use_dual_track": true,
    "force_track": "ocr",
    "language": "ch",
    "layout_model": "chinese",
    "preprocessing_mode": "auto"
  }
  ```
- **THEN** the API SHALL accept the request
- **AND** process the task using backend default values for all other parameters

#### Scenario: Legacy parameters are ignored
- **GIVEN** a start task API request with legacy parameters
- **WHEN** the request contains `ocr_preset`, `ocr_config`, or `table_detection`
- **THEN** the API SHALL ignore these parameters
- **AND** use backend default values instead
- **AND** NOT return an error (backward compatibility)

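The "legacy parameters are ignored" behavior can be sketched as follows. This is a standard-library stand-in, not the real schema in `backend/app/schemas/task.py`: the dataclass fields mirror the JSON example above, while the defaults for `use_dual_track`, `force_track`, `language`, and `preprocessing_mode`, and the helper name `parse_options`, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional

# Legacy keys that the spec says must be accepted and silently dropped.
LEGACY_KEYS = frozenset({"ocr_preset", "ocr_config", "table_detection"})


@dataclass(frozen=True)
class ProcessingOptions:
    """Simplified options; default values here are illustrative assumptions."""

    use_dual_track: bool = True
    force_track: Optional[str] = None
    language: str = "ch"
    layout_model: str = "chinese"
    preprocessing_mode: str = "auto"


def parse_options(body: Mapping[str, Any]) -> ProcessingOptions:
    """Build options from a request body, ignoring legacy keys without error."""
    kept = {key: value for key, value in body.items() if key not in LEGACY_KEYS}
    # Any other unexpected key still raises TypeError; only legacy keys are tolerated.
    return ProcessingOptions(**kept)


if __name__ == "__main__":
    legacy_request = {
        "force_track": "ocr",
        "layout_model": "cdla",
        "ocr_preset": "high_accuracy",  # legacy, dropped
        "table_detection": {"enable": True},  # legacy, dropped
    }
    print(parse_options(legacy_request))
```

Dropping only the three named keys keeps older clients working while still surfacing genuinely unknown fields, which is one reasonable reading of the backward-compatibility clause.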
51
openspec/changes/simplify-frontend-ocr-config/tasks.md
Normal file
@@ -0,0 +1,51 @@
# Tasks: Simplify frontend OCR configuration options

## 1. Frontend cleanup

### 1.1 Remove unused components
- [x] 1.1.1 Delete `frontend/src/components/OCRPresetSelector.tsx`
- [x] 1.1.2 Delete `frontend/src/components/TableDetectionSelector.tsx`

### 1.2 Clean up TypeScript type definitions
- [x] 1.2.1 Remove the following types from `frontend/src/types/apiV2.ts`:
  - `TableDetectionConfig` (lines 121-125)
  - `OCRPreset` (line 131)
  - `TableParsingMode` (line 140)
  - `OCRConfig` (lines 146-166)
  - `OCRPresetInfo` (lines 171-177)
- [x] 1.2.2 Remove the following fields from the `ProcessingOptions` interface:
  - `table_detection`
  - `ocr_preset`
  - `ocr_config`

### 1.3 Clean up ProcessingPage
- [x] 1.3.1 Confirm that `frontend/src/pages/ProcessingPage.tsx` does not reference any removed types or components
- [x] 1.3.2 Remove the related comments, if any - keep explanatory comments

## 2. Backend cleanup

### 2.1 Clean up schema definitions
- [x] 2.1.1 Remove the unused Enums and Models from `backend/app/schemas/task.py`:
  - `TableDetectionConfig`
  - `OCRPresetEnum`
  - `TableParsingModeEnum`
  - `OCRConfig`
  - `OCR_PRESET_CONFIGS`
- [x] 2.1.2 Remove the following fields from `ProcessingOptions`:
  - `table_detection`
  - `ocr_preset`
  - `ocr_config`

### 2.2 Clean up API endpoint logic
- [x] 2.2.1 Review the `start_task` endpoint in `backend/app/routers/tasks.py` and remove handling of the deleted fields
- [x] 2.2.2 Update the `process_task_ocr` function signature and its call sites

### 2.3 Clean up the service layer
- [x] 2.3.1 Review `backend/app/services/ocr_service.py` and confirm it does not depend on the removed configuration items
  - Note: ocr_service.py keeps these parameters as optional and handles them with default values. This is the intended design and keeps the backend flexible.

## 3. Verification

- [x] 3.1 Confirm that TypeScript compiles with no new errors (errors related to this change)
- [ ] 3.2 Confirm that the backend API still works (requires manual testing)
- [ ] 3.3 Test the full upload -> process -> view results flow (requires manual testing)