feat: enable document orientation detection for scanned PDFs

- Enable PP-StructureV3's use_doc_orientation_classify feature
- Detect rotation angle from doc_preprocessor_res.angle
- Swap page dimensions (width <-> height) for 90°/270° rotations
- Output PDF now correctly displays landscape-scanned content

Also includes:
- Archive completed openspec proposals
- Add simplify-frontend-ocr-config proposal (pending)
- Code cleanup and frontend simplification

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 17:13:46 +08:00
parent 57070af307
commit cfe65158a3
58 changed files with 1271 additions and 3048 deletions
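
The dimension swap described in the commit message can be illustrated with a minimal sketch. It assumes the detected angle arrives as an integer number of degrees (the message says it is read from `doc_preprocessor_res.angle`); the helper name `oriented_page_size` is hypothetical and does not come from this commit, and the sketch does not call PP-StructureV3 itself.

```python
from typing import Tuple


def oriented_page_size(width: float, height: float, angle: int) -> Tuple[float, float]:
    """Return the (width, height) to use for the output page.

    Hypothetical helper: `angle` is assumed to be the rotation detected by
    the orientation classifier, restricted to multiples of 90 degrees.
    """
    normalized = angle % 360
    if normalized not in (0, 90, 180, 270):
        raise ValueError(f"unsupported rotation angle: {angle}")
    # A quarter turn in either direction exchanges the page's long and short
    # sides, so width and height swap. 0 and 180 keep the original shape.
    if normalized in (90, 270):
        return height, width
    return width, height


# A portrait A4 scan (in points) detected as rotated by 90 degrees becomes landscape.
print(oriented_page_size(595, 842, 90))
```

Only the page box changes here; rotating the page content itself is a separate step that this sketch leaves out.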
--- a/openspec/changes/simplify-frontend-ocr-config/specs/ocr-processing/spec.md
+++ b/openspec/changes/simplify-frontend-ocr-config/specs/ocr-processing/spec.md
@@ -0,0 +1,127 @@
+# ocr-processing Specification Delta
+
+## REMOVED Requirements
+
+### Requirement: OCR Preset Selection
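+**Reason**: The OCR track now runs in simple OCR mode, so the frontend no longer needs to supply complex preset configurations. The backend processes every document with a single set of default parameters.
+**Migration**: Remove the frontend `OCRPresetSelector` component and its related type definitions. The backend automatically uses the best default configuration.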
+
+### Requirement: Table Detection Configuration
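+**Reason**: Table detection settings (the bordered/borderless table toggles and the region detection toggle) no longer need to be controlled by the frontend. The backend applies a single default table detection strategy.
+**Migration**: Remove the frontend `TableDetectionSelector` component and the `TableDetectionConfig` type. The backend uses built-in defaults.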
+
+### Requirement: OCR Advanced Parameters
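+**Reason**: Advanced OCR parameters (such as `table_parsing_mode`, `layout_threshold`, and `enable_chart_recognition`) no longer need to be configured from the frontend.
+**Migration**: Remove the frontend `OCRConfig` type and its related UI. The backend always uses the default parameters of simple OCR mode.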
+
+## MODIFIED Requirements
+
+### Requirement: Layout Model Selection
+The system SHALL allow users to select a layout detection model optimized for their document type, providing a simple choice between pre-configured models instead of manual parameter tuning.
+
+#### Scenario: User selects Chinese document model
+- **GIVEN** a user is processing Chinese business documents (forms, contracts, invoices)
+- **WHEN** the user selects "Chinese Document Model" (PP-DocLayout-S)
+- **THEN** the OCR engine SHALL use the PP-DocLayout-S layout detection model
+- **AND** the model SHALL be optimized for 23 Chinese document element types
+- **AND** table and form detection accuracy SHALL be improved over the default model
+
+#### Scenario: User selects standard model for English documents
+- **GIVEN** a user is processing English academic papers or reports
+- **WHEN** the user selects "Standard Model" (PubLayNet-based)
+- **THEN** the OCR engine SHALL use the default PubLayNet-based layout detection model
+- **AND** the model SHALL be optimized for English document layouts
+
+#### Scenario: User selects CDLA model for specialized Chinese layout
+- **GIVEN** a user is processing Chinese documents with complex layouts
+- **WHEN** the user selects "CDLA Model"
+- **THEN** the OCR engine SHALL use the picodet_lcnet_x1_0_fgd_layout_cdla model
+- **AND** the model SHALL provide specialized Chinese document layout analysis
+
+#### Scenario: Layout model is sent via API request
+- **GIVEN** a frontend application with model selection UI
+- **WHEN** the user starts task processing with a selected model
+- **THEN** the frontend SHALL send the model choice in the request body:
+  ```json
+  POST /api/v2/tasks/{task_id}/start
+  {
+    "use_dual_track": true,
+    "force_track": "ocr",
+    "language": "ch",
+    "layout_model": "chinese"
+  }
+  ```
+- **AND** the backend SHALL configure PP-StructureV3 with the corresponding model
+- **AND** the frontend SHALL NOT send `ocr_preset`, `ocr_config`, or `table_detection` parameters
+
+#### Scenario: Default model when not specified
+- **GIVEN** an API request without `layout_model` parameter
+- **WHEN** the task is started
+- **THEN** the system SHALL use "chinese" (PP-DocLayout-S) as the default model
+- **AND** processing SHALL work correctly without requiring model selection
+
+#### Scenario: Invalid model name is rejected
+- **GIVEN** a request with an invalid `layout_model` value
+- **WHEN** the user sends `layout_model: "invalid_model"`
+- **THEN** the API SHALL return 422 Validation Error
+- **AND** provide a clear error message listing valid model options
+
+### Requirement: Layout Model Selection UI
+The frontend SHALL provide a simple, user-friendly interface for selecting layout detection models with clear descriptions of each option.
+
+#### Scenario: Model options are displayed with descriptions
+- **GIVEN** the model selection UI is displayed
+- **WHEN** the user views the available options
+- **THEN** the UI SHALL show the following options:
+  - "Chinese Document Model (Recommended)" - for Chinese forms, contracts, invoices
+  - "Standard Model" - for English academic papers, reports
+  - "CDLA Model" - for specialized Chinese layout analysis
+- **AND** each option SHALL have a brief description of its use case
+
+#### Scenario: Chinese model is selected by default
+- **GIVEN** the user opens the task processing interface
+- **WHEN** the model selection is displayed
+- **THEN** "Chinese Document Model" SHALL be pre-selected as the default
+- **AND** the user MAY change the selection before starting processing
+
+#### Scenario: Model selection is visible only for OCR track
+- **GIVEN** a document processing interface
+- **WHEN** the user selects processing track
+- **THEN** layout model selection SHALL be shown ONLY when OCR track is selected or auto-detected
+- **AND** SHALL be hidden for Direct track (which does not use PP-StructureV3)
+
+#### Scenario: Simplified configuration options
+- **GIVEN** the OCR track processing interface
+- **WHEN** the user configures processing options
+- **THEN** the UI SHALL only show:
+  - Layout model selection (chinese/default/cdla)
+  - Image preprocessing settings (auto/manual/disabled)
+- **AND** SHALL NOT show:
+  - OCR preset selection
+  - Table detection configuration
+  - Advanced OCR parameters
+
+### Requirement: Simplified Processing Options API
+The backend API SHALL accept a simplified `ProcessingOptions` schema without complex OCR configuration parameters.
+
+#### Scenario: API accepts minimal configuration
+- **GIVEN** a start task API request
+- **WHEN** the request body contains:
+  ```json
+  {
+    "use_dual_track": true,
+    "force_track": "ocr",
+    "language": "ch",
+    "layout_model": "chinese",
+    "preprocessing_mode": "auto"
+  }
+  ```
+- **THEN** the API SHALL accept the request
+- **AND** process the task using backend default values for all other parameters
+
+#### Scenario: Legacy parameters are ignored
+- **GIVEN** a start task API request with legacy parameters
+- **WHEN** the request contains `ocr_preset`, `ocr_config`, or `table_detection`
+- **THEN** the API SHALL ignore these parameters
+- **AND** use backend default values instead
+- **AND** NOT return an error (backward compatibility)
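
The simplified `ProcessingOptions` behaviour specified above (default model, rejection of invalid model names, silently ignored legacy parameters) can be sketched in plain Python. This is an illustration under stated assumptions, not the backend's implementation: the names `LAYOUT_MODELS` and `parse_processing_options` are hypothetical, the defaults other than `layout_model` are guesses, and a `ValueError` stands in for the 422 Validation Error the API would return.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping, Optional

# Model choices named in the spec. The "default" entry is a descriptive
# placeholder because the spec does not give an exact model identifier.
LAYOUT_MODELS = {
    "chinese": "PP-DocLayout-S",
    "default": "PubLayNet-based layout model",
    "cdla": "picodet_lcnet_x1_0_fgd_layout_cdla",
}


@dataclass(frozen=True)
class ProcessingOptions:
    # Only the layout_model default ("chinese") is stated in the spec;
    # the other defaults are assumptions made for this sketch.
    use_dual_track: bool = True
    force_track: Optional[str] = None
    language: str = "ch"
    layout_model: str = "chinese"
    preprocessing_mode: str = "auto"


def parse_processing_options(body: Mapping[str, Any]) -> ProcessingOptions:
    """Build ProcessingOptions from a request body (hypothetical helper)."""
    known = {f.name for f in fields(ProcessingOptions)}
    # Legacy keys (ocr_preset, ocr_config, table_detection) are not fields,
    # so they are dropped here without raising an error. Dropping any other
    # unknown key the same way is an assumption of this sketch.
    cleaned = {key: value for key, value in body.items() if key in known}
    options = ProcessingOptions(**cleaned)
    if options.layout_model not in LAYOUT_MODELS:
        valid = ", ".join(sorted(LAYOUT_MODELS))
        raise ValueError(
            f"invalid layout_model {options.layout_model!r}; valid options: {valid}"
        )
    return options


# A legacy client that still sends ocr_preset is accepted and gets the default model.
legacy = parse_processing_options({"force_track": "ocr", "ocr_preset": "table_heavy"})
print(legacy.layout_model, "->", LAYOUT_MODELS[legacy.layout_model])
```

Listing the valid options in the error message mirrors the "clear error message listing valid model options" scenario; in a real API layer the same check would normally live in the request schema so the framework produces the 422 response.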