# ocr-processing Specification Delta

## REMOVED Requirements

### Requirement: OCR Preset Selection

**Reason**: The OCR track has been changed to simple OCR mode, so the frontend no longer needs to provide complex preset configuration. The backend processes all tasks with default parameters.

**Migration**: Remove the frontend `OCRPresetSelector` component and its related type definitions. The backend automatically uses the best default configuration.

### Requirement: Table Detection Configuration

**Reason**: Table detection settings (toggles for bordered/borderless tables and for region detection) no longer need to be controlled by the frontend. The backend uses a single default table detection strategy.

**Migration**: Remove the frontend `TableDetectionSelector` component and the `TableDetectionConfig` type. The backend uses built-in defaults.

### Requirement: OCR Advanced Parameters

**Reason**: Advanced OCR parameters (such as `table_parsing_mode`, `layout_threshold`, and `enable_chart_recognition`) no longer need to be configured by the frontend.

**Migration**: Remove the frontend `OCRConfig` type and its related UI. The backend always uses the default parameters of simple OCR mode.

## MODIFIED Requirements

### Requirement: Layout Model Selection

The system SHALL allow users to select a layout detection model optimized for their document type, providing a simple choice between pre-configured models instead of manual parameter tuning.

#### Scenario: User selects Chinese document model

- **GIVEN** a user is processing Chinese business documents (forms, contracts, invoices)
- **WHEN** the user selects "Chinese Document Model" (PP-DocLayout-S)
- **THEN** the OCR engine SHALL use the PP-DocLayout-S layout detection model
- **AND** the model SHALL be optimized for 23 Chinese document element types
- **AND** table and form detection accuracy SHALL be improved over the default model

#### Scenario: User selects standard model for English documents

- **GIVEN** a user is processing English academic papers or reports
- **WHEN** the user selects "Standard Model" (PubLayNet-based)
- **THEN** the OCR engine SHALL use the default PubLayNet-based layout detection model
- **AND** the model SHALL be optimized for English document layouts

#### Scenario: User selects CDLA model for specialized Chinese layout

- **GIVEN** a user is processing Chinese documents with complex layouts
- **WHEN** the user selects "CDLA Model"
- **THEN** the OCR engine SHALL use the picodet_lcnet_x1_0_fgd_layout_cdla model
- **AND** the model SHALL provide specialized Chinese document layout analysis

#### Scenario: Layout model is sent via API request

- **GIVEN** a frontend application with model selection UI
- **WHEN** the user starts task processing with a selected model
- **THEN** the frontend SHALL send the model choice in the request body:
  ```json
  POST /api/v2/tasks/{task_id}/start
  {
    "use_dual_track": true,
    "force_track": "ocr",
    "language": "ch",
    "layout_model": "chinese"
  }
  ```
- **AND** the backend SHALL configure PP-StructureV3 with the corresponding model
- **AND** the frontend SHALL NOT send `ocr_preset`, `ocr_config`, or `table_detection` parameters

#### Scenario: Default model when not specified

- **GIVEN** an API request without a `layout_model` parameter
- **WHEN** the task is started
- **THEN** the system SHALL use "chinese" (PP-DocLayout-S) as the default model
- **AND** processing SHALL work correctly without requiring model selection

#### Scenario: Invalid model name is rejected

- **GIVEN** a request with an invalid `layout_model` value
- **WHEN** the user sends `layout_model: "invalid_model"`
- **THEN** the API SHALL return a 422 Validation Error
- **AND** provide a clear error message listing valid model options

### Requirement: Layout Model Selection UI

The frontend SHALL provide a simple, user-friendly interface for selecting layout detection models with clear descriptions of each option.

#### Scenario: Model options are displayed with descriptions

- **GIVEN** the model selection UI is displayed
- **WHEN** the user views the available options
- **THEN** the UI SHALL show the following options:
  - "Chinese Document Model (Recommended)" - for Chinese forms, contracts, invoices
  - "Standard Model" - for English academic papers, reports
  - "CDLA Model" - for specialized Chinese layout analysis
- **AND** each option SHALL have a brief description of its use case

#### Scenario: Chinese model is selected by default

- **GIVEN** the user opens the task processing interface
- **WHEN** the model selection is displayed
- **THEN** "Chinese Document Model" SHALL be pre-selected as the default
- **AND** the user MAY change the selection before starting processing

#### Scenario: Model selection is visible only for OCR track

- **GIVEN** a document processing interface
- **WHEN** the user selects a processing track
- **THEN** layout model selection SHALL be shown ONLY when OCR track is selected or auto-detected
- **AND** SHALL be hidden for Direct track (which does not use PP-StructureV3)

#### Scenario: Simplified configuration options

- **GIVEN** the OCR track processing interface
- **WHEN** the user configures processing options
- **THEN** the UI SHALL only show:
  - Layout model selection (chinese/default/cdla)
  - Image preprocessing settings (auto/manual/disabled)
- **AND** SHALL NOT show:
  - OCR preset selection
  - Table detection configuration
  - Advanced OCR parameters

### Requirement: Simplified Processing Options API

The backend API SHALL accept a simplified `ProcessingOptions` schema without complex OCR configuration parameters.

#### Scenario: API accepts minimal configuration

- **GIVEN** a start task API request
- **WHEN** the request body contains:
  ```json
  {
    "use_dual_track": true,
    "force_track": "ocr",
    "language": "ch",
    "layout_model": "chinese",
    "preprocessing_mode": "auto"
  }
  ```
- **THEN** the API SHALL accept the request
- **AND** process the task using backend default values for all other parameters

#### Scenario: Legacy parameters are ignored

- **GIVEN** a start task API request with legacy parameters
- **WHEN** the request contains `ocr_preset`, `ocr_config`, or `table_detection`
- **THEN** the API SHALL ignore these parameters
- **AND** use backend default values instead
- **AND** NOT return an error (backward compatibility)
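
#### Illustrative sketch: request option handling

The scenarios above describe three backend behaviors: defaulting `layout_model` to "chinese", rejecting unknown values with a message that lists the valid options, and silently dropping legacy parameters. The following is a minimal sketch of how those rules could fit together, not the actual backend implementation. The function `parse_processing_options`, the exception `OptionsValidationError`, and the constant names are hypothetical; only the option values and legacy key names come from this spec. A real service would map the exception to an HTTP 422 response in its web layer.

```python
# Hypothetical sketch: standard library only, no web framework assumed.

# API value -> layout model, as named in the scenarios of this spec.
# "PubLayNet-based" is the spec's description, not a concrete model identifier.
LAYOUT_MODELS = {
    "chinese": "PP-DocLayout-S",
    "default": "PubLayNet-based",
    "cdla": "picodet_lcnet_x1_0_fgd_layout_cdla",
}
DEFAULT_LAYOUT_MODEL = "chinese"

# Legacy parameters that must be ignored rather than rejected.
LEGACY_KEYS = frozenset({"ocr_preset", "ocr_config", "table_detection"})


class OptionsValidationError(ValueError):
    """Hypothetical error that a web layer could translate into HTTP 422."""


def parse_processing_options(body):
    """Return a cleaned copy of the request body.

    - Legacy keys are dropped without raising an error.
    - A missing layout_model falls back to the default.
    - An unknown layout_model raises an error listing the valid options.
    """
    options = {key: value for key, value in body.items() if key not in LEGACY_KEYS}
    layout_model = options.setdefault("layout_model", DEFAULT_LAYOUT_MODEL)
    if not isinstance(layout_model, str) or layout_model not in LAYOUT_MODELS:
        valid = ", ".join(sorted(LAYOUT_MODELS))
        raise OptionsValidationError(
            f"Invalid layout_model {layout_model!r}; valid options: {valid}"
        )
    return options


if __name__ == "__main__":
    # A legacy client that still sends ocr_preset and omits layout_model.
    cleaned = parse_processing_options(
        {"force_track": "ocr", "language": "ch", "ocr_preset": "legacy-value"}
    )
    print(cleaned)
    print(LAYOUT_MODELS[cleaned["layout_model"]])
```

Filtering legacy keys before validation keeps older clients working, which matches the backward-compatibility scenario, while the explicit allow-list keeps the error message for invalid values in sync with the supported models.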