ocr-processing Specification Delta
REMOVED Requirements
Requirement: OCR Preset Selection
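Reason: The OCR track has been changed to a simple OCR mode, so the frontend no longer needs to supply complex preset configurations. The backend processes all documents with a single set of default parameters.
Migration: Remove the frontend OCRPresetSelector component and its related type definitions. The backend automatically uses the best default configuration.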
Requirement: Table Detection Configuration
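Reason: Table detection settings (toggles for bordered and borderless tables, and for region detection) no longer need to be controlled by the frontend. The backend uniformly applies the default table detection strategy.
Migration: Remove the frontend TableDetectionSelector component and the TableDetectionConfig type. The backend uses built-in defaults.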
Requirement: OCR Advanced Parameters
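Reason: Advanced OCR parameters (such as table_parsing_mode, layout_threshold, and enable_chart_recognition) no longer need to be configured from the frontend.
Migration: Remove the frontend OCRConfig type and its related UI. The backend always uses the default parameters of the simple OCR mode.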
MODIFIED Requirements
Requirement: Layout Model Selection
The system SHALL allow users to select a layout detection model optimized for their document type, providing a simple choice between pre-configured models instead of manual parameter tuning.
Scenario: User selects Chinese document model
- GIVEN a user is processing Chinese business documents (forms, contracts, invoices)
- WHEN the user selects "Chinese Document Model" (PP-DocLayout-S)
- THEN the OCR engine SHALL use the PP-DocLayout-S layout detection model
- AND the model SHALL be optimized for 23 Chinese document element types
- AND table and form detection accuracy SHALL be improved over the default model
Scenario: User selects standard model for English documents
- GIVEN a user is processing English academic papers or reports
- WHEN the user selects "Standard Model" (PubLayNet-based)
- THEN the OCR engine SHALL use the default PubLayNet-based layout detection model
- AND the model SHALL be optimized for English document layouts
Scenario: User selects CDLA model for specialized Chinese layout
- GIVEN a user is processing Chinese documents with complex layouts
- WHEN the user selects "CDLA Model"
- THEN the OCR engine SHALL use the picodet_lcnet_x1_0_fgd_layout_cdla model
- AND the model SHALL provide specialized Chinese document layout analysis
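The three scenarios above amount to a lookup from the user's choice to a layout model name. A minimal sketch of that lookup follows. The keys chinese, default, and cdla are the layout_model values used elsewhere in this spec; the helper name resolve_layout_model and the use of None for the PubLayNet-based model (whose identifier this spec does not give) are assumptions for illustration, not the backend's actual code.

```python
from typing import Dict, Optional

# Keys are the layout_model values from this spec; values are the model
# names quoted in the scenarios. None means "leave the engine on its
# built-in layout model" (assumption: the spec does not name that model).
LAYOUT_MODEL_NAMES: Dict[str, Optional[str]] = {
    "chinese": "PP-DocLayout-S",
    "default": None,
    "cdla": "picodet_lcnet_x1_0_fgd_layout_cdla",
}


def resolve_layout_model(choice: str) -> Optional[str]:
    """Return the model name for a layout_model choice (hypothetical helper)."""
    if choice not in LAYOUT_MODEL_NAMES:
        raise ValueError(
            f"unknown layout_model {choice!r}; "
            f"expected one of {sorted(LAYOUT_MODEL_NAMES)}"
        )
    return LAYOUT_MODEL_NAMES[choice]
```

Keeping the mapping in one table means adding a model later touches a single place, and the same table can drive the list of valid options shown in error messages.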
Scenario: Layout model is sent via API request
- GIVEN a frontend application with model selection UI
- WHEN the user starts task processing with a selected model
- THEN the frontend SHALL send the model choice in the request body:
    POST /api/v2/tasks/{task_id}/start
    {
      "use_dual_track": true,
      "force_track": "ocr",
      "language": "ch",
      "layout_model": "chinese"
    }
- AND the backend SHALL configure PP-StructureV3 with the corresponding model
- AND the frontend SHALL NOT send ocr_preset, ocr_config, or table_detection parameters
Scenario: Default model when not specified
- GIVEN an API request without a layout_model parameter
- WHEN the task is started
- THEN the system SHALL use "chinese" (PP-DocLayout-S) as the default model
- AND processing SHALL work correctly without requiring model selection
Scenario: Invalid model name is rejected
- GIVEN a request with an invalid layout_model value
- WHEN the user sends layout_model: "invalid_model"
- THEN the API SHALL return 422 Validation Error
- AND provide a clear error message listing valid model options
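The default and rejection rules in the last two scenarios can be sketched as one check on the request body. This is an illustration in plain Python, not the backend's validation code: the function name validate_layout_model, the (status, payload) return shape, and the wording of the error message are all assumptions.

```python
from typing import Any, Dict, Tuple

VALID_LAYOUT_MODELS = ("chinese", "default", "cdla")
DEFAULT_LAYOUT_MODEL = "chinese"  # PP-DocLayout-S, per the default scenario


def validate_layout_model(body: Dict[str, Any]) -> Tuple[int, Dict[str, Any]]:
    """Return (status_code, payload) for the layout_model field (hypothetical)."""
    # A missing field falls back to the default instead of failing.
    choice = body.get("layout_model", DEFAULT_LAYOUT_MODEL)
    if choice not in VALID_LAYOUT_MODELS:
        # The message lists every valid option, as the scenario requires.
        return 422, {
            "detail": (
                f"Invalid layout_model {choice!r}. "
                f"Valid options: {', '.join(VALID_LAYOUT_MODELS)}"
            )
        }
    return 200, {"layout_model": choice}
```

For example, validate_layout_model({}) resolves to "chinese", while a body carrying layout_model: "invalid_model" yields status 422 with the three valid names in the detail text.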
Requirement: Layout Model Selection UI
The frontend SHALL provide a simple, user-friendly interface for selecting layout detection models with clear descriptions of each option.
Scenario: Model options are displayed with descriptions
- GIVEN the model selection UI is displayed
- WHEN the user views the available options
- THEN the UI SHALL show the following options:
  - "Chinese Document Model (Recommended)" - for Chinese forms, contracts, invoices
  - "Standard Model" - for English academic papers, reports
  - "CDLA Model" - for specialized Chinese layout analysis
- AND each option SHALL have a brief description of its use case
Scenario: Chinese model is selected by default
- GIVEN the user opens the task processing interface
- WHEN the model selection is displayed
- THEN "Chinese Document Model" SHALL be pre-selected as the default
- AND the user MAY change the selection before starting processing
Scenario: Model selection is visible only for OCR track
- GIVEN a document processing interface
- WHEN the user selects processing track
- THEN layout model selection SHALL be shown ONLY when OCR track is selected or auto-detected
- AND SHALL be hidden for Direct track (which does not use PP-StructureV3)
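The visibility rule above reduces to a small predicate. The sketch below expresses it in Python for consistency with the other examples, although the real check would live in the frontend. The track values "ocr" and "direct" follow the names used in this spec; the "auto" value, the detected_track argument, and the function name are assumptions.

```python
from typing import Optional


def show_layout_model_selector(
    selected_track: str, detected_track: Optional[str] = None
) -> bool:
    """True when the OCR track is chosen explicitly or resolved by auto-detection."""
    if selected_track == "ocr":
        return True
    if selected_track == "auto":
        # Assumption: auto-detection reports its result as "ocr" or "direct".
        return detected_track == "ocr"
    # Direct track (and anything unrecognized) does not use PP-StructureV3.
    return False
```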
Scenario: Simplified configuration options
- GIVEN the OCR track processing interface
- WHEN the user configures processing options
- THEN the UI SHALL only show:
  - Layout model selection (chinese/default/cdla)
  - Image preprocessing settings (auto/manual/disabled)
- AND SHALL NOT show:
  - OCR preset selection
  - Table detection configuration
  - Advanced OCR parameters
Requirement: Simplified Processing Options API
The backend API SHALL accept a simplified ProcessingOptions schema without complex OCR configuration parameters.
Scenario: API accepts minimal configuration
- GIVEN a start task API request
- WHEN the request body contains:
    {
      "use_dual_track": true,
      "force_track": "ocr",
      "language": "ch",
      "layout_model": "chinese",
      "preprocessing_mode": "auto"
    }
- THEN the API SHALL accept the request
- AND process the task using backend default values for all other parameters
Scenario: Legacy parameters are ignored
- GIVEN a start task API request with legacy parameters
- WHEN the request contains ocr_preset, ocr_config, or table_detection
- THEN the API SHALL ignore these parameters
- AND use backend default values instead
- AND NOT return an error (backward compatibility)
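One way to read the two scenarios above together is a parser that keeps only the simplified fields and silently drops the legacy ones. The sketch below uses a stdlib dataclass rather than the project's real schema. The field names come from the example request body; the default values other than layout_model, the helper name parse_processing_options, and the choice to also drop unknown keys are assumptions.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List, Tuple

LEGACY_KEYS = frozenset({"ocr_preset", "ocr_config", "table_detection"})


@dataclass(frozen=True)
class ProcessingOptions:
    # Defaults mirror the example request; only layout_model's default
    # ("chinese") is stated by the spec, the rest are illustrative.
    use_dual_track: bool = True
    force_track: str = "ocr"
    language: str = "ch"
    layout_model: str = "chinese"
    preprocessing_mode: str = "auto"


def parse_processing_options(
    body: Dict[str, Any],
) -> Tuple[ProcessingOptions, List[str]]:
    """Build options from a request body and report which legacy keys were ignored."""
    known = {f.name for f in fields(ProcessingOptions)}
    # Legacy keys never raise; they are only collected, e.g. for logging.
    ignored = sorted(k for k in body if k in LEGACY_KEYS)
    # Anything outside the simplified schema is dropped (assumption).
    accepted = {k: v for k, v in body.items() if k in known}
    return ProcessingOptions(**accepted), ignored
```

Returning the ignored keys alongside the options lets a caller log a deprecation notice without turning an old client's request into an error.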