# ocr-processing Specification

## Purpose

TBD - created by archiving change frontend-adjustable-ppstructure-params. Update Purpose after archive.

## Requirements

### Requirement: Frontend-Adjustable PP-StructureV3 Parameters

The system SHALL allow frontend users to dynamically adjust PP-StructureV3 OCR parameters for fine-tuning document processing without backend configuration changes.

#### Scenario: User adjusts layout detection threshold

- **GIVEN** a user is processing a document with OCR track
- **WHEN** the user sets `layout_detection_threshold` to 0.1 (lower than default 0.2)
- **THEN** the OCR engine SHALL detect more layout blocks including weak signals
- **AND** the processing SHALL use the custom parameter instead of backend defaults
- **AND** the custom parameter SHALL NOT be cached for reuse

#### Scenario: User selects high-quality preset configuration

- **GIVEN** a user wants to process a complex document with many small text elements
- **WHEN** the user selects "High Quality" preset mode
- **THEN** the system SHALL automatically set:
  - `layout_detection_threshold` to 0.1
  - `layout_nms_threshold` to 0.15
  - `text_det_thresh` to 0.1
  - `text_det_box_thresh` to 0.2
- **AND** process the document with these optimized parameters

#### Scenario: User adjusts text detection parameters

- **GIVEN** a document with low-contrast text
- **WHEN** the user sets:
  - `text_det_thresh` to 0.05 (very low)
  - `text_det_unclip_ratio` to 1.5 (larger boxes)
- **THEN** the OCR SHALL detect more small and low-contrast text
- **AND** text bounding boxes SHALL be expanded by the specified ratio

#### Scenario: Parameters are sent via API request body

- **GIVEN** a frontend application with parameter adjustment UI
- **WHEN** the user starts task processing with custom parameters
- **THEN** the frontend SHALL send parameters in the request body (not query params):
  ```json
  POST /api/v2/tasks/{task_id}/start
  {
    "use_dual_track": true,
    "force_track": "ocr",
    "language": "ch",
    "pp_structure_params": {
      "layout_detection_threshold": 0.15,
      "layout_merge_bboxes_mode": "small",
      "text_det_thresh": 0.1
    }
  }
  ```
- **AND** the backend SHALL parse and apply these parameters

#### Scenario: Backward compatibility is maintained

- **GIVEN** existing API clients without PP-StructureV3 parameter support
- **WHEN** a task is started without `pp_structure_params`
- **THEN** the system SHALL use backend default settings
- **AND** processing SHALL work exactly as before
- **AND** no errors SHALL occur

#### Scenario: Invalid parameters are rejected

- **GIVEN** a request with invalid parameter values
- **WHEN** the user sends:
  - `layout_detection_threshold` = 1.5 (exceeds max 1.0)
  - `layout_merge_bboxes_mode` = "invalid" (not in allowed values)
- **THEN** the API SHALL return 422 Validation Error
- **AND** provide clear error messages about invalid parameters

#### Scenario: Custom parameters affect only current processing

- **GIVEN** multiple concurrent OCR processing tasks
- **WHEN** Task A uses custom parameters and Task B uses defaults
- **THEN** Task A SHALL process with its custom parameters
- **AND** Task B SHALL process with default parameters
- **AND** no parameter interference SHALL occur between tasks

### Requirement: PP-StructureV3 Parameter UI Controls

The frontend SHALL provide intuitive UI controls for adjusting PP-StructureV3 parameters with appropriate constraints and help text.
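
The constraints named in this specification (thresholds between 0.0 and 1.0, ratios greater than 0, a merge mode of `small`, `large`, or `union`) could be kept in one table that both the UI controls and the request validation draw from. The following is a minimal Python sketch under that assumption. `PARAM_RULES`, `validate_pp_structure_params`, and the error-message wording are hypothetical names chosen for illustration; only parameters mentioned in this specification are listed, and rejecting unknown parameter names is an assumption the specification does not state. A real backend would more likely express these rules in its validation framework and map failures to the 422 response.

```python
from typing import Any, Dict, List, Optional

# Hypothetical constraint table; only parameters named in this spec appear.
THRESHOLD = ("range", 0.0, 1.0)                      # inclusive bounds
RATIO = ("positive",)                                # strictly greater than 0
MERGE_MODE = ("enum", ("small", "large", "union"))   # allowed values

PARAM_RULES: Dict[str, tuple] = {
    "layout_detection_threshold": THRESHOLD,
    "layout_nms_threshold": THRESHOLD,
    "text_det_thresh": THRESHOLD,
    "text_det_box_thresh": THRESHOLD,
    "text_det_unclip_ratio": RATIO,
    "layout_merge_bboxes_mode": MERGE_MODE,
}


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_pp_structure_params(params: Optional[Dict[str, Any]]) -> List[str]:
    """Return one error message per invalid field; an empty list means valid."""
    if params is None:
        # Omitting pp_structure_params is valid: backend defaults apply.
        return []
    errors: List[str] = []
    for name, value in params.items():
        rule = PARAM_RULES.get(name)
        if rule is None:
            # Assumption: unknown names are rejected rather than ignored.
            errors.append(f"{name}: unknown parameter")
        elif rule[0] == "range":
            low, high = rule[1], rule[2]
            if not _is_number(value) or not (low <= value <= high):
                errors.append(f"{name}: must be a number between {low} and {high}")
        elif rule[0] == "positive":
            if not _is_number(value) or value <= 0:
                errors.append(f"{name}: must be a number greater than 0")
        elif rule[0] == "enum":
            if value not in rule[1]:
                errors.append(f"{name}: must be one of {', '.join(rule[1])}")
    return errors
```

Called with the invalid values from the rejection scenario (`layout_detection_threshold` of 1.5 and `layout_merge_bboxes_mode` of `"invalid"`), the function returns two messages, one per offending field. Called with `None`, it returns an empty list, which matches the backward-compatibility scenario.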

#### Scenario: Slider controls for numeric parameters

- **GIVEN** the parameter adjustment UI is displayed
- **WHEN** the user adjusts a numeric parameter slider
- **THEN** the slider SHALL enforce min/max constraints:
  - Threshold parameters: 0.0 to 1.0
  - Ratio parameters: > 0 (typically 0.5 to 3.0)
- **AND** display current value in real-time
- **AND** show help text explaining the parameter effect

#### Scenario: Dropdown for merge mode selection

- **GIVEN** the layout merge mode parameter
- **WHEN** the user clicks the dropdown
- **THEN** the UI SHALL show exactly three options:
  - "small" (conservative merging)
  - "large" (aggressive merging)
  - "union" (middle ground)
- **AND** display description for each option

#### Scenario: Parameters shown only for OCR track

- **GIVEN** a document processing interface
- **WHEN** the user selects processing track
- **THEN** PP-StructureV3 parameters SHALL be shown ONLY when OCR track is selected
- **AND** SHALL be hidden for Direct track
- **AND** SHALL be disabled for Auto track until track is determined
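
The track-dependent visibility rule in the last scenario can be read as a small state function. The sketch below is one possible interpretation, written in Python purely for illustration (a real frontend would implement it in its own component code). `Track`, `ControlState`, and `param_control_state` are hypothetical names, the string values `"direct"` and `"auto"` are assumed by analogy with the `"ocr"` value used for `force_track`, and the behavior after Auto resolves (follow the resolved track) is an assumption the specification does not spell out.

```python
from enum import Enum
from typing import Optional


class Track(str, Enum):
    OCR = "ocr"
    DIRECT = "direct"   # assumed value
    AUTO = "auto"       # assumed value


class ControlState(str, Enum):
    SHOWN = "shown"
    HIDDEN = "hidden"
    DISABLED = "disabled"


def param_control_state(
    selected: Track, resolved: Optional[Track] = None
) -> ControlState:
    """Decide how the PP-StructureV3 parameter controls are presented.

    `selected` is the track chosen by the user; `resolved` is the track
    that Auto mode settled on, or None while that is still undetermined.
    """
    # For Auto, defer to the resolved track once it is known (assumption).
    if selected is Track.AUTO and resolved is not None:
        effective = resolved
    else:
        effective = selected

    if effective is Track.OCR:
        return ControlState.SHOWN
    if effective is Track.DIRECT:
        return ControlState.HIDDEN
    # Auto track whose outcome has not been determined yet.
    return ControlState.DISABLED
```

Keeping the rule in a single pure function makes each of the three clauses in the scenario directly testable without rendering any UI.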