OCR/openspec/specs/ocr-processing/spec.md

# ocr-processing Specification

## Purpose
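Define how frontend users adjust PP-StructureV3 OCR parameters per task through the API, and which UI controls expose those parameters. Created by archiving change frontend-adjustable-ppstructure-params.
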
## Requirements
### Requirement: Frontend-Adjustable PP-StructureV3 Parameters
The system SHALL allow frontend users to dynamically adjust PP-StructureV3 OCR parameters for fine-tuning document processing without backend configuration changes.

#### Scenario: User adjusts layout detection threshold
- **GIVEN** a user is processing a document on the OCR track
- **WHEN** the user sets `layout_detection_threshold` to 0.1 (lower than default 0.2)
- **THEN** the OCR engine SHALL detect more layout blocks, including low-confidence regions
- **AND** the processing SHALL use the custom parameter instead of backend defaults
- **AND** the custom parameter SHALL NOT be cached or reused by later tasks

#### Scenario: User selects high-quality preset configuration
- **GIVEN** a user wants to process a complex document with many small text elements
- **WHEN** the user selects "High Quality" preset mode
- **THEN** the system SHALL automatically set:
  - `layout_detection_threshold` to 0.1
  - `layout_nms_threshold` to 0.15
  - `text_det_thresh` to 0.1
  - `text_det_box_thresh` to 0.2
- **AND** the system SHALL process the document with these optimized parameters
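
The preset mapping above could be held in a small lookup table. The following is a minimal sketch, not the project's implementation: the names `PRESETS` and `params_for_preset` and the key `"high_quality"` are hypothetical, and only the four values are taken from this scenario.

```python
from typing import Dict

# Hypothetical preset table. Only the "High Quality" values are defined by
# this spec; the identifier "high_quality" is an assumed key name.
PRESETS: Dict[str, Dict[str, float]] = {
    "high_quality": {
        "layout_detection_threshold": 0.1,
        "layout_nms_threshold": 0.15,
        "text_det_thresh": 0.1,
        "text_det_box_thresh": 0.2,
    },
}


def params_for_preset(name: str) -> Dict[str, float]:
    """Return a fresh copy of a preset so callers can adjust it safely."""
    if name not in PRESETS:
        raise KeyError(f"unknown preset: {name!r}")
    # Copy so that per-task tweaks never mutate the shared preset table.
    return dict(PRESETS[name])
```

Returning a copy keeps later slider adjustments from leaking back into the preset itself.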

#### Scenario: User adjusts text detection parameters
- **GIVEN** a document with low-contrast text
- **WHEN** the user sets:
  - `text_det_thresh` to 0.05 (very low)
  - `text_det_unclip_ratio` to 1.5 (larger boxes)
- **THEN** the OCR engine SHALL detect more small and low-contrast text
- **AND** text bounding boxes SHALL be expanded by the specified ratio
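
To give a feel for how the unclip ratio changes box size, here is a rough sketch. It assumes the DB-style unclip used in PaddleOCR text detection post-processing, where the outward offset is `area * unclip_ratio / perimeter`. It is simplified to an axis-aligned rectangle, and the function name `unclip_rect` is hypothetical.

```python
from typing import Tuple


def unclip_rect(width: float, height: float,
                unclip_ratio: float) -> Tuple[float, float, float]:
    """Return (offset, new_width, new_height) for an axis-aligned box.

    Assumes a DB-style unclip: every edge moves outward by
    area * unclip_ratio / perimeter.
    """
    area = width * height
    perimeter = 2 * (width + height)
    offset = area * unclip_ratio / perimeter
    # Each side moves outward by `offset`, so both dimensions grow by 2 * offset.
    return offset, width + 2 * offset, height + 2 * offset


# A 100 x 20 px text line with ratio 1.5 grows by 12.5 px on every side.
print(unclip_rect(100, 20, 1.5))  # (12.5, 125.0, 45.0)
```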

#### Scenario: Parameters are sent via API request body
- **GIVEN** a frontend application with parameter adjustment UI
- **WHEN** the user starts task processing with custom parameters
- **THEN** the frontend SHALL send parameters in the request body (not query params) of `POST /api/v2/tasks/{task_id}/start`:
  ```json
  {
    "use_dual_track": true,
    "force_track": "ocr",
    "language": "ch",
    "pp_structure_params": {
      "layout_detection_threshold": 0.15,
      "layout_merge_bboxes_mode": "small",
      "text_det_thresh": 0.1
    }
  }
  ```
- **AND** the backend SHALL parse and apply these parameters
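
As an illustration of the request shape, a client could assemble the call as sketched below. This uses only the Python standard library; the helper name `build_start_request` and the base URL in the usage note are assumptions, while the path and body fields come from the example above.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_start_request(
    base_url: str,
    task_id: str,
    pp_structure_params: Optional[Dict[str, Any]] = None,
) -> urllib.request.Request:
    """Build (but do not send) the task start request."""
    body: Dict[str, Any] = {
        "use_dual_track": True,
        "force_track": "ocr",
        "language": "ch",
    }
    # Omit the key entirely when no custom parameters are given, so the
    # backend falls back to its defaults.
    if pp_structure_params:
        body["pp_structure_params"] = pp_structure_params
    return urllib.request.Request(
        url=f"{base_url}/api/v2/tasks/{task_id}/start",
        data=json.dumps(body).encode("utf-8"),  # parameters travel in the body
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

For example, `build_start_request("http://localhost:8000", "abc", {"text_det_thresh": 0.1})` yields a POST request whose URL carries no query string; passing it to `urllib.request.urlopen` would send it.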

#### Scenario: Backward compatibility is maintained
- **GIVEN** existing API clients without PP-StructureV3 parameter support
- **WHEN** a task is started without `pp_structure_params`
- **THEN** the system SHALL use backend default settings
- **AND** processing SHALL work exactly as before
- **AND** no errors SHALL occur
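
One way to satisfy this scenario is to treat `pp_structure_params` as optional and overlay it on the backend defaults. The sketch below is illustrative only: `BACKEND_DEFAULTS` and `resolve_params` are hypothetical names, and the only default value taken from this spec is `layout_detection_threshold` = 0.2.

```python
from typing import Any, Dict, Optional

# Placeholder defaults. The real backend settings are not listed in this
# spec, apart from the 0.2 layout threshold mentioned in the first scenario.
BACKEND_DEFAULTS: Dict[str, Any] = {"layout_detection_threshold": 0.2}


def resolve_params(custom: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Overlay per-request parameters on a copy of the backend defaults."""
    resolved = dict(BACKEND_DEFAULTS)  # fresh dict per task, defaults untouched
    if custom:
        # Treat explicit None values as "not provided".
        resolved.update({k: v for k, v in custom.items() if v is not None})
    return resolved
```

Because each call returns a new dictionary, a request without `pp_structure_params` behaves as before, and one task's overrides cannot alter the defaults seen by another.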

#### Scenario: Invalid parameters are rejected
- **GIVEN** a request with invalid parameter values
- **WHEN** the user sends:
  - `layout_detection_threshold` = 1.5 (exceeds max 1.0)
  - `layout_merge_bboxes_mode` = "invalid" (not in allowed values)
- **THEN** the API SHALL return HTTP 422 (validation error)
- **AND** the response SHALL include clear error messages identifying each invalid parameter
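
The checks in this scenario can be sketched as a plain validation function that collects one message per invalid field, which an API layer could then return with a 422 response. The function and constant names are hypothetical, and rejecting unknown keys is an assumption that this spec does not state.

```python
from typing import Any, Dict, List

THRESHOLD_KEYS = {
    "layout_detection_threshold",
    "layout_nms_threshold",
    "text_det_thresh",
    "text_det_box_thresh",
}
RATIO_KEYS = {"text_det_unclip_ratio"}
MERGE_MODES = {"small", "large", "union"}


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_params(params: Dict[str, Any]) -> List[str]:
    """Return a list of error messages; an empty list means valid."""
    errors: List[str] = []
    for key, value in params.items():
        if key in THRESHOLD_KEYS:
            if not _is_number(value) or not 0.0 <= value <= 1.0:
                errors.append(f"{key}: must be a number between 0.0 and 1.0")
        elif key in RATIO_KEYS:
            if not _is_number(value) or value <= 0:
                errors.append(f"{key}: must be a number greater than 0")
        elif key == "layout_merge_bboxes_mode":
            if value not in MERGE_MODES:
                errors.append(f"{key}: must be one of {sorted(MERGE_MODES)}")
        else:
            # Assumption: unknown parameters are rejected rather than ignored.
            errors.append(f"{key}: unknown parameter")
    return errors
```

With the values from this scenario, `validate_params({"layout_detection_threshold": 1.5, "layout_merge_bboxes_mode": "invalid"})` returns two messages, one per offending field.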

#### Scenario: Custom parameters affect only current processing
- **GIVEN** multiple concurrent OCR processing tasks
- **WHEN** Task A uses custom parameters and Task B uses defaults
- **THEN** Task A SHALL process with its custom parameters
- **AND** Task B SHALL process with default parameters
- **AND** no parameter interference SHALL occur between tasks

### Requirement: PP-StructureV3 Parameter UI Controls
The frontend SHALL provide intuitive UI controls for adjusting PP-StructureV3 parameters with appropriate constraints and help text.

#### Scenario: Slider controls for numeric parameters
- **GIVEN** the parameter adjustment UI is displayed
- **WHEN** the user adjusts a numeric parameter slider
- **THEN** the slider SHALL enforce min/max constraints:
  - Threshold parameters: 0.0 to 1.0
  - Ratio parameters: > 0 (typically 0.5 to 3.0)
- **AND** display the current value in real time
- **AND** show help text explaining the parameter's effect
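
The constraint logic behind such sliders is independent of any UI framework. It is sketched here in Python for illustration; the `SliderSpec` type and the step sizes are assumptions, while the ranges follow the bullets above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SliderSpec:
    """Hypothetical description of one numeric slider."""
    minimum: float
    maximum: float
    step: float  # assumed step size; not specified in this document

    def clamp(self, value: float) -> float:
        """Force a value into the slider's allowed range."""
        return min(max(value, self.minimum), self.maximum)


# Threshold parameters: 0.0 to 1.0.
THRESHOLD_SLIDER = SliderSpec(minimum=0.0, maximum=1.0, step=0.01)
# Ratio parameters: must be > 0; the slider uses the typical 0.5 to 3.0 range.
RATIO_SLIDER = SliderSpec(minimum=0.5, maximum=3.0, step=0.1)
```

Clamping in the UI is a convenience only; the backend validation in the earlier scenario remains the authority on what is accepted.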

#### Scenario: Dropdown for merge mode selection
- **GIVEN** the layout merge mode parameter
- **WHEN** the user clicks the dropdown
- **THEN** the UI SHALL show exactly three options:
  - "small" (conservative merging)
  - "large" (aggressive merging)
  - "union" (middle ground)
- **AND** display a description for each option

#### Scenario: Parameters shown only for OCR track
- **GIVEN** a document processing interface
- **WHEN** the user selects a processing track
- **THEN** PP-StructureV3 parameters SHALL be shown ONLY when the OCR track is selected
- **AND** the parameters SHALL be hidden for the Direct track
- **AND** the parameters SHALL be disabled for the Auto track until the track is determined
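
The visibility rule can be expressed as a small pure function, sketched here in Python for illustration. The track identifiers `"direct"` and `"auto"` are assumed by analogy with the `"ocr"` value of `force_track`. Following the resolved track once Auto has been determined is also an assumption.

```python
from enum import Enum
from typing import Optional


class ControlState(Enum):
    SHOWN = "shown"
    HIDDEN = "hidden"
    DISABLED = "disabled"


def control_state(selected_track: str,
                  resolved_track: Optional[str] = None) -> ControlState:
    """Decide how the PP-StructureV3 controls should appear."""
    track = selected_track
    # Assumption: once Auto has resolved to a concrete track, follow it.
    if selected_track == "auto" and resolved_track in ("ocr", "direct"):
        track = resolved_track
    if track == "ocr":
        return ControlState.SHOWN
    if track == "direct":
        return ControlState.HIDDEN
    if track == "auto":
        return ControlState.DISABLED  # track not yet determined
    raise ValueError(f"unknown track: {selected_track!r}")
```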