Changes:
- Replace PP-Structure 7-slider parameter UI with simple 3-option layout model selector
- Add layout model mapping: chinese (PP-DocLayout-S), default (PubLayNet), cdla
- Add LayoutModelSelector component and zh-TW translations
- Fix "default" model behavior with sentinel value for PubLayNet
- Add gap filling service for OCR track coverage improvement
- Add PP-Structure debug utilities
- Archive completed/incomplete proposals:
  - add-ocr-track-gap-filling (complete)
  - fix-ocr-track-table-rendering (incomplete)
  - simplify-ppstructure-model-selection (22/25 tasks)
- Add new layout model tests, archive old PP-Structure param tests
- Update OpenSpec ocr-processing spec with layout model requirements

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
87 lines
4.4 KiB
Markdown
# ocr-processing Specification Delta

## REMOVED Requirements

### Requirement: Frontend-Adjustable PP-StructureV3 Parameters
**Reason**: Complex ML parameters are difficult for end users to understand and tune. Model selection provides better UX and more significant quality improvements.
**Migration**: Replace `pp_structure_params` API parameter with `layout_model` parameter.

### Requirement: PP-StructureV3 Parameter UI Controls
**Reason**: Slider/dropdown UI for 7 technical parameters adds complexity without proportional benefit. Simple model selection is more user-friendly.
**Migration**: Remove `PPStructureParams.tsx` component, add `LayoutModelSelector.tsx` component.

## ADDED Requirements

### Requirement: Layout Model Selection
The system SHALL allow users to select a layout detection model optimized for their document type, providing a simple choice between pre-configured models instead of manual parameter tuning.

#### Scenario: User selects Chinese document model
- **GIVEN** a user is processing Chinese business documents (forms, contracts, invoices)
- **WHEN** the user selects "Chinese Document Model" (PP-DocLayout-S)
- **THEN** the OCR engine SHALL use the PP-DocLayout-S layout detection model
- **AND** the model SHALL be optimized for 23 Chinese document element types
- **AND** table and form detection accuracy SHALL be improved over the PubLayNet-based Standard Model

#### Scenario: User selects standard model for English documents
- **GIVEN** a user is processing English academic papers or reports
- **WHEN** the user selects "Standard Model" (PubLayNet-based)
- **THEN** the OCR engine SHALL use the default PubLayNet-based layout detection model
- **AND** the model SHALL be optimized for English document layouts

#### Scenario: User selects CDLA model for specialized Chinese layout
- **GIVEN** a user is processing Chinese documents with complex layouts
- **WHEN** the user selects "CDLA Model"
- **THEN** the OCR engine SHALL use the picodet_lcnet_x1_0_fgd_layout_cdla model
- **AND** the model SHALL provide specialized Chinese document layout analysis

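The three scenarios above amount to a lookup from the user-facing choice to a concrete model name. A minimal sketch of that lookup, assuming the API values `chinese`, `default`, and `cdla`. The constant `ENGINE_DEFAULT`, the function `resolve_layout_model`, and the convention of returning `None` for the PubLayNet-based model are hypothetical illustrations, not taken from the implementation:

```python
from typing import Optional

# Hypothetical sentinel: marks the choice that should fall back to the
# engine's built-in PubLayNet-based model instead of naming a model.
ENGINE_DEFAULT = "__engine_default__"

# Choice values and model names as listed in this spec.
LAYOUT_MODEL_MAPPING = {
    "chinese": "PP-DocLayout-S",
    "default": ENGINE_DEFAULT,
    "cdla": "picodet_lcnet_x1_0_fgd_layout_cdla",
}


def resolve_layout_model(choice: str) -> Optional[str]:
    """Return the model name to configure, or None for the built-in model.

    Raises KeyError for a choice that is not in the mapping.
    """
    name = LAYOUT_MODEL_MAPPING[choice]
    return None if name == ENGINE_DEFAULT else name
```

Keeping the sentinel separate from `None` inside the mapping makes a missing entry (a bug) distinguishable from a deliberate "use the built-in model" entry.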
#### Scenario: Layout model is sent via API request
- **GIVEN** a frontend application with model selection UI
- **WHEN** the user starts task processing with a selected model
- **THEN** the frontend SHALL send the model choice in the request body:
  ```json
  POST /api/v2/tasks/{task_id}/start
  {
    "use_dual_track": true,
    "force_track": "ocr",
    "language": "ch",
    "layout_model": "chinese"
  }
  ```
- **AND** the backend SHALL configure PP-StructureV3 with the corresponding model

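For reference, a client could assemble the request from this scenario as follows. This is only a sketch: the helper name `build_start_request` is hypothetical, and it builds the path and JSON body without sending anything, so the transport layer is left to the caller.

```python
import json
from typing import Tuple


def build_start_request(task_id: str, layout_model: str = "chinese") -> Tuple[str, str]:
    """Return (path, JSON body) for starting a task on the OCR track.

    Field names and values mirror the example request in this scenario.
    """
    path = f"/api/v2/tasks/{task_id}/start"
    body = {
        "use_dual_track": True,
        "force_track": "ocr",
        "language": "ch",
        "layout_model": layout_model,
    }
    return path, json.dumps(body)
```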
#### Scenario: Default model when not specified
- **GIVEN** an API request without `layout_model` parameter
- **WHEN** the task is started
- **THEN** the system SHALL use "chinese" (PP-DocLayout-S) as the default model
- **AND** processing SHALL work correctly without requiring model selection

#### Scenario: Invalid model name is rejected
- **GIVEN** a request with an invalid `layout_model` value
- **WHEN** the user sends `layout_model: "invalid_model"`
- **THEN** the API SHALL return 422 Validation Error
- **AND** the response SHALL include a clear error message listing the valid model options

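The default and rejection scenarios can be covered by one small validation step. The sketch below is framework-independent and uses hypothetical names (`validate_layout_model`, the `detail` key of the error body). In a real service the web framework's validation layer would typically produce the 422 response itself.

```python
from typing import Any, Dict, Tuple

VALID_LAYOUT_MODELS = ("chinese", "default", "cdla")
DEFAULT_LAYOUT_MODEL = "chinese"


def validate_layout_model(payload: Dict[str, Any]) -> Tuple[int, Dict[str, Any]]:
    """Return (status_code, result) for the layout_model field of a request body.

    A missing field falls back to the default model; any value outside
    VALID_LAYOUT_MODELS yields a 422 with the valid options listed.
    """
    value = payload.get("layout_model", DEFAULT_LAYOUT_MODEL)
    if value not in VALID_LAYOUT_MODELS:
        options = ", ".join(VALID_LAYOUT_MODELS)
        message = f"Invalid layout_model {value!r}. Valid options: {options}"
        return 422, {"detail": message}
    return 200, {"layout_model": value}
```

Note that in this sketch an explicit `null` value is treated as invalid rather than as "not specified"; the spec does not say which behavior is intended.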
### Requirement: Layout Model Selection UI
The frontend SHALL provide a simple, user-friendly interface for selecting layout detection models with clear descriptions of each option.

#### Scenario: Model options are displayed with descriptions
- **GIVEN** the model selection UI is displayed
- **WHEN** the user views the available options
- **THEN** the UI SHALL show the following options:
  - "Chinese Document Model (Recommended)" - for Chinese forms, contracts, invoices
  - "Standard Model" - for English academic papers, reports
  - "CDLA Model" - for specialized Chinese layout analysis
- **AND** each option SHALL have a brief description of its use case

#### Scenario: Chinese model is selected by default
- **GIVEN** the user opens the task processing interface
- **WHEN** the model selection is displayed
- **THEN** "Chinese Document Model" SHALL be pre-selected as the default
- **AND** the user MAY change the selection before starting processing

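The option list and its default can be captured as plain data, which is convenient for checking the two scenarios above in a test. The real component is `LayoutModelSelector.tsx`. The Python sketch below is only a language-neutral illustration, and the names `LayoutModelOption`, `LAYOUT_MODEL_OPTIONS`, and `initial_selection` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LayoutModelOption:
    value: str          # value sent as layout_model in the API request
    label: str          # text shown to the user
    description: str    # brief use-case description
    is_default: bool = False


LAYOUT_MODEL_OPTIONS: List[LayoutModelOption] = [
    LayoutModelOption("chinese", "Chinese Document Model (Recommended)",
                      "Chinese forms, contracts, invoices", is_default=True),
    LayoutModelOption("default", "Standard Model",
                      "English academic papers, reports"),
    LayoutModelOption("cdla", "CDLA Model",
                      "Specialized Chinese layout analysis"),
]


def initial_selection(options: List[LayoutModelOption]) -> str:
    """Return the value of the pre-selected option; exactly one must be marked."""
    defaults = [o.value for o in options if o.is_default]
    if len(defaults) != 1:
        raise ValueError("exactly one option must be marked as default")
    return defaults[0]
```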
#### Scenario: Model selection is visible only for OCR track
- **GIVEN** a document processing interface
- **WHEN** the user selects a processing track
- **THEN** layout model selection SHALL be shown ONLY when the OCR track is selected or auto-detected
- **AND** it SHALL be hidden for the Direct track (which does not use PP-StructureV3)

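The visibility rule reduces to a small predicate. In the sketch below, `"ocr"` matches the `force_track` value used in the API example, while the identifiers `"auto"` and `"direct"` and the function name `show_layout_model_selector` are assumptions made for illustration:

```python
from typing import Optional


def show_layout_model_selector(selected_track: str,
                               detected_track: Optional[str] = None) -> bool:
    """Decide whether the layout model selector should be visible.

    selected_track: the track chosen by the user ("ocr", "direct", or "auto").
    detected_track: the auto-detection result, if one is available.
    """
    if selected_track == "ocr":
        return True
    if selected_track == "auto":
        # Visible only once auto-detection has resolved to the OCR track.
        return detected_track == "ocr"
    # Direct track (or anything else) does not use PP-StructureV3.
    return False
```

Under this sketch the selector stays hidden while auto-detection is still pending; whether it should instead be shown optimistically is not specified here.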