feat: simplify layout model selection and archive proposals

Changes:
- Replace PP-Structure 7-slider parameter UI with simple 3-option layout model selector
- Add layout model mapping: chinese (PP-DocLayout-S), default (PubLayNet), cdla
- Add LayoutModelSelector component and zh-TW translations
- Fix "default" model behavior with sentinel value for PubLayNet
- Add gap filling service for OCR track coverage improvement
- Add PP-Structure debug utilities
- Archive completed/incomplete proposals:
  - add-ocr-track-gap-filling (complete)
  - fix-ocr-track-table-rendering (incomplete)
  - simplify-ppstructure-model-selection (22/25 tasks)
- Add new layout model tests, archive old PP-Structure param tests
- Update OpenSpec ocr-processing spec with layout model requirements

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 13:27:00 +08:00
parent c65df754cf
commit 59206a6ab8
35 changed files with 3621 additions and 658 deletions
--- a/openspec/changes/archive/2025-11-27-simplify-ppstructure-model-selection/proposal.md
+++ b/openspec/changes/archive/2025-11-27-simplify-ppstructure-model-selection/proposal.md
@@ -0,0 +1,40 @@
+# Change: Simplify PP-StructureV3 Configuration with Layout Model Selection
+
+## Why
+
+The current PP-StructureV3 parameter adjustment UI exposes 7 technical ML parameters (thresholds, ratios, merge modes) that are difficult for end users to understand. Switching to a different layout detection model (e.g., CDLA-trained models for Chinese documents) would have a much greater impact on OCR quality than fine-tuning these parameters.
+
+**Problems with current approach:**
+- Users don't understand what `layout_detection_threshold` or `text_det_unclip_ratio` mean
+- Wrong parameter values can make OCR results worse
+- The default model (PubLayNet-based) is optimized for English academic papers, not Chinese business documents
+- Model selection is far more impactful than parameter tuning
+
+## What Changes
+
+### Backend Changes
+- **REMOVED**: API parameter `pp_structure_params` from task start endpoint
+- **ADDED**: New API parameter `layout_model` with predefined options:
+  - `"default"` - Standard model (PubLayNet-based, for English documents)
+  - `"chinese"` - PP-DocLayout-S model (for Chinese documents, forms, contracts)
+  - `"cdla"` - CDLA model (alternative Chinese document layout model)
+- **MODIFIED**: PP-StructureV3 initialization uses `layout_detection_model_name` based on selection
+- Keep fine-tuning parameters in backend `config.py` with optimized defaults
+
+### Frontend Changes
+- **REMOVED**: `PPStructureParams.tsx` component (slider/dropdown UI for 7 parameters)
+- **ADDED**: `LayoutModelSelector.tsx` component (radio buttons or dropdown) for layout model selection with clear descriptions
+- **MODIFIED**: Task start request body to send `layout_model` instead of `pp_structure_params`
+
+### API Changes
+- **BREAKING**: Remove `pp_structure_params` from `POST /api/v2/tasks/{task_id}/start`
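+- **ADDED**: New optional parameter `layout_model: "default" | "chinese" | "cdla"` (falls back to `"chinese"` when omitted; `"default"` selects the PubLayNet-based model, not the fallback)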
+
+## Impact
+
+- Affected specs: `ocr-processing`
+- Affected code:
+  - Backend: `app/routers/tasks.py`, `app/services/ocr_service.py`, `app/core/config.py`
+  - Frontend: `src/components/PPStructureParams.tsx` (remove), `src/types/apiV2.ts`, task start form
+- Breaking change: Clients using `pp_structure_params` will need to migrate to `layout_model`
+- User impact: Simpler UI, better default OCR quality for Chinese documents
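
The mapping from `layout_model` to `layout_detection_model_name` described in the proposal above could look roughly like the following. This is a minimal sketch, not the code in `ocr_service.py`: the `LayoutModel` enum, the `layout_kwargs` helper, and the use of `None` as the sentinel for the PubLayNet-based default are all assumptions made for illustration. The commit message mentions a sentinel value but does not say what it is.

```python
from enum import Enum
from typing import Dict, Optional


class LayoutModel(str, Enum):
    """The three API values named in the proposal."""
    DEFAULT = "default"
    CHINESE = "chinese"
    CDLA = "cdla"


# Assumption: None acts as the sentinel meaning "do not override the
# engine's built-in (PubLayNet-based) layout model".
_MODEL_NAMES: Dict[LayoutModel, Optional[str]] = {
    LayoutModel.CHINESE: "PP-DocLayout-S",
    LayoutModel.DEFAULT: None,
    LayoutModel.CDLA: "picodet_lcnet_x1_0_fgd_layout_cdla",
}


def layout_kwargs(layout_model: Optional[str] = None) -> Dict[str, str]:
    """Build the keyword arguments for engine initialization (hypothetical helper).

    A missing value falls back to "chinese"; an unknown value raises ValueError.
    """
    choice = LayoutModel(layout_model) if layout_model is not None else LayoutModel.CHINESE
    name = _MODEL_NAMES[choice]
    # Omit the key entirely for the sentinel so the engine keeps its own default.
    return {} if name is None else {"layout_detection_model_name": name}
```

Returning an empty dict for `"default"` keeps the sentinel out of the engine call, which is one way to avoid passing a placeholder string as a model name.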
--- a/openspec/changes/archive/2025-11-27-simplify-ppstructure-model-selection/specs/ocr-processing/spec.md
+++ b/openspec/changes/archive/2025-11-27-simplify-ppstructure-model-selection/specs/ocr-processing/spec.md
@@ -0,0 +1,86 @@
+# ocr-processing Specification Delta
+
+## REMOVED Requirements
+
+### Requirement: Frontend-Adjustable PP-StructureV3 Parameters
+**Reason**: Complex ML parameters are difficult for end users to understand and tune. Model selection provides better UX and more significant quality improvements.
+**Migration**: Replace `pp_structure_params` API parameter with `layout_model` parameter.
+
+### Requirement: PP-StructureV3 Parameter UI Controls
+**Reason**: Slider/dropdown UI for 7 technical parameters adds complexity without proportional benefit. Simple model selection is more user-friendly.
+**Migration**: Remove `PPStructureParams.tsx` component, add `LayoutModelSelector.tsx` component.
+
+## ADDED Requirements
+
+### Requirement: Layout Model Selection
+The system SHALL allow users to select a layout detection model optimized for their document type, providing a simple choice between pre-configured models instead of manual parameter tuning.
+
+#### Scenario: User selects Chinese document model
+- **GIVEN** a user is processing Chinese business documents (forms, contracts, invoices)
+- **WHEN** the user selects "Chinese Document Model" (PP-DocLayout-S)
+- **THEN** the OCR engine SHALL use the PP-DocLayout-S layout detection model
+- **AND** the model SHALL be optimized for 23 Chinese document element types
+- **AND** table and form detection accuracy SHALL be improved over the standard (PubLayNet-based) model
+
+#### Scenario: User selects standard model for English documents
+- **GIVEN** a user is processing English academic papers or reports
+- **WHEN** the user selects "Standard Model" (PubLayNet-based)
+- **THEN** the OCR engine SHALL use the default PubLayNet-based layout detection model
+- **AND** the model SHALL be optimized for English document layouts
+
+#### Scenario: User selects CDLA model for specialized Chinese layout
+- **GIVEN** a user is processing Chinese documents with complex layouts
+- **WHEN** the user selects "CDLA Model"
+- **THEN** the OCR engine SHALL use the picodet_lcnet_x1_0_fgd_layout_cdla model
+- **AND** the model SHALL provide specialized Chinese document layout analysis
+
+#### Scenario: Layout model is sent via API request
+- **GIVEN** a frontend application with model selection UI
+- **WHEN** the user starts task processing with a selected model
+- **THEN** the frontend SHALL send the model choice in the request body:
+  ```json
+  POST /api/v2/tasks/{task_id}/start
+  {
+    "use_dual_track": true,
+    "force_track": "ocr",
+    "language": "ch",
+    "layout_model": "chinese"
+  }
+  ```
+- **AND** the backend SHALL configure PP-StructureV3 with the corresponding model
+
+#### Scenario: Default model when not specified
+- **GIVEN** an API request without `layout_model` parameter
+- **WHEN** the task is started
+- **THEN** the system SHALL use "chinese" (PP-DocLayout-S) as the default model
+- **AND** processing SHALL work correctly without requiring model selection
+
+#### Scenario: Invalid model name is rejected
+- **GIVEN** a request with an invalid `layout_model` value
+- **WHEN** the user sends `layout_model: "invalid_model"`
+- **THEN** the API SHALL return 422 Validation Error
+- **AND** provide a clear error message listing valid model options
+
+### Requirement: Layout Model Selection UI
+The frontend SHALL provide a simple, user-friendly interface for selecting layout detection models with clear descriptions of each option.
+
+#### Scenario: Model options are displayed with descriptions
+- **GIVEN** the model selection UI is displayed
+- **WHEN** the user views the available options
+- **THEN** the UI SHALL show the following options:
+  - "Chinese Document Model (Recommended)" - for Chinese forms, contracts, invoices
+  - "Standard Model" - for English academic papers, reports
+  - "CDLA Model" - for specialized Chinese layout analysis
+- **AND** each option SHALL have a brief description of its use case
+
+#### Scenario: Chinese model is selected by default
+- **GIVEN** the user opens the task processing interface
+- **WHEN** the model selection is displayed
+- **THEN** "Chinese Document Model" SHALL be pre-selected as the default
+- **AND** the user MAY change the selection before starting processing
+
+#### Scenario: Model selection is visible only for OCR track
+- **GIVEN** a document processing interface
+- **WHEN** the user selects processing track
+- **THEN** layout model selection SHALL be shown ONLY when OCR track is selected or auto-detected
+- **AND** SHALL be hidden for Direct track (which does not use PP-StructureV3)
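
The "default when not specified" and "invalid model name is rejected" scenarios in the spec above can be sketched with the standard library alone. The function name `resolve_layout_model`, the `(status, payload)` return shape, and the error wording are hypothetical. In the real service the 422 response would more likely come from schema-level validation of the request body.

```python
from typing import Any, Dict, Tuple

VALID_LAYOUT_MODELS = ("default", "chinese", "cdla")
FALLBACK_LAYOUT_MODEL = "chinese"  # used when the request omits layout_model


def resolve_layout_model(body: Dict[str, Any]) -> Tuple[int, Dict[str, Any]]:
    """Return (status_code, payload) for a task-start request body (illustrative)."""
    value = body.get("layout_model", FALLBACK_LAYOUT_MODEL)
    if value not in VALID_LAYOUT_MODELS:
        # List the valid options so the client can correct the request.
        options = ", ".join(VALID_LAYOUT_MODELS)
        return 422, {"detail": f"layout_model must be one of: {options}"}
    return 200, {"layout_model": value}
```

For example, `resolve_layout_model({"language": "ch"})` resolves to the `"chinese"` fallback, while `{"layout_model": "invalid_model"}` yields a 422 status with the list of valid options in `detail`.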
--- a/openspec/changes/archive/2025-11-27-simplify-ppstructure-model-selection/tasks.md
+++ b/openspec/changes/archive/2025-11-27-simplify-ppstructure-model-selection/tasks.md
@@ -0,0 +1,56 @@
+# Implementation Tasks
+
+## 1. Backend API Changes
+
+- [x] 1.1 Update `app/schemas/task.py` to add `layout_model` enum type
+- [x] 1.2 Update `app/routers/tasks.py` to replace `pp_structure_params` with `layout_model` parameter
+- [x] 1.3 Update `app/services/ocr_service.py` to map `layout_model` to `layout_detection_model_name`
+- [x] 1.4 Remove custom PP-Structure engine creation logic (use model selection instead)
+- [x] 1.5 Add backward compatibility: default to "chinese" if no model specified
+
+## 2. Backend Configuration
+
+- [x] 2.1 Keep `layout_detection_model_name` in `config.py` as fallback default
+- [x] 2.2 Keep fine-tuning parameters in `config.py` (not exposed to API)
+- [x] 2.3 Document available layout models in config comments
+
+## 3. Frontend Changes
+
+- [x] 3.1 Remove `PPStructureParams.tsx` component
+- [x] 3.2 Update `src/types/apiV2.ts`:
+  - Remove `PPStructureV3Params` interface
+  - Add `LayoutModel` type: `"default" | "chinese" | "cdla"`
+  - Update `ProcessingOptions` to use `layout_model` instead of `pp_structure_params`
+- [x] 3.3 Create `LayoutModelSelector.tsx` component with:
+  - Radio buttons or dropdown for model selection
+  - Clear descriptions for each model option
+  - Default selection: "chinese"
+- [x] 3.4 Update task start form to use new `LayoutModelSelector`
+- [x] 3.5 Update API calls to send `layout_model` instead of `pp_structure_params`
+
+## 4. Internationalization
+
+- [x] 4.1 Add i18n strings for layout model options:
+  - `layoutModel.default`: "Standard Model (English documents)"
+  - `layoutModel.chinese`: "Chinese Document Model (Recommended)"
+  - `layoutModel.cdla`: "CDLA Model (Chinese layout analysis)"
+- [x] 4.2 Add i18n strings for model descriptions
+
+## 5. Testing
+
+- [x] 5.1 Create new tests for `layout_model` parameter (`test_layout_model_api.py`, `test_layout_model.py`)
+- [x] 5.2 Archive tests for `pp_structure_params` validation (moved to `tests/archived/`)
+- [x] 5.3 Add tests for layout model selection (19 tests passing)
+- [x] 5.4 Test backward compatibility (no model specified → use chinese default)
+
+## 6. Documentation
+
+- [ ] 6.1 Update API documentation for task start endpoint
+- [ ] 6.2 Remove PP-Structure parameter documentation
+- [ ] 6.3 Add layout model selection documentation
+
+## 7. Cleanup
+
+- [x] 7.1 Remove localStorage keys for PP-Structure params (`pp_structure_params_presets`, `pp_structure_params_last_used`)
+- [x] 7.2 Remove any unused imports/types related to PP-Structure params
+- [x] 7.3 Archive old PP-Structure params test files