feat: simplify layout model selection and archive proposals

Changes:
- Replace PP-Structure 7-slider parameter UI with simple 3-option layout model selector
- Add layout model mapping: chinese (PP-DocLayout-S), default (PubLayNet), cdla
- Add LayoutModelSelector component and zh-TW translations
- Fix "default" model behavior with sentinel value for PubLayNet
- Add gap filling service for OCR track coverage improvement
- Add PP-Structure debug utilities
- Archive completed/incomplete proposals:
  - add-ocr-track-gap-filling (complete)
  - fix-ocr-track-table-rendering (incomplete)
  - simplify-ppstructure-model-selection (22/25 tasks)
- Add new layout model tests, archive old PP-Structure param tests
- Update OpenSpec ocr-processing spec with layout model requirements

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 13:27:00 +08:00
parent c65df754cf
commit 59206a6ab8
35 changed files with 3621 additions and 658 deletions
--- a/openspec/changes/archive/2025-11-27-simplify-ppstructure-model-selection/proposal.md
+++ b/openspec/changes/archive/2025-11-27-simplify-ppstructure-model-selection/proposal.md
@@ -0,0 +1,40 @@
+# Change: Simplify PP-StructureV3 Configuration with Layout Model Selection
+
+## Why
+
+The current PP-StructureV3 parameter adjustment UI exposes 7 technical ML parameters (thresholds, ratios, merge modes) that are difficult for end users to understand. Meanwhile, switching to a different layout detection model (e.g., a CDLA-trained model for Chinese documents) would have a much greater impact on OCR quality than fine-tuning these parameters.
+
+**Problems with the current approach:**
+- Users don't understand what `layout_detection_threshold` or `text_det_unclip_ratio` mean
+- Wrong parameter values can make OCR results worse
+- The default model (PubLayNet-based) is optimized for English academic papers, not Chinese business documents
+- Model selection is far more impactful than parameter tuning
+
+## What Changes
+
+### Backend Changes
+- **REMOVED**: API parameter `pp_structure_params` from task start endpoint
+- **ADDED**: New API parameter `layout_model` with predefined options:
+  - `"default"` - Standard model (PubLayNet-based, for English documents)
+  - `"chinese"` - PP-DocLayout-S model (for Chinese documents, forms, contracts)
+  - `"cdla"` - CDLA model (alternative Chinese document layout model)
+- **MODIFIED**: PP-StructureV3 initialization uses `layout_detection_model_name` based on selection
+- Keep fine-tuning parameters in backend `config.py` with optimized defaults
+
+### Frontend Changes
+- **REMOVED**: `PPStructureParams.tsx` component (slider/dropdown UI for 7 parameters)
+- **ADDED**: Simple radio button/dropdown for layout model selection with clear descriptions
+- **MODIFIED**: Task start request body to send `layout_model` instead of `pp_structure_params`
+
+### API Changes
+- **BREAKING**: Remove `pp_structure_params` from `POST /api/v2/tasks/{task_id}/start`
+- **ADDED**: New optional parameter `layout_model: "default" | "chinese" | "cdla"`
+
+## Impact
+
+- Affected specs: `ocr-processing`
+- Affected code:
+  - Backend: `app/routers/tasks.py`, `app/services/ocr_service.py`, `app/core/config.py`
+  - Frontend: `src/components/PPStructureParams.tsx` (remove), `src/types/apiV2.ts`, task start form
+- Breaking change: Clients using `pp_structure_params` will need to migrate to `layout_model`
+- User impact: Simpler UI, better default OCR quality for Chinese documents
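
The backend mapping described in the proposal could be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `LAYOUT_MODEL_NAMES` and `build_layout_kwargs` are hypothetical names, the CDLA model string is a placeholder, and using `None` as the sentinel for `"default"` is an assumption based on the commit message's mention of a sentinel value. Only the three option names, the `PP-DocLayout-S` model, and the `layout_detection_model_name` argument come from the proposal.

```python
from typing import Dict, Optional

# Hypothetical mapping from the API's layout_model option to a model name.
# None is an assumed sentinel meaning "pass no model name and let the
# pipeline fall back to its built-in default".
LAYOUT_MODEL_NAMES: Dict[str, Optional[str]] = {
    "default": None,
    "chinese": "PP-DocLayout-S",
    "cdla": "<cdla-model-name>",  # placeholder, real name not given in the proposal
}


def build_layout_kwargs(layout_model: Optional[str] = None) -> Dict[str, str]:
    """Translate a layout_model option into pipeline init keyword arguments."""
    choice = layout_model or "default"  # the parameter is optional
    if choice not in LAYOUT_MODEL_NAMES:
        allowed = ", ".join(sorted(LAYOUT_MODEL_NAMES))
        raise ValueError(f"Unknown layout_model {choice!r}; expected one of: {allowed}")
    model_name = LAYOUT_MODEL_NAMES[choice]
    if model_name is None:
        return {}  # sentinel: omit the argument entirely
    return {"layout_detection_model_name": model_name}
```

Under these assumptions, the OCR service would merge the returned dictionary into its PP-StructureV3 initialization arguments, so that `"default"` leaves the pipeline's own model choice untouched and unknown values are rejected before any model is loaded.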