feat: simplify layout model selection and archive proposals

Changes:
- Replace PP-Structure 7-slider parameter UI with simple 3-option layout model selector
- Add layout model mapping: chinese (PP-DocLayout-S), default (PubLayNet), cdla
- Add LayoutModelSelector component and zh-TW translations
- Fix "default" model behavior with sentinel value for PubLayNet
- Add gap filling service for OCR track coverage improvement
- Add PP-Structure debug utilities
- Archive completed/incomplete proposals:
  - add-ocr-track-gap-filling (complete)
  - fix-ocr-track-table-rendering (incomplete)
  - simplify-ppstructure-model-selection (22/25 tasks)
- Add new layout model tests, archive old PP-Structure param tests
- Update OpenSpec ocr-processing spec with layout model requirements

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
egg
2025-11-27 13:27:00 +08:00
parent c65df754cf
commit 59206a6ab8
35 changed files with 3621 additions and 658 deletions


@@ -0,0 +1,40 @@
# Change: Simplify PP-StructureV3 Configuration with Layout Model Selection
## Why
The current PP-StructureV3 parameter adjustment UI exposes 7 technical ML parameters (thresholds, ratios, merge modes) that are difficult for end users to understand. Meanwhile, switching to a different layout detection model (e.g., a CDLA-trained model for Chinese documents) would have a much greater impact on OCR quality than fine-tuning these parameters.
**Problems with current approach:**
- Users don't understand what `layout_detection_threshold` or `text_det_unclip_ratio` mean
- Wrong parameter values can make OCR results worse
- The default model (PubLayNet-based) is optimized for English academic papers, not Chinese business documents
- Model selection is far more impactful than parameter tuning
## What Changes
### Backend Changes
- **REMOVED**: API parameter `pp_structure_params` from task start endpoint
- **ADDED**: New API parameter `layout_model` with predefined options:
  - `"default"` - Standard model (PubLayNet-based, for English documents)
  - `"chinese"` - PP-DocLayout-S model (for Chinese documents, forms, contracts)
  - `"cdla"` - CDLA model (alternative Chinese document layout model)
- **MODIFIED**: PP-StructureV3 initialization uses `layout_detection_model_name` based on selection
- Keep fine-tuning parameters in backend `config.py` with optimized defaults
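
The mapping from `layout_model` to `layout_detection_model_name` might look roughly like the sketch below. It is illustrative only: the names `LAYOUT_MODEL_MAPPING` and `build_layout_kwargs` are hypothetical, the CDLA model identifier is a placeholder rather than a confirmed model name, and using `None` as the sentinel for `"default"` (meaning "do not override the model name") is an assumption about how the sentinel could work.

```python
from typing import Any, Dict, Optional

# Hypothetical mapping from the API option to a layout detection model name.
# None is used as a sentinel: do not pass a model name at all, so the
# pipeline keeps whatever model it uses by default (an assumption).
LAYOUT_MODEL_MAPPING: Dict[str, Optional[str]] = {
    "default": None,
    "chinese": "PP-DocLayout-S",
    "cdla": "<cdla-model-name>",  # placeholder, not a confirmed identifier
}


def build_layout_kwargs(layout_model: str = "default") -> Dict[str, Any]:
    """Return the keyword arguments to add when initializing the pipeline."""
    if layout_model not in LAYOUT_MODEL_MAPPING:
        valid = ", ".join(sorted(LAYOUT_MODEL_MAPPING))
        raise ValueError(f"Unknown layout_model {layout_model!r}; expected one of: {valid}")

    model_name = LAYOUT_MODEL_MAPPING[layout_model]
    if model_name is None:
        # Sentinel case: leave the model name unset.
        return {}
    return {"layout_detection_model_name": model_name}
```

Returning an empty dict for the sentinel keeps the initialization call free of special cases: the caller can always unpack the result into the pipeline constructor alongside the fine-tuning defaults from `config.py`.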
### Frontend Changes
- **REMOVED**: `PPStructureParams.tsx` component (slider/dropdown UI for 7 parameters)
- **ADDED**: Simple radio button/dropdown for layout model selection with clear descriptions
- **MODIFIED**: Task start request body to send `layout_model` instead of `pp_structure_params`
### API Changes
- **BREAKING**: Remove `pp_structure_params` from `POST /api/v2/tasks/{task_id}/start`
- **ADDED**: New optional parameter `layout_model: "default" | "chinese" | "cdla"`
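
As a rough sketch of how the optional `layout_model` parameter could be validated on the server side, assuming the request body has already been parsed into a dict: `parse_layout_model` is a hypothetical helper, and falling back to `"default"` when the parameter is omitted is an assumption, since the proposal only states that the parameter is optional.

```python
from typing import Any, Dict, Literal, get_args

# The three values allowed by the proposal.
LayoutModel = Literal["default", "chinese", "cdla"]


def parse_layout_model(body: Dict[str, Any], fallback: str = "default") -> str:
    """Read and validate `layout_model` from a parsed request body.

    The fallback used when the field is absent is an assumption.
    """
    allowed = get_args(LayoutModel)
    value = body.get("layout_model", fallback)
    if value not in allowed:
        raise ValueError(f"layout_model must be one of {allowed}, got {value!r}")
    return value
```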
## Impact
- Affected specs: `ocr-processing`
- Affected code:
  - Backend: `app/routers/tasks.py`, `app/services/ocr_service.py`, `app/core/config.py`
  - Frontend: `src/components/PPStructureParams.tsx` (remove), `src/types/apiV2.ts`, task start form
- Breaking change: Clients using `pp_structure_params` will need to migrate to `layout_model`
- User impact: Simpler UI, better default OCR quality for Chinese documents
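
For clients affected by the breaking change, the migration could be as small as the sketch below. `migrate_start_request` is a hypothetical helper and `other_option` in the usage example is a stand-in for whatever other fields a client already sends; neither comes from the actual codebase.

```python
from typing import Any, Dict


def migrate_start_request(old_body: Dict[str, Any], layout_model: str = "default") -> Dict[str, Any]:
    """Return a copy of a task start request body without `pp_structure_params`.

    `layout_model` is added only if the body does not already carry one.
    """
    new_body = {key: value for key, value in old_body.items() if key != "pp_structure_params"}
    new_body.setdefault("layout_model", layout_model)
    return new_body


# Usage: the old fine-tuning parameters are dropped, not translated,
# because the proposal keeps them as backend defaults in config.py.
old = {
    "other_option": True,  # hypothetical pre-existing field
    "pp_structure_params": {"layout_detection_threshold": 0.5},
}
new = migrate_start_request(old, layout_model="chinese")
```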