feat: simplify layout model selection and archive proposals
Changes:
- Replace PP-Structure 7-slider parameter UI with simple 3-option layout model selector
- Add layout model mapping: chinese (PP-DocLayout-S), default (PubLayNet), cdla
- Add LayoutModelSelector component and zh-TW translations
- Fix "default" model behavior with sentinel value for PubLayNet
- Add gap filling service for OCR track coverage improvement
- Add PP-Structure debug utilities
- Archive completed/incomplete proposals:
  - add-ocr-track-gap-filling (complete)
  - fix-ocr-track-table-rendering (incomplete)
  - simplify-ppstructure-model-selection (22/25 tasks)
- Add new layout model tests, archive old PP-Structure param tests
- Update OpenSpec ocr-processing spec with layout model requirements

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@@ -0,0 +1,40 @@
# Change: Simplify PP-StructureV3 Configuration with Layout Model Selection
## Why

The current PP-StructureV3 parameter-adjustment UI exposes seven technical ML parameters (thresholds, ratios, merge modes) that end users find hard to understand. Meanwhile, switching to a different layout detection model (e.g., a CDLA-trained model for Chinese documents) would have a much greater impact on OCR quality than fine-tuning these parameters.

**Problems with the current approach:**

- Users don't understand what `layout_detection_threshold` or `text_det_unclip_ratio` mean
- Wrong parameter values can make OCR results worse
- The default model (PubLayNet-based) is optimized for English academic papers, not Chinese business documents
- Choosing the right model has far more impact than tuning parameters
## What Changes

### Backend Changes

- **REMOVED**: API parameter `pp_structure_params` from the task start endpoint
- **ADDED**: New API parameter `layout_model` with predefined options:
  - `"default"` - Standard model (PubLayNet-based, for English documents)
  - `"chinese"` - PP-DocLayout-S model (for Chinese documents, forms, contracts)
  - `"cdla"` - CDLA model (alternative Chinese document layout model)
- **MODIFIED**: PP-StructureV3 initialization sets `layout_detection_model_name` according to the selected option
- Fine-tuning parameters stay in backend `config.py` with optimized defaults
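A minimal sketch of the backend mapping, assuming the service turns the `layout_model` option into keyword arguments for the PP-StructureV3 constructor. `LAYOUT_MODEL_NAMES` and `build_structure_kwargs` are hypothetical names. Only `PP-DocLayout-S` comes from this change; the model names for `"default"` and `"cdla"` are left as placeholders rather than guessed PaddleOCR identifiers. The `tuning_defaults` argument stands in for the values kept in `config.py`.

```python
from typing import Any, Dict, Mapping

# Hypothetical mapping from the API-level `layout_model` option to the value
# passed as `layout_detection_model_name`. Only "PP-DocLayout-S" is taken from
# the change description; the other two values are placeholders, NOT verified
# PaddleOCR model identifiers.
LAYOUT_MODEL_NAMES: Dict[str, str] = {
    "chinese": "PP-DocLayout-S",
    "default": "<publaynet-model-name>",  # placeholder
    "cdla": "<cdla-model-name>",  # placeholder
}


def build_structure_kwargs(
    layout_model: str, tuning_defaults: Mapping[str, Any]
) -> Dict[str, Any]:
    """Combine config-held tuning defaults with the selected layout model.

    `tuning_defaults` stands in for the fine-tuning values kept in config.py.
    They are copied as-is; users no longer override them per request.
    """
    try:
        model_name = LAYOUT_MODEL_NAMES[layout_model]
    except KeyError:
        allowed = ", ".join(sorted(LAYOUT_MODEL_NAMES))
        raise ValueError(
            f"unknown layout_model {layout_model!r}; expected one of: {allowed}"
        ) from None
    # Copy so the caller's config mapping is never mutated.
    kwargs = dict(tuning_defaults)
    # The user's model selection overrides any model name in the defaults.
    kwargs["layout_detection_model_name"] = model_name
    return kwargs
```

Keeping the mapping in one dictionary lets request validation, the UI options, and the model lookup share a single source of truth. The returned dictionary would then be passed to the pipeline constructor, e.g. `PPStructureV3(**kwargs)`.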
### Frontend Changes

- **REMOVED**: `PPStructureParams.tsx` component (slider/dropdown UI for 7 parameters)
- **ADDED**: Simple radio button/dropdown for layout model selection, with a clear description of each option
- **MODIFIED**: Task start request body sends `layout_model` instead of `pp_structure_params`
### API Changes

- **BREAKING**: Remove `pp_structure_params` from `POST /api/v2/tasks/{task_id}/start`
- **ADDED**: New optional parameter `layout_model: "default" | "chinese" | "cdla"`
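This is one way the start endpoint could validate the new field, sketched in plain Python rather than in any particular web framework. `parse_layout_model` and `StartRequestError` are hypothetical. Rejecting a body that still contains `pp_structure_params`, instead of silently ignoring it, is a design assumption chosen so that outdated clients notice the breaking change. The proposal does not say which model applies when `layout_model` is omitted, so the sketch returns `None` and leaves that decision to the caller.

```python
from typing import Any, Mapping, Optional

ALLOWED_LAYOUT_MODELS = frozenset({"default", "chinese", "cdla"})


class StartRequestError(ValueError):
    """Hypothetical error type for an invalid task-start request body."""


def parse_layout_model(body: Mapping[str, Any]) -> Optional[str]:
    """Validate the optional `layout_model` field of a task-start body.

    Returns None when the field is omitted; the server-side default is
    deliberately not decided here.
    """
    if "pp_structure_params" in body:
        # Assumption: fail loudly so clients still on the old API notice.
        raise StartRequestError(
            "pp_structure_params is no longer supported; send layout_model instead"
        )
    value = body.get("layout_model")
    if value is None:
        return None
    # Check the type first: unhashable values would break the set lookup.
    if not isinstance(value, str) or value not in ALLOWED_LAYOUT_MODELS:
        raise StartRequestError(
            f"layout_model must be one of {sorted(ALLOWED_LAYOUT_MODELS)}, got {value!r}"
        )
    return value
```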
## Impact

- Affected specs: `ocr-processing`
- Affected code:
  - Backend: `app/routers/tasks.py`, `app/services/ocr_service.py`, `app/core/config.py`
  - Frontend: `src/components/PPStructureParams.tsx` (remove), `src/types/apiV2.ts`, task start form
- Breaking change: Clients using `pp_structure_params` will need to migrate to `layout_model`
- User impact: Simpler UI, better default OCR quality for Chinese documents
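For clients affected by the breaking change, here is a minimal migration sketch that assumes the endpoint takes a JSON body. The old `pp_structure_params` object is dropped and replaced by at most one `layout_model` string. The base URL, the `build_start_request` helper, and the absence of authentication headers are all assumptions for illustration. The request is only built, not sent.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, Optional


def build_start_request(
    base_url: str, task_id: str, layout_model: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a task-start request using the new field.

    Older clients sent a `pp_structure_params` object; after this change the
    body carries at most a single `layout_model` string.
    """
    body: Dict[str, str] = {}
    if layout_model is not None:
        body["layout_model"] = layout_model
    # Percent-encode the task id so it is always a single path segment.
    safe_id = urllib.parse.quote(task_id, safe="")
    url = f"{base_url.rstrip('/')}/api/v2/tasks/{safe_id}/start"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Omitting `layout_model` sends an empty JSON object and leaves the model choice to the server. Against a running server, the request would be sent with `urllib.request.urlopen(req)`.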