Changes:
- Replace PP-Structure 7-slider parameter UI with simple 3-option layout model selector
- Add layout model mapping: chinese (PP-DocLayout-S), default (PubLayNet), cdla
- Add LayoutModelSelector component and zh-TW translations
- Fix "default" model behavior with sentinel value for PubLayNet
- Add gap filling service for OCR track coverage improvement
- Add PP-Structure debug utilities
- Archive completed/incomplete proposals:
  - add-ocr-track-gap-filling (complete)
  - fix-ocr-track-table-rendering (incomplete)
  - simplify-ppstructure-model-selection (22/25 tasks)
- Add new layout model tests, archive old PP-Structure param tests
- Update OpenSpec ocr-processing spec with layout model requirements

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Change: Simplify PP-StructureV3 Configuration with Layout Model Selection
Why
The current PP-StructureV3 parameter adjustment UI exposes 7 technical ML parameters (thresholds, ratios, merge modes) that are difficult for end users to understand. Meanwhile, switching to a different layout detection model (e.g., a CDLA-trained model for Chinese documents) would have a much greater impact on OCR quality than fine-tuning these parameters.
Problems with current approach:
- Users don't understand what `layout_detection_threshold` or `text_det_unclip_ratio` mean
- Wrong parameter values can make OCR results worse
- The default model (PubLayNet-based) is optimized for English academic papers, not Chinese business documents
- Model selection is far more impactful than parameter tuning
What Changes
Backend Changes
- REMOVED: API parameter `pp_structure_params` from task start endpoint
- ADDED: New API parameter `layout_model` with predefined options:
  - `"default"` - Standard model (PubLayNet-based, for English documents)
  - `"chinese"` - PP-DocLayout-S model (for Chinese documents, forms, contracts)
  - `"cdla"` - CDLA model (alternative Chinese document layout model)
- MODIFIED: PP-StructureV3 initialization uses `layout_detection_model_name` based on selection
- Keep fine-tuning parameters in backend `config.py` with optimized defaults
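The selection-to-model mapping described above could look roughly like the following sketch. It is not the actual `ocr_service.py` code: the names `LAYOUT_MODEL_MAPPING` and `resolve_layout_kwargs` are hypothetical, using `None` as the sentinel for `"default"` is an assumption, and the CDLA entry is a placeholder because the proposal does not name the concrete model identifier.

```python
from typing import Dict, Optional

# Hypothetical mapping from the user-facing option to the value passed as
# layout_detection_model_name. None is an assumed sentinel meaning
# "do not override; let PP-StructureV3 fall back to its built-in model".
LAYOUT_MODEL_MAPPING: Dict[str, Optional[str]] = {
    "default": None,                 # PubLayNet-based, for English documents
    "chinese": "PP-DocLayout-S",     # Chinese documents, forms, contracts
    "cdla": "<cdla-model-name>",     # placeholder: identifier not given in the proposal
}


def resolve_layout_kwargs(layout_model: str) -> Dict[str, str]:
    """Translate a layout_model option into pipeline init keyword arguments."""
    if layout_model not in LAYOUT_MODEL_MAPPING:
        allowed = ", ".join(sorted(LAYOUT_MODEL_MAPPING))
        raise ValueError(f"Unknown layout_model {layout_model!r}; expected one of: {allowed}")

    model_name = LAYOUT_MODEL_MAPPING[layout_model]
    if model_name is None:
        # Sentinel case: pass nothing so the pipeline keeps its own default.
        return {}
    return {"layout_detection_model_name": model_name}
```

Returning an empty dict for the sentinel keeps the "default" branch from passing an explicit model name at all, which is one way to avoid overriding whatever the pipeline would load on its own.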
Frontend Changes
- REMOVED: `PPStructureParams.tsx` component (slider/dropdown UI for 7 parameters)
- ADDED: Simple radio button/dropdown for layout model selection with clear descriptions
- MODIFIED: Task start request body to send `layout_model` instead of `pp_structure_params`
API Changes
- BREAKING: Remove `pp_structure_params` from `POST /api/v2/tasks/{task_id}/start`
- ADDED: New optional parameter `layout_model: "default" | "chinese" | "cdla"`
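As an illustration of the new request contract, here is a minimal standard-library sketch of validating a task start body. `TaskStartOptions` and `parse_task_start_body` are hypothetical names, and rejecting the removed `pp_structure_params` key outright (rather than silently ignoring it) is an assumption, since the proposal does not say how the server treats it.

```python
from dataclasses import dataclass
from typing import Any, Literal, Mapping, Optional, get_args

# The three options listed in the proposal.
LayoutModel = Literal["default", "chinese", "cdla"]


@dataclass(frozen=True)
class TaskStartOptions:
    # Optional: None means the client did not send layout_model.
    layout_model: Optional[str] = None


def parse_task_start_body(body: Mapping[str, Any]) -> TaskStartOptions:
    """Validate the layout-related part of a task start request body."""
    if "pp_structure_params" in body:
        # Assumed behavior: fail loudly so old clients notice the breaking change.
        raise ValueError("pp_structure_params was removed; send layout_model instead")

    value = body.get("layout_model")
    if value is not None and value not in get_args(LayoutModel):
        raise ValueError(f"layout_model must be one of {get_args(LayoutModel)}, got {value!r}")
    return TaskStartOptions(layout_model=value)
```

For example, `parse_task_start_body({"layout_model": "cdla"})` yields options with `layout_model` set to `"cdla"`, while an empty body is accepted because the parameter is optional.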
Impact
- Affected specs: `ocr-processing`
- Affected code:
  - Backend: `app/routers/tasks.py`, `app/services/ocr_service.py`, `app/core/config.py`
  - Frontend: `src/components/PPStructureParams.tsx` (remove), `src/types/apiV2.ts`, task start form
- Breaking change: Clients using `pp_structure_params` will need to migrate to `layout_model`
- User impact: Simpler UI, better default OCR quality for Chinese documents
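For clients affected by the breaking change, the migration might be as small as the sketch below. The helper name `migrate_start_payload` is hypothetical, and the proposal defines no automatic translation from old parameter values to a model, so the caller chooses the `layout_model` explicitly.

```python
from typing import Any, Dict

ALLOWED_LAYOUT_MODELS = ("default", "chinese", "cdla")


def migrate_start_payload(old_payload: Dict[str, Any], layout_model: str) -> Dict[str, Any]:
    """Return a copy of the payload without pp_structure_params and with layout_model set."""
    if layout_model not in ALLOWED_LAYOUT_MODELS:
        raise ValueError(f"layout_model must be one of {ALLOWED_LAYOUT_MODELS}")

    # Copy every field except the removed one; the input dict is left untouched.
    new_payload = {k: v for k, v in old_payload.items() if k != "pp_structure_params"}
    new_payload["layout_model"] = layout_model
    return new_payload
```

Any other fields in the request body pass through unchanged, so the helper can wrap an existing request builder without knowing the rest of the schema.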