egg/OCR


egg 59206a6ab8 feat: simplify layout model selection and archive proposals

Changes:
- Replace PP-Structure 7-slider parameter UI with simple 3-option layout model selector
- Add layout model mapping: chinese (PP-DocLayout-S), default (PubLayNet), cdla
- Add LayoutModelSelector component and zh-TW translations
- Fix "default" model behavior with sentinel value for PubLayNet
- Add gap filling service for OCR track coverage improvement
- Add PP-Structure debug utilities
- Archive completed/incomplete proposals:
  - add-ocr-track-gap-filling (complete)
  - fix-ocr-track-table-rendering (incomplete)
  - simplify-ppstructure-model-selection (22/25 tasks)
- Add new layout model tests, archive old PP-Structure param tests
- Update OpenSpec ocr-processing spec with layout model requirements

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-27 13:27:00 +08:00

2.2 KiB


Change: Simplify PP-StructureV3 Configuration with Layout Model Selection

Why

The current PP-StructureV3 parameter adjustment UI exposes 7 technical ML parameters (thresholds, ratios, merge modes) that are difficult for end users to understand. Meanwhile, switching to a different layout detection model (e.g., a CDLA-trained model for Chinese documents) would have a much greater impact on OCR quality than fine-tuning these parameters.

Problems with the current approach:

- Users don't understand what layout_detection_threshold or text_det_unclip_ratio mean
- Wrong parameter values can make OCR results worse
- The default model (PubLayNet-based) is optimized for English academic papers, not Chinese business documents
- Model selection is far more impactful than parameter tuning

What Changes

Backend Changes

- REMOVED: API parameter pp_structure_params from task start endpoint
- ADDED: New API parameter layout_model with predefined options:
  - "default" - Standard model (PubLayNet-based, for English documents)
  - "chinese" - PP-DocLayout-S model (for Chinese documents, forms, contracts)
  - "cdla" - CDLA model (alternative Chinese document layout model)
- MODIFIED: PP-StructureV3 initialization uses layout_detection_model_name based on selection
- Keep fine-tuning parameters in backend config.py with optimized defaults
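
The selection-to-model step above could be sketched roughly as follows. This is an illustration only, not the code in app/services/ocr_service.py: the names LAYOUT_MODEL_MAPPING and build_layout_kwargs are hypothetical, the CDLA entry is a placeholder because the proposal does not name that model, and mapping "default" to None (meaning "pass no override") is an assumption about how the default case is handled.

```python
from typing import Any, Dict, Optional

# Hypothetical mapping from the user-facing option to a model name.
# Only "PP-DocLayout-S" is named in the proposal. The CDLA value is a
# placeholder, and None for "default" is an assumed sentinel meaning
# "do not override the layout model".
LAYOUT_MODEL_MAPPING: Dict[str, Optional[str]] = {
    "default": None,
    "chinese": "PP-DocLayout-S",
    "cdla": "CDLA_MODEL_NAME_PLACEHOLDER",
}


def build_layout_kwargs(layout_model: str = "default") -> Dict[str, Any]:
    """Return the keyword arguments to add when initializing PP-StructureV3."""
    if layout_model not in LAYOUT_MODEL_MAPPING:
        raise ValueError(f"Unknown layout_model: {layout_model!r}")
    model_name = LAYOUT_MODEL_MAPPING[layout_model]
    if model_name is None:
        # Sentinel case: leave layout_detection_model_name unset.
        return {}
    return {"layout_detection_model_name": model_name}
```

With this shape, the fine-tuning defaults from config.py can be merged with the returned dictionary before the pipeline is constructed, so the model choice and the tuned parameters stay independent of each other.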

Frontend Changes

- REMOVED: PPStructureParams.tsx component (slider/dropdown UI for 7 parameters)
- ADDED: Simple radio button/dropdown for layout model selection with clear descriptions
- MODIFIED: Task start request body to send layout_model instead of pp_structure_params

API Changes

- BREAKING: Remove pp_structure_params from POST /api/v2/tasks/{task_id}/start
- ADDED: New optional parameter layout_model: "default" | "chinese" | "cdla"
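
One way the endpoint might validate the new parameter is sketched below, using only the standard library. The function and exception names are hypothetical, and rejecting a request that still carries pp_structure_params is an assumption; the proposal does not say whether the server rejects or silently ignores the removed field.

```python
from typing import Any, Mapping

ALLOWED_LAYOUT_MODELS = ("default", "chinese", "cdla")


class StartRequestError(ValueError):
    """Raised when the task start request body is not acceptable."""


def parse_layout_model(body: Mapping[str, Any]) -> str:
    """Extract layout_model from a request body, defaulting to "default"."""
    if "pp_structure_params" in body:
        # Assumed behavior: fail loudly so old clients notice the breaking change.
        raise StartRequestError(
            "pp_structure_params has been removed; send layout_model instead"
        )
    value = body.get("layout_model", "default")
    if value not in ALLOWED_LAYOUT_MODELS:
        raise StartRequestError(
            f"layout_model must be one of {ALLOWED_LAYOUT_MODELS}, got {value!r}"
        )
    return value
```

Because the parameter is optional, a body without layout_model is still valid and resolves to "default".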

Impact

- Affected specs: ocr-processing
- Affected code:
  - Backend: app/routers/tasks.py, app/services/ocr_service.py, app/core/config.py
  - Frontend: src/components/PPStructureParams.tsx (remove), src/types/apiV2.ts, task start form
- Breaking change: Clients using pp_structure_params will need to migrate to layout_model
- User impact: Simpler UI, better default OCR quality for Chinese documents
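
For clients affected by the breaking change, the migration could look roughly like the helper below. It is a sketch under assumptions: migrate_start_payload is a hypothetical name, and since the old slider values have no equivalent in the new API, the caller has to choose the layout model explicitly.

```python
from typing import Any, Dict, Mapping


def migrate_start_payload(
    old_payload: Mapping[str, Any], layout_model: str = "default"
) -> Dict[str, Any]:
    """Return a copy of the payload without pp_structure_params and with layout_model set."""
    if layout_model not in ("default", "chinese", "cdla"):
        raise ValueError(f"Unknown layout_model: {layout_model!r}")
    # Copy every field except the removed one; the input is not mutated.
    new_payload = {k: v for k, v in old_payload.items() if k != "pp_structure_params"}
    new_payload["layout_model"] = layout_model
    return new_payload


# Example: a client that previously tuned thresholds for Chinese forms.
old = {"pp_structure_params": {"layout_detection_threshold": 0.3}}
new = migrate_start_payload(old, layout_model="chinese")
```

Here new contains only the layout_model key set to "chinese", while old is left unchanged.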
