OCR/openspec/changes/archive/2025-11-27-simplify-ppstructure-model-selection/proposal.md
egg 59206a6ab8 feat: simplify layout model selection and archive proposals
Changes:
- Replace PP-Structure 7-slider parameter UI with simple 3-option layout model selector
- Add layout model mapping: chinese (PP-DocLayout-S), default (PubLayNet), cdla
- Add LayoutModelSelector component and zh-TW translations
- Fix "default" model behavior with sentinel value for PubLayNet
- Add gap filling service for OCR track coverage improvement
- Add PP-Structure debug utilities
- Archive completed/incomplete proposals:
  - add-ocr-track-gap-filling (complete)
  - fix-ocr-track-table-rendering (incomplete)
  - simplify-ppstructure-model-selection (22/25 tasks)
- Add new layout model tests, archive old PP-Structure param tests
- Update OpenSpec ocr-processing spec with layout model requirements

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 13:27:00 +08:00

Change: Simplify PP-StructureV3 Configuration with Layout Model Selection

Why

The current PP-StructureV3 parameter adjustment UI exposes 7 technical ML parameters (thresholds, ratios, merge modes) that are difficult for end users to understand. Switching to a different layout detection model (e.g., CDLA-trained models for Chinese documents) would have a much greater impact on OCR quality than fine-tuning these parameters.

Problems with the current approach:

  • Users don't understand what layout_detection_threshold or text_det_unclip_ratio mean
  • Wrong parameter values can make OCR results worse
  • The default model (PubLayNet-based) is optimized for English academic papers, not Chinese business documents
  • Model selection is far more impactful than parameter tuning

What Changes

Backend Changes

  • REMOVED: API parameter pp_structure_params from task start endpoint
  • ADDED: New API parameter layout_model with predefined options:
    • "default" - Standard model (PubLayNet-based, for English documents)
    • "chinese" - PP-DocLayout-S model (for Chinese documents, forms, contracts)
    • "cdla" - CDLA model (alternative Chinese document layout model)
  • MODIFIED: PP-StructureV3 initialization uses layout_detection_model_name based on selection
  • KEPT: Fine-tuning parameters stay in backend config.py with optimized defaults
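
The selection-to-model mapping described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper name `build_layout_kwargs`, the use of `None` as the sentinel for the built-in default model, and the `CDLA_MODEL_NAME_PLACEHOLDER` string are all assumptions. Only the three option names, the `PP-DocLayout-S` model, and the `layout_detection_model_name` argument come from the proposal.

```python
from typing import Optional

# Assumption: None is a sentinel meaning "do not override the layout model",
# so PP-StructureV3 falls back to its built-in default.
LAYOUT_MODEL_NAMES: dict[str, Optional[str]] = {
    "default": None,
    "chinese": "PP-DocLayout-S",
    # Hypothetical placeholder: the proposal does not name the CDLA model.
    "cdla": "CDLA_MODEL_NAME_PLACEHOLDER",
}


def build_layout_kwargs(layout_model: str = "default") -> dict[str, str]:
    """Translate a layout_model option into PP-StructureV3 init kwargs."""
    if layout_model not in LAYOUT_MODEL_NAMES:
        allowed = ", ".join(sorted(LAYOUT_MODEL_NAMES))
        raise ValueError(f"Unknown layout_model {layout_model!r}; expected one of: {allowed}")
    model_name = LAYOUT_MODEL_NAMES[layout_model]
    # Omit the argument entirely for the sentinel so the default model is used.
    return {} if model_name is None else {"layout_detection_model_name": model_name}
```

The returned dictionary would then be merged with the fixed fine-tuning defaults from config.py when the pipeline is constructed.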

Frontend Changes

  • REMOVED: PPStructureParams.tsx component (slider/dropdown UI for 7 parameters)
  • ADDED: Simple radio button/dropdown for layout model selection with clear descriptions
  • MODIFIED: Task start request body to send layout_model instead of pp_structure_params

API Changes

  • BREAKING: Remove pp_structure_params from POST /api/v2/tasks/{task_id}/start
  • ADDED: New optional parameter layout_model: "default" | "chinese" | "cdla" on the same endpoint
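
One way the new request contract might be validated is shown below, using only the standard library. The function name `parse_start_body`, the choice to reject the removed field with an error (rather than silently ignore it), and falling back to "default" when layout_model is omitted are assumptions made for illustration. The proposal only states that the parameter is optional and that pp_structure_params is removed.

```python
from typing import Any, Literal, get_args

LayoutModel = Literal["default", "chinese", "cdla"]


def parse_start_body(body: dict[str, Any]) -> dict[str, Any]:
    """Validate a task start request body under the new contract."""
    # Assumption: the removed parameter is rejected loudly so that
    # outdated clients notice the breaking change.
    if "pp_structure_params" in body:
        raise ValueError("pp_structure_params was removed; send layout_model instead")
    # Assumption: an omitted layout_model means "default".
    layout_model = body.get("layout_model", "default")
    if layout_model not in get_args(LayoutModel):
        raise ValueError(f"Invalid layout_model: {layout_model!r}")
    return {**body, "layout_model": layout_model}
```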

Impact

  • Affected specs: ocr-processing
  • Affected code:
    • Backend: app/routers/tasks.py, app/services/ocr_service.py, app/core/config.py
    • Frontend: src/components/PPStructureParams.tsx (remove), src/types/apiV2.ts, task start form
  • Breaking change: Clients using pp_structure_params will need to migrate to layout_model
  • User impact: Simpler UI, better default OCR quality for Chinese documents
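
For clients affected by the breaking change, migration could be as small as dropping the old field and adding the new one. The helper below is a hypothetical sketch: `migrate_start_payload` and the `use_dual_track` field in the usage example are illustrative names, not part of the documented API.

```python
from typing import Any


def migrate_start_payload(old: dict[str, Any], layout_model: str = "default") -> dict[str, Any]:
    """Return a copy of an old request body that uses layout_model."""
    # Drop the removed parameter; every other field is passed through unchanged.
    new = {key: value for key, value in old.items() if key != "pp_structure_params"}
    # Do not overwrite a layout_model the caller has already set.
    new.setdefault("layout_model", layout_model)
    return new


# Usage: an old payload with slider values becomes a single model choice.
old_payload = {
    "pp_structure_params": {"layout_detection_threshold": 0.5},
    "use_dual_track": True,  # hypothetical unrelated field
}
new_payload = migrate_start_payload(old_payload, layout_model="chinese")
```

The old slider values have no equivalent in the new API, so the caller has to choose the layout model explicitly based on the document type.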