egg/OCR


egg 59206a6ab8 feat: simplify layout model selection and archive proposals

Changes:
- Replace PP-Structure 7-slider parameter UI with simple 3-option layout model selector
- Add layout model mapping: chinese (PP-DocLayout-S), default (PubLayNet), cdla
- Add LayoutModelSelector component and zh-TW translations
- Fix "default" model behavior with sentinel value for PubLayNet
- Add gap filling service for OCR track coverage improvement
- Add PP-Structure debug utilities
- Archive completed/incomplete proposals:
  - add-ocr-track-gap-filling (complete)
  - fix-ocr-track-table-rendering (incomplete)
  - simplify-ppstructure-model-selection (22/25 tasks)
- Add new layout model tests, archive old PP-Structure param tests
- Update OpenSpec ocr-processing spec with layout model requirements

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-27 13:27:00 +08:00

4.4 KiB


ocr-processing Specification Delta

REMOVED Requirements

Requirement: Frontend-Adjustable PP-StructureV3 Parameters

Reason: Complex ML parameters are difficult for end users to understand and tune. Model selection provides better UX and more significant quality improvements.
Migration: Replace pp_structure_params API parameter with layout_model parameter.

Requirement: PP-StructureV3 Parameter UI Controls

Reason: Slider/dropdown UI for 7 technical parameters adds complexity without proportional benefit. Simple model selection is more user-friendly.
Migration: Remove PPStructureParams.tsx component, add LayoutModelSelector.tsx component.

ADDED Requirements

Requirement: Layout Model Selection

The system SHALL allow users to select a layout detection model optimized for their document type, providing a simple choice between pre-configured models instead of manual parameter tuning.

Scenario: User selects Chinese document model

GIVEN a user is processing Chinese business documents (forms, contracts, invoices)
WHEN the user selects "Chinese Document Model" (PP-DocLayout-S)
THEN the OCR engine SHALL use the PP-DocLayout-S layout detection model
AND the model SHALL be optimized for 23 Chinese document element types
AND table and form detection accuracy SHALL be improved over the Standard (PubLayNet-based) model

Scenario: User selects standard model for English documents

GIVEN a user is processing English academic papers or reports
WHEN the user selects "Standard Model" (PubLayNet-based)
THEN the OCR engine SHALL use the PubLayNet-based layout detection model (layout_model: "default")
AND the model SHALL be optimized for English document layouts

Scenario: User selects CDLA model for specialized Chinese layout

GIVEN a user is processing Chinese documents with complex layouts
WHEN the user selects "CDLA Model"
THEN the OCR engine SHALL use the picodet_lcnet_x1_0_fgd_layout_cdla model
AND the model SHALL provide specialized Chinese document layout analysis
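
The three scenarios above amount to a lookup from a short model key to an underlying model name. A minimal sketch, assuming the keys "chinese", "default", and "cdla" from the commit summary; the names `LAYOUT_MODEL_MAPPING` and `resolve_layout_model` are hypothetical, and using `None` as the sentinel for the PubLayNet-based model is an assumption rather than the project's confirmed implementation:

```python
from typing import Dict, Optional

# Hypothetical mapping from the API-facing layout_model key to a model name.
# None is an assumed sentinel meaning "do not override; let the engine use
# its PubLayNet-based layout model".
LAYOUT_MODEL_MAPPING: Dict[str, Optional[str]] = {
    "chinese": "PP-DocLayout-S",
    "default": None,
    "cdla": "picodet_lcnet_x1_0_fgd_layout_cdla",
}


def resolve_layout_model(key: str) -> Optional[str]:
    """Return the model name for a layout_model key, or None for the sentinel."""
    return LAYOUT_MODEL_MAPPING[key]
```

Keeping the mapping in one table means the UI labels, the API values, and the engine configuration can all be checked against a single source.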

Scenario: Layout model is sent via API request

GIVEN a frontend application with model selection UI
WHEN the user starts task processing with a selected model

THEN the frontend SHALL send the model choice in the request body:

POST /api/v2/tasks/{task_id}/start
{
  "use_dual_track": true,
  "force_track": "ocr",
  "language": "ch",
  "layout_model": "chinese"
}

AND the backend SHALL configure PP-StructureV3 with the corresponding model
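
As an illustration of the request shape above, the following sketch builds the path and JSON body on the client side. The field names and endpoint come from the scenario; the `StartTaskRequest` class and `build_start_request` helper are hypothetical names introduced only for this example:

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass
class StartTaskRequest:
    """Body fields shown in the scenario; all are required in this sketch."""
    use_dual_track: bool
    force_track: str
    language: str
    layout_model: str


def build_start_request(task_id: str, options: StartTaskRequest) -> Tuple[str, str]:
    """Return the request path and the serialized JSON body."""
    path = f"/api/v2/tasks/{task_id}/start"
    body = json.dumps(asdict(options))
    return path, body
```

Calling `build_start_request("abc123", StartTaskRequest(True, "ocr", "ch", "chinese"))` yields the path `/api/v2/tasks/abc123/start` and a body carrying the four fields from the scenario.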

Scenario: Default model when not specified

GIVEN an API request without layout_model parameter
WHEN the task is started
THEN the system SHALL use "chinese" (PP-DocLayout-S) as the fallback model
AND processing SHALL work correctly without requiring model selection
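
One way to express this fallback is to read the field from the parsed request body and substitute "chinese" when it is absent. This is a sketch only: `layout_model_from_body` is a hypothetical helper, and treating an explicit null the same as an omitted field is an assumption the spec does not state:

```python
from typing import Any, Mapping

# Fallback named in the scenario above: "chinese" (PP-DocLayout-S).
FALLBACK_LAYOUT_MODEL = "chinese"


def layout_model_from_body(body: Mapping[str, Any]) -> str:
    """Return the requested layout model, or the fallback when it is absent."""
    value = body.get("layout_model")
    # Assumption: an explicit null is handled like a missing field.
    return FALLBACK_LAYOUT_MODEL if value is None else value
```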

Scenario: Invalid model name is rejected

GIVEN a request with an invalid layout_model value
WHEN the user sends layout_model: "invalid_model"
THEN the API SHALL return HTTP 422 (validation error)
AND the response SHALL include a clear error message listing the valid model options
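
The rejection rule can be sketched as a plain validation function whose error message enumerates the accepted values. The names `VALID_LAYOUT_MODELS`, `LayoutModelError`, and `validate_layout_model` are hypothetical, and the exact message wording is an assumption; a web layer would be responsible for turning the exception into the 422 response:

```python
# Accepted keys, taken from the model mapping described in the commit summary.
VALID_LAYOUT_MODELS = ("chinese", "default", "cdla")


class LayoutModelError(ValueError):
    """Raised for an unknown layout_model; a web layer would map this to 422."""


def validate_layout_model(value: str) -> str:
    """Return the value unchanged if valid, else raise with the valid options."""
    if value not in VALID_LAYOUT_MODELS:
        options = ", ".join(VALID_LAYOUT_MODELS)
        raise LayoutModelError(
            f"Invalid layout_model '{value}'. Valid options: {options}"
        )
    return value
```

Listing the valid options in the message lets an API client correct the request without consulting separate documentation.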

Requirement: Layout Model Selection UI

The frontend SHALL provide a simple, user-friendly interface for selecting layout detection models with clear descriptions of each option.

Scenario: Model options are displayed with descriptions

GIVEN the model selection UI is displayed
WHEN the user views the available options
THEN the UI SHALL show the following options:
- "Chinese Document Model (Recommended)" - for Chinese forms, contracts, invoices
- "Standard Model" - for English academic papers, reports
- "CDLA Model" - for specialized Chinese layout analysis
AND each option SHALL have a brief description of its use case

Scenario: Chinese model is selected by default

GIVEN the user opens the task processing interface
WHEN the model selection is displayed
THEN "Chinese Document Model" SHALL be pre-selected as the default
AND the user MAY change the selection before starting processing
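
The option list and the pre-selected default from the two scenarios above can be captured as plain data. The real selector is a frontend component; this sketch only models its contents, and `LayoutModelOption`, `LAYOUT_MODEL_OPTIONS`, and `initial_selection` are hypothetical names:

```python
from typing import NamedTuple, Tuple


class LayoutModelOption(NamedTuple):
    value: str        # key sent as layout_model
    label: str        # text shown to the user
    description: str  # brief use-case hint


LAYOUT_MODEL_OPTIONS: Tuple[LayoutModelOption, ...] = (
    LayoutModelOption("chinese", "Chinese Document Model (Recommended)",
                      "For Chinese forms, contracts, invoices"),
    LayoutModelOption("default", "Standard Model",
                      "For English academic papers, reports"),
    LayoutModelOption("cdla", "CDLA Model",
                      "For specialized Chinese layout analysis"),
)

PRESELECTED_VALUE = "chinese"


def initial_selection() -> LayoutModelOption:
    """Return the option that is pre-selected when the UI first renders."""
    return next(o for o in LAYOUT_MODEL_OPTIONS if o.value == PRESELECTED_VALUE)
```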

Scenario: Model selection is visible only for OCR track

GIVEN a document processing interface
WHEN the user selects processing track
THEN layout model selection SHALL be shown ONLY when OCR track is selected or auto-detected
AND SHALL be hidden for Direct track (which does not use PP-StructureV3)
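
The visibility rule reduces to a small predicate. In this sketch the track names "auto" and "direct" are assumptions ("ocr" appears in the request example), the function name is hypothetical, and hiding the selector while auto-detection has not yet produced a result is a design guess rather than something the spec states:

```python
from typing import Optional


def show_layout_model_selector(selected_track: str,
                               detected_track: Optional[str] = None) -> bool:
    """True when the OCR track is chosen explicitly or by auto-detection."""
    if selected_track == "ocr":
        return True
    if selected_track == "auto":
        # Assumption: stay hidden until detection reports the OCR track.
        return detected_track == "ocr"
    # Direct track (or any other value) does not use PP-StructureV3.
    return False
```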
