OCR/openspec/changes/archive/2025-12-10-add-ocr-processing-presets/proposal.md
2025-12-11 11:55:39 +08:00


Proposal: Add OCR Processing Presets and Parameter Configuration

Summary

Add a frontend UI for configuring PP-Structure OCR processing parameters, with document-type presets for common cases and an advanced panel for parameter tuning. This addresses the root cause of table over-detection by letting users select a processing mode that fits their document type.

Problem Statement

Currently, PP-Structure's table parsing is too aggressive for many document types:

  1. Layout detection misclassifies structured text (e.g., the right-hand columns of datasheets) as tables
  2. Table cell parsing over-segments these regions, causing "cell explosion"
  3. Post-processing patches (cell validation, gap filling, table rebuilder) try to fix the symptoms but don't address the root cause
  4. No user control: all settings are hardcoded in the backend's config.py

Proposed Solution

1. Document Type Presets (Simple Mode)

Provide predefined configurations for common document types:

| Preset | Description | Table Parsing | Layout Threshold | Use Case |
| --- | --- | --- | --- | --- |
| text_heavy | Documents with mostly paragraphs | disabled | 0.7 | Reports, articles, manuals |
| datasheet | Technical datasheets with tables/specs | conservative | 0.65 | Product specs, TDS |
| table_heavy | Documents with many tables | full | 0.5 | Financial reports, spreadsheets |
| form | Forms with fields | conservative | 0.6 | Applications, surveys |
| mixed | Mixed content documents | classification_only | 0.55 | General documents |
| custom | User-defined settings | user-defined | user-defined | Advanced users |
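
As a rough illustration, the preset table could be held on the backend as a plain lookup structure. This is only a sketch: the names `OcrPreset`, `OCR_PRESETS`, and `get_preset` are hypothetical, and mapping the "Layout Threshold" column to `table_layout_threshold` is an assumption inferred from the API example in this proposal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OcrPreset:
    """One row of the preset table (hypothetical structure)."""
    description: str
    table_parsing_mode: Optional[str]        # None means user-defined
    table_layout_threshold: Optional[float]  # assumed mapping of "Layout Threshold"


# Values copied from the preset table; "custom" carries no defaults.
OCR_PRESETS = {
    "text_heavy": OcrPreset("Documents with mostly paragraphs", "disabled", 0.7),
    "datasheet": OcrPreset("Technical datasheets with tables/specs", "conservative", 0.65),
    "table_heavy": OcrPreset("Documents with many tables", "full", 0.5),
    "form": OcrPreset("Forms with fields", "conservative", 0.6),
    "mixed": OcrPreset("Mixed content documents", "classification_only", 0.55),
    "custom": OcrPreset("User-defined settings", None, None),
}


def get_preset(name: str) -> OcrPreset:
    """Look up a preset by name; an unknown name raises ValueError listing valid names."""
    try:
        return OCR_PRESETS[name]
    except KeyError:
        valid = ", ".join(sorted(OCR_PRESETS))
        raise ValueError(f"unknown preset {name!r}; expected one of: {valid}") from None
```

Keeping presets as frozen data rather than code paths would let the frontend dropdown and the backend share one source of truth, for example by serving the same mapping over the API.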

2. Advanced Parameter Panel (Expert Mode)

Expose all PP-Structure parameters for fine-tuning:

Table Processing:

  • table_parsing_mode: full / conservative / classification_only / disabled
  • table_layout_threshold: 0.0 - 1.0 (higher = stricter table detection)
  • enable_wired_table: true / false
  • enable_wireless_table: true / false
  • wired_table_model: model selection
  • wireless_table_model: model selection

Layout Detection:

  • layout_detection_model: model selection
  • layout_threshold: 0.0 - 1.0
  • layout_nms_threshold: 0.0 - 1.0
  • layout_merge_mode: large / small / union

Preprocessing:

  • use_doc_orientation_classify: true / false
  • use_doc_unwarping: true / false
  • use_textline_orientation: true / false

Other Recognition:

  • enable_chart_recognition: true / false
  • enable_formula_recognition: true / false
  • enable_seal_recognition: true / false
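
The parameter ranges listed above suggest a simple validation pass on the backend. The following is a minimal sketch under stated assumptions: `validate_ocr_config` and its helper are hypothetical names, the config is assumed to arrive as a plain dictionary, and the model-selection fields are left unchecked because this proposal does not list the valid model names.

```python
from typing import Any, Dict, List

# Allowed values and field names taken from the parameter lists in this proposal.
TABLE_PARSING_MODES = {"full", "conservative", "classification_only", "disabled"}
LAYOUT_MERGE_MODES = {"large", "small", "union"}
THRESHOLD_FIELDS = ("table_layout_threshold", "layout_threshold", "layout_nms_threshold")
BOOLEAN_FIELDS = (
    "enable_wired_table", "enable_wireless_table",
    "use_doc_orientation_classify", "use_doc_unwarping", "use_textline_orientation",
    "enable_chart_recognition", "enable_formula_recognition", "enable_seal_recognition",
)


def _check_choice(config: Dict[str, Any], name: str, allowed: set, errors: List[str]) -> None:
    """Append an error if the field is present but not one of the allowed strings."""
    value = config.get(name)
    if value is None:
        return
    if not isinstance(value, str) or value not in allowed:
        errors.append(f"{name} must be one of {sorted(allowed)}")


def validate_ocr_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means the config passed."""
    errors: List[str] = []
    _check_choice(config, "table_parsing_mode", TABLE_PARSING_MODES, errors)
    _check_choice(config, "layout_merge_mode", LAYOUT_MERGE_MODES, errors)

    for name in THRESHOLD_FIELDS:
        value = config.get(name)
        if value is None:
            continue
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0.0 <= value <= 1.0:
            errors.append(f"{name} must be a number in [0.0, 1.0]")

    for name in BOOLEAN_FIELDS:
        if name in config and not isinstance(config[name], bool):
            errors.append(f"{name} must be true or false")

    return errors
```

Returning all problems at once, rather than raising on the first, would let the advanced panel highlight every invalid field in a single round trip.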

3. API Endpoint

Add an endpoint that accepts the processing configuration, either as a named preset or as an explicit parameter set:

POST /api/v2/tasks
{
  "file": ...,
  "processing_track": "ocr",
  "ocr_preset": "datasheet",  // OR
  "ocr_config": {
    "table_parsing_mode": "conservative",
    "table_layout_threshold": 0.65,
    ...
  }
}
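
One way the backend might turn such a request body into effective settings is sketched below. The function name `resolve_ocr_config` and the `PRESET_CONFIGS` mapping are hypothetical. The sketch also assumes that `ocr_preset` and `ocr_config` are mutually exclusive (one reading of the "OR" above), that custom settings arrive through `ocr_config`, and that a request with neither field falls back to the existing backend defaults to stay backward compatible.

```python
from typing import Any, Dict

# Hypothetical expansion of each preset into concrete parameters, using the
# values from the preset table. "custom" is omitted on purpose: custom settings
# are assumed to be sent through "ocr_config" instead.
PRESET_CONFIGS: Dict[str, Dict[str, Any]] = {
    "text_heavy": {"table_parsing_mode": "disabled", "table_layout_threshold": 0.7},
    "datasheet": {"table_parsing_mode": "conservative", "table_layout_threshold": 0.65},
    "table_heavy": {"table_parsing_mode": "full", "table_layout_threshold": 0.5},
    "form": {"table_parsing_mode": "conservative", "table_layout_threshold": 0.6},
    "mixed": {"table_parsing_mode": "classification_only", "table_layout_threshold": 0.55},
}


def resolve_ocr_config(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return the parameter overrides requested by a task payload.

    An empty dict means "no overrides", so the backend keeps its current defaults.
    """
    preset = payload.get("ocr_preset")
    config = payload.get("ocr_config")

    if preset is not None and config is not None:
        raise ValueError("send either ocr_preset or ocr_config, not both")
    if preset is not None:
        if preset not in PRESET_CONFIGS:
            raise ValueError(f"unknown ocr_preset: {preset!r}")
        return dict(PRESET_CONFIGS[preset])  # copy so callers cannot mutate the table
    if config is not None:
        return dict(config)
    return {}


# Example: a preset request expands to the table values for that preset.
overrides = resolve_ocr_config({"processing_track": "ocr", "ocr_preset": "datasheet"})
```

Treating an absent preset and config as "no overrides" keeps existing clients working unchanged, which matches the backward-compatibility mitigation listed under Risks & Mitigations.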

4. Frontend UI Components

  1. Preset Selector: Dropdown with document type icons and descriptions
  2. Advanced Toggle: Expand/collapse for parameter panel
  3. Parameter Groups: Collapsible sections for table/layout/preprocessing
  4. Real-time Preview: Show expected behavior based on settings

Benefits

  1. Root cause fix: Address table over-detection at the source
  2. User empowerment: Users can optimize for their specific documents
  3. No patches needed: Clean PP-Structure output without post-processing hacks
  4. Iterative improvement: Users can fine-tune and share working configurations

Scope

  • Backend: API endpoint, preset definitions, parameter validation
  • Frontend: UI components for preset selection and parameter tuning
  • No changes to PP-Structure core - only configuration

Success Criteria

  1. Users can select an appropriate preset for their document type
  2. OCR output reflects the actual document structure without post-processing patches
  3. Advanced users can fine-tune all PP-Structure parameters
  4. Configuration can be saved and reused

Risks & Mitigations

| Risk | Mitigation |
| --- | --- |
| Users overwhelmed by parameters | Default to presets, hide advanced panel |
| Wrong preset selection | Provide visual examples for each preset |
| Breaking changes | Keep backward compatibility with defaults |

Timeline

  • Phase 1: Backend API and presets (2-3 days)
  • Phase 2: Frontend preset selector (1-2 days)
  • Phase 3: Advanced parameter panel (2-3 days)
  • Phase 4: Documentation and testing (1 day)