egg/OCR

egg 940a406dce chore: backup before code cleanup

Backup commit before executing remove-unused-code proposal.
This includes all pending changes and new features.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2025-12-11 11:55:39 +08:00

4.3 KiB

Proposal: Add OCR Processing Presets and Parameter Configuration

Summary

Add a frontend UI for configuring PP-Structure OCR processing parameters, with document-type presets for simple use and an advanced panel for parameter tuning. This addresses the root cause of table over-detection by letting users select a processing mode appropriate to their document type.

Problem Statement

Currently, PP-Structure's table parsing is too aggressive for many document types:

Layout detection misclassifies structured text (e.g., datasheet right columns) as tables
Table cell parsing over-segments these regions, causing "cell explosion"
Post-processing patches (cell validation, gap filling, table rebuilder) treat the symptoms but do not address the root cause
No user control: all settings are hardcoded in the backend's config.py

Proposed Solution

1. Document Type Presets (Simple Mode)

Provide predefined configurations for common document types:

Preset	Description	Table Parsing	Layout Threshold	Use Case
`text_heavy`	Documents with mostly paragraphs	disabled	0.7	Reports, articles, manuals
`datasheet`	Technical datasheets with tables/specs	conservative	0.65	Product specs, TDS
`table_heavy`	Documents with many tables	full	0.5	Financial reports, spreadsheets
`form`	Forms with fields	conservative	0.6	Applications, surveys
`mixed`	Mixed content documents	classification_only	0.55	General documents
`custom`	User-defined settings	user-defined	user-defined	Advanced users
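The preset table could be held in the backend as a small lookup structure. The following is a minimal sketch, not the project's actual code: the `OcrPreset` class and `PRESETS` name are hypothetical, and only the values are taken from the table above. The table does not say which parameter "Layout Threshold" maps to, so the field simply mirrors the column name.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class OcrPreset:
    """One row of the preset table (hypothetical structure)."""
    description: str
    table_parsing_mode: Optional[str]   # None means user-defined
    layout_threshold: Optional[float]   # None means user-defined


# Values copied from the preset table; the dict name is an assumption.
PRESETS: Dict[str, OcrPreset] = {
    "text_heavy": OcrPreset("Documents with mostly paragraphs", "disabled", 0.7),
    "datasheet": OcrPreset("Technical datasheets with tables/specs", "conservative", 0.65),
    "table_heavy": OcrPreset("Documents with many tables", "full", 0.5),
    "form": OcrPreset("Forms with fields", "conservative", 0.6),
    "mixed": OcrPreset("Mixed content documents", "classification_only", 0.55),
    "custom": OcrPreset("User-defined settings", None, None),
}
```

Keeping `custom` in the same mapping lets the UI list every option from one source, while the `None` fields signal that the values must come from the user.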

2. Advanced Parameter Panel (Expert Mode)

Expose all PP-Structure parameters for fine-tuning:

Table Processing:

table_parsing_mode: full / conservative / classification_only / disabled
table_layout_threshold: 0.0 - 1.0 (higher = stricter table detection)
enable_wired_table: true / false
enable_wireless_table: true / false
wired_table_model: model selection
wireless_table_model: model selection

Layout Detection:

layout_detection_model: model selection
layout_threshold: 0.0 - 1.0
layout_nms_threshold: 0.0 - 1.0
layout_merge_mode: large / small / union

Preprocessing:

use_doc_orientation_classify: true / false
use_doc_unwarping: true / false
use_textline_orientation: true / false

Other Recognition:

enable_chart_recognition: true / false
enable_formula_recognition: true / false
enable_seal_recognition: true / false
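The allowed values listed above lend themselves to a simple backend validation step. The sketch below is one possible shape under stated assumptions: the function name `validate_ocr_config` and the error message format are hypothetical, and model-selection parameters are only checked for being non-empty strings because the proposal does not list the valid model names.

```python
from typing import Any, Dict, List

# Allowed values taken from the parameter lists in this proposal.
TABLE_PARSING_MODES = {"full", "conservative", "classification_only", "disabled"}
LAYOUT_MERGE_MODES = {"large", "small", "union"}
THRESHOLD_KEYS = {"table_layout_threshold", "layout_threshold", "layout_nms_threshold"}
BOOLEAN_KEYS = {
    "enable_wired_table", "enable_wireless_table",
    "use_doc_orientation_classify", "use_doc_unwarping", "use_textline_orientation",
    "enable_chart_recognition", "enable_formula_recognition", "enable_seal_recognition",
}
MODEL_KEYS = {"wired_table_model", "wireless_table_model", "layout_detection_model"}


def validate_ocr_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of error messages; an empty list means the config is valid."""
    errors: List[str] = []
    for key, value in config.items():
        if key == "table_parsing_mode":
            if not isinstance(value, str) or value not in TABLE_PARSING_MODES:
                errors.append(f"{key}: must be one of {sorted(TABLE_PARSING_MODES)}")
        elif key == "layout_merge_mode":
            if not isinstance(value, str) or value not in LAYOUT_MERGE_MODES:
                errors.append(f"{key}: must be one of {sorted(LAYOUT_MERGE_MODES)}")
        elif key in THRESHOLD_KEYS:
            # bool is a subclass of int in Python, so reject it explicitly.
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if not is_number or not 0.0 <= value <= 1.0:
                errors.append(f"{key}: must be a number between 0.0 and 1.0")
        elif key in BOOLEAN_KEYS:
            if not isinstance(value, bool):
                errors.append(f"{key}: must be true or false")
        elif key in MODEL_KEYS:
            # Valid model names are not listed in the proposal; only the type is checked.
            if not isinstance(value, str) or not value:
                errors.append(f"{key}: must be a non-empty model name")
        else:
            errors.append(f"{key}: unknown parameter")
    return errors
```

For example, `validate_ocr_config({"layout_threshold": 1.5})` returns a single error, while a config using only listed values returns an empty list. Collecting all errors instead of raising on the first one lets the frontend highlight every invalid field at once.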

3. API Endpoint

Add an endpoint that accepts a processing configuration, given either as a preset name (ocr_preset) or as explicit parameters (ocr_config):

POST /api/v2/tasks
{
  "file": ...,
  "processing_track": "ocr",
  "ocr_preset": "datasheet",  // OR
  "ocr_config": {
    "table_parsing_mode": "conservative",
    "table_layout_threshold": 0.65,
    ...
  }
}
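On the backend, the request body has to be turned into one effective configuration. The sketch below shows one way this might work, under several assumptions that the proposal does not confirm: `ocr_preset` and `ocr_config` are treated as mutually exclusive (reading the "OR" literally), the preset's threshold is mapped to `table_layout_threshold` (as the datasheet example suggests), and a request with neither field returns `None` so the existing hardcoded defaults keep applying. The names `PRESET_CONFIGS` and `resolve_ocr_settings` are hypothetical.

```python
from typing import Any, Dict, Optional

# Preset values from the preset table. "custom" is omitted on purpose:
# in this sketch, custom settings are sent through ocr_config instead.
PRESET_CONFIGS: Dict[str, Dict[str, Any]] = {
    "text_heavy": {"table_parsing_mode": "disabled", "table_layout_threshold": 0.7},
    "datasheet": {"table_parsing_mode": "conservative", "table_layout_threshold": 0.65},
    "table_heavy": {"table_parsing_mode": "full", "table_layout_threshold": 0.5},
    "form": {"table_parsing_mode": "conservative", "table_layout_threshold": 0.6},
    "mixed": {"table_parsing_mode": "classification_only", "table_layout_threshold": 0.55},
}


def resolve_ocr_settings(body: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Map a task request body to an effective OCR config, or None for defaults."""
    preset = body.get("ocr_preset")
    config = body.get("ocr_config")

    if preset is not None and config is not None:
        raise ValueError("provide either ocr_preset or ocr_config, not both")
    if config is not None:
        return dict(config)  # copy so callers cannot mutate the request body
    if preset is not None:
        if preset not in PRESET_CONFIGS:
            raise ValueError(f"unknown ocr_preset: {preset!r}")
        return dict(PRESET_CONFIGS[preset])
    # Neither field present: keep the existing backend defaults.
    return None


if __name__ == "__main__":
    print(resolve_ocr_settings({"processing_track": "ocr", "ocr_preset": "datasheet"}))
```

Returning `None` for requests without either field is what keeps older clients working unchanged, which matches the backward-compatibility mitigation listed under Risks & Mitigations.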

4. Frontend UI Components

Preset Selector: Dropdown with document type icons and descriptions
Advanced Toggle: Expand/collapse for parameter panel
Parameter Groups: Collapsible sections for table/layout/preprocessing
Real-time Preview: Show expected behavior based on settings

Benefits

Root cause fix: Address table over-detection at the source
User empowerment: Users can optimize for their specific documents
No patches needed: Clean PP-Structure output without post-processing hacks
Iterative improvement: Users can fine-tune and share working configurations

Scope

Backend: API endpoint, preset definitions, parameter validation
Frontend: UI components for preset selection and parameter tuning
No changes to PP-Structure core - only configuration

Success Criteria

Users can select an appropriate preset for their document type
OCR output reflects the actual document structure without post-processing patches
Advanced users can fine-tune all PP-Structure parameters
Configuration can be saved and reused

Risks & Mitigations

Risk	Mitigation
Users overwhelmed by parameters	Default to presets, hide advanced panel
Wrong preset selection	Provide visual examples for each preset
Breaking changes	Keep backward compatibility with defaults

Timeline

Phase 1: Backend API and presets (2-3 days)
Phase 2: Frontend preset selector (1-2 days)
Phase 3: Advanced parameter panel (2-3 days)
Phase 4: Documentation and testing (1 day)
