Proposal: Add OCR Processing Presets and Parameter Configuration
Summary
Add frontend UI for configuring PP-Structure OCR processing parameters with document-type presets and advanced parameter tuning. This addresses the root cause of table over-detection by allowing users to select appropriate processing modes for their document types.
Problem Statement
Currently, PP-Structure's table parsing is too aggressive for many document types:
- Layout detection misclassifies structured text (e.g., datasheet right columns) as tables
- Table cell parsing over-segments these regions, causing "cell explosion"
- Post-processing patches (cell validation, gap filling, table rebuilder) treat the symptoms but do not address the root cause
- No user control: all settings are hardcoded in the backend config.py
Proposed Solution
1. Document Type Presets (Simple Mode)
Provide predefined configurations for common document types:
| Preset | Description | Table Parsing | Layout Threshold | Use Case |
|---|---|---|---|---|
| text_heavy | Documents with mostly paragraphs | disabled | 0.7 | Reports, articles, manuals |
| datasheet | Technical datasheets with tables/specs | conservative | 0.65 | Product specs, TDS |
| table_heavy | Documents with many tables | full | 0.5 | Financial reports, spreadsheets |
| form | Forms with fields | conservative | 0.6 | Applications, surveys |
| mixed | Mixed content documents | classification_only | 0.55 | General documents |
| custom | User-defined settings | user-defined | user-defined | Advanced users |
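On the backend, the preset table above could be kept as a plain mapping. The following is a minimal sketch rather than the actual implementation: the names `OCR_PRESETS` and `get_preset` are hypothetical, and it assumes the "Layout Threshold" column maps to `table_layout_threshold` (the API example in this proposal pairs 0.65 with the conservative mode, which matches the datasheet row). The custom preset is left out because its values come from the user.

```python
from typing import Dict, Union

PresetValues = Dict[str, Union[str, float]]

# Hypothetical preset table mirroring the proposal. "custom" is omitted
# because its values are supplied by the user, not predefined.
# Assumption: the table's "Layout Threshold" column is table_layout_threshold.
OCR_PRESETS: Dict[str, PresetValues] = {
    "text_heavy": {"table_parsing_mode": "disabled", "table_layout_threshold": 0.7},
    "datasheet": {"table_parsing_mode": "conservative", "table_layout_threshold": 0.65},
    "table_heavy": {"table_parsing_mode": "full", "table_layout_threshold": 0.5},
    "form": {"table_parsing_mode": "conservative", "table_layout_threshold": 0.6},
    "mixed": {"table_parsing_mode": "classification_only", "table_layout_threshold": 0.55},
}


def get_preset(name: str) -> PresetValues:
    """Return a copy of the named preset so callers cannot mutate the table."""
    if name not in OCR_PRESETS:
        raise ValueError(f"unknown OCR preset: {name!r}")
    return dict(OCR_PRESETS[name])
```

Returning a copy keeps the shared table immutable in practice, so a request that tweaks one value after picking a preset does not affect later requests.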
2. Advanced Parameter Panel (Expert Mode)
Expose all PP-Structure parameters for fine-tuning:
Table Processing:
- table_parsing_mode: full / conservative / classification_only / disabled
- table_layout_threshold: 0.0 - 1.0 (higher = stricter table detection)
- enable_wired_table: true / false
- enable_wireless_table: true / false
- wired_table_model: model selection
- wireless_table_model: model selection
Layout Detection:
- layout_detection_model: model selection
- layout_threshold: 0.0 - 1.0
- layout_nms_threshold: 0.0 - 1.0
- layout_merge_mode: large / small / union
Preprocessing:
- use_doc_orientation_classify: true / false
- use_doc_unwarping: true / false
- use_textline_orientation: true / false
Other Recognition:
- enable_chart_recognition: true / false
- enable_formula_recognition: true / false
- enable_seal_recognition: true / false
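The allowed values and ranges listed above lend themselves to validation at the API boundary. The sketch below covers only a subset of the parameters. The class name `OcrConfig` is hypothetical, and the default values are placeholders chosen for illustration, not the values currently hardcoded in config.py.

```python
from dataclasses import dataclass

# Allowed values as listed in this proposal.
TABLE_PARSING_MODES = {"full", "conservative", "classification_only", "disabled"}
LAYOUT_MERGE_MODES = {"large", "small", "union"}


@dataclass
class OcrConfig:
    """Hypothetical container for a subset of the advanced parameters.

    Defaults are illustrative placeholders, not the real backend defaults.
    """

    table_parsing_mode: str = "conservative"
    table_layout_threshold: float = 0.5
    layout_threshold: float = 0.5
    layout_nms_threshold: float = 0.5
    layout_merge_mode: str = "small"
    enable_wired_table: bool = True
    enable_wireless_table: bool = True

    def __post_init__(self) -> None:
        # Reject values outside the enumerations documented in the proposal.
        if self.table_parsing_mode not in TABLE_PARSING_MODES:
            raise ValueError(f"invalid table_parsing_mode: {self.table_parsing_mode!r}")
        if self.layout_merge_mode not in LAYOUT_MERGE_MODES:
            raise ValueError(f"invalid layout_merge_mode: {self.layout_merge_mode!r}")
        # All three thresholds share the documented 0.0 - 1.0 range.
        for name in ("table_layout_threshold", "layout_threshold", "layout_nms_threshold"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be within [0.0, 1.0], got {value}")
```

For example, `OcrConfig(table_parsing_mode="conservative", table_layout_threshold=0.65)` constructs normally, while `OcrConfig(layout_threshold=1.5)` raises `ValueError`.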
3. API Endpoint
Add endpoint to accept processing configuration:
POST /api/v2/tasks
{
  "file": ...,
  "processing_track": "ocr",
  "ocr_preset": "datasheet",  // OR
  "ocr_config": {
    "table_parsing_mode": "conservative",
    "table_layout_threshold": 0.65,
    ...
  }
}
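One way the backend might interpret the `ocr_preset` / `ocr_config` pair is sketched below. It rests on assumptions this proposal does not spell out: the two fields are treated as mutually exclusive (following the "OR" in the example), and a request with neither falls back to the existing backend defaults, which keeps current clients working. The function name `resolve_ocr_settings` and the single-entry `PRESETS` table are hypothetical and kept short for illustration.

```python
from typing import Any, Dict, Optional

# Hypothetical preset table; only one entry is shown for brevity.
PRESETS: Dict[str, Dict[str, Any]] = {
    "datasheet": {"table_parsing_mode": "conservative", "table_layout_threshold": 0.65},
}


def resolve_ocr_settings(payload: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Turn the request body into effective OCR settings.

    Returns None when the request carries neither field, meaning the
    backend should keep using its existing defaults (an assumption made
    here for backward compatibility).
    """
    preset = payload.get("ocr_preset")
    config = payload.get("ocr_config")

    # Assumption: the fields are mutually exclusive, per the "OR" in the example.
    if preset is not None and config is not None:
        raise ValueError("send either ocr_preset or ocr_config, not both")

    if preset is not None:
        if preset not in PRESETS:
            raise ValueError(f"unknown OCR preset: {preset!r}")
        return dict(PRESETS[preset])

    if config is not None:
        return dict(config)

    return None
```

Rejecting requests that carry both fields avoids an ambiguous merge order. An alternative design would let `ocr_config` override individual preset values, at the cost of a more involved validation step.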
4. Frontend UI Components
- Preset Selector: Dropdown with document type icons and descriptions
- Advanced Toggle: Expand/collapse for parameter panel
- Parameter Groups: Collapsible sections for table/layout/preprocessing
- Real-time Preview: Show expected behavior based on settings
Benefits
- Root cause fix: Address table over-detection at the source
- User empowerment: Users can optimize for their specific documents
- No patches needed: Clean PP-Structure output without post-processing hacks
- Iterative improvement: Users can fine-tune and share working configurations
Scope
- Backend: API endpoint, preset definitions, parameter validation
- Frontend: UI components for preset selection and parameter tuning
- No changes to PP-Structure core - only configuration
Success Criteria
- Users can select an appropriate preset for their document type
- OCR output reflects the actual document structure without post-processing patches
- Advanced users can fine-tune all PP-Structure parameters
- Configuration can be saved and reused
Risks & Mitigations
| Risk | Mitigation |
|---|---|
| Users overwhelmed by parameters | Default to presets, hide advanced panel |
| Wrong preset selection | Provide visual examples for each preset |
| Breaking changes | Keep backward compatibility with defaults |
Timeline
- Phase 1: Backend API and presets (2-3 days)
- Phase 2: Frontend preset selector (1-2 days)
- Phase 3: Advanced parameter panel (2-3 days)
- Phase 4: Documentation and testing (1 day)