Files
OCR/openspec/changes/archive/2025-12-10-add-ocr-processing-presets/proposal.md
egg 940a406dce chore: backup before code cleanup
Backup commit before executing remove-unused-code proposal.
This includes all pending changes and new features.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 11:55:39 +08:00

117 lines
4.3 KiB
Markdown

# Proposal: Add OCR Processing Presets and Parameter Configuration
## Summary
Add frontend UI for configuring PP-Structure OCR processing parameters with document-type presets and advanced parameter tuning. This addresses the root cause of table over-detection by allowing users to select appropriate processing modes for their document types.
## Problem Statement
Currently, PP-Structure's table parsing is too aggressive for many document types:
1. **Layout detection** misclassifies structured text (e.g., datasheet right columns) as tables
2. **Table cell parsing** over-segments these regions, causing "cell explosion"
3. **Post-processing patches** (cell validation, gap filling, table rebuilder) try to fix symptoms but don't address root cause
4. **No user control** - all settings are hardcoded in backend config.py
## Proposed Solution
### 1. Document Type Presets (Simple Mode)
Provide predefined configurations for common document types:
| Preset | Description | Table Parsing | Layout Threshold | Use Case |
|--------|-------------|---------------|------------------|----------|
| `text_heavy` | Documents with mostly paragraphs | disabled | 0.7 | Reports, articles, manuals |
| `datasheet` | Technical datasheets with tables/specs | conservative | 0.65 | Product specs, TDS |
| `table_heavy` | Documents with many tables | full | 0.5 | Financial reports, spreadsheets |
| `form` | Forms with fields | conservative | 0.6 | Applications, surveys |
| `mixed` | Mixed content documents | classification_only | 0.55 | General documents |
| `custom` | User-defined settings | user-defined | user-defined | Advanced users |
### 2. Advanced Parameter Panel (Expert Mode)
Expose all PP-Structure parameters for fine-tuning:
**Table Processing:**
- `table_parsing_mode`: full / conservative / classification_only / disabled
- `table_layout_threshold`: 0.0 - 1.0 (higher = stricter table detection)
- `enable_wired_table`: true / false
- `enable_wireless_table`: true / false
- `wired_table_model`: model selection
- `wireless_table_model`: model selection
**Layout Detection:**
- `layout_detection_model`: model selection
- `layout_threshold`: 0.0 - 1.0
- `layout_nms_threshold`: 0.0 - 1.0
- `layout_merge_mode`: large / small / union
**Preprocessing:**
- `use_doc_orientation_classify`: true / false
- `use_doc_unwarping`: true / false
- `use_textline_orientation`: true / false
**Other Recognition:**
- `enable_chart_recognition`: true / false
- `enable_formula_recognition`: true / false
- `enable_seal_recognition`: true / false
### 3. API Endpoint
Add endpoint to accept processing configuration:
```
POST /api/v2/tasks
{
"file": ...,
"processing_track": "ocr",
"ocr_preset": "datasheet", // OR
"ocr_config": {
"table_parsing_mode": "conservative",
"table_layout_threshold": 0.65,
...
}
}
```
### 4. Frontend UI Components
1. **Preset Selector**: Dropdown with document type icons and descriptions
2. **Advanced Toggle**: Expand/collapse for parameter panel
3. **Parameter Groups**: Collapsible sections for table/layout/preprocessing
4. **Real-time Preview**: Show expected behavior based on settings
## Benefits
1. **Root cause fix**: Address table over-detection at the source
2. **User empowerment**: Users can optimize for their specific documents
3. **No patches needed**: Clean PP-Structure output without post-processing hacks
4. **Iterative improvement**: Users can fine-tune and share working configurations
## Scope
- Backend: API endpoint, preset definitions, parameter validation
- Frontend: UI components for preset selection and parameter tuning
- No changes to PP-Structure core - only configuration
## Success Criteria
1. Users can select appropriate preset for document type
2. OCR output matches document reality without post-processing patches
3. Advanced users can fine-tune all PP-Structure parameters
4. Configuration can be saved and reused
## Risks & Mitigations
| Risk | Mitigation |
|------|------------|
| Users overwhelmed by parameters | Default to presets, hide advanced panel |
| Wrong preset selection | Provide visual examples for each preset |
| Breaking changes | Keep backward compatibility with defaults |
## Timeline
Phase 1: Backend API and presets (2-3 days)
Phase 2: Frontend preset selector (1-2 days)
Phase 3: Advanced parameter panel (2-3 days)
Phase 4: Documentation and testing (1 day)