# Proposal: Add OCR Processing Presets and Parameter Configuration

## Summary

Add a frontend UI for configuring PP-Structure OCR processing parameters, with document-type presets and advanced parameter tuning. This addresses the root cause of table over-detection by letting users select a processing mode appropriate to their document type.

## Problem Statement

Currently, PP-Structure's table parsing is too aggressive for many document types:

1. **Layout detection** misclassifies structured text (e.g., datasheet right columns) as tables
2. **Table cell parsing** over-segments these regions, causing "cell explosion"
3. **Post-processing patches** (cell validation, gap filling, table rebuilder) try to fix the symptoms but don't address the root cause
4. **No user control** - all settings are hardcoded in the backend `config.py`

## Proposed Solution

### 1. Document Type Presets (Simple Mode)

Provide predefined configurations for common document types:

| Preset | Description | Table Parsing | Layout Threshold | Use Case |
|--------|-------------|---------------|------------------|----------|
| `text_heavy` | Documents with mostly paragraphs | disabled | 0.7 | Reports, articles, manuals |
| `datasheet` | Technical datasheets with tables/specs | conservative | 0.65 | Product specs, TDS |
| `table_heavy` | Documents with many tables | full | 0.5 | Financial reports, spreadsheets |
| `form` | Forms with fields | conservative | 0.6 | Applications, surveys |
| `mixed` | Mixed content documents | classification_only | 0.55 | General documents |
| `custom` | User-defined settings | user-defined | user-defined | Advanced users |
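
As a rough illustration, the preset table above could live in the backend as a plain lookup structure. This is only a sketch: the `OcrPreset` class, the `PRESETS` dict, and `get_preset` are hypothetical names, and `None` is used here as an assumed stand-in for "user-defined".

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class OcrPreset:
    """One row of the preset table (hypothetical structure)."""
    description: str
    table_parsing_mode: Optional[str]   # None stands for "user-defined"
    layout_threshold: Optional[float]   # mirrors the "Layout Threshold" column


# Values copied from the preset table; only the container is an assumption.
PRESETS: Dict[str, OcrPreset] = {
    "text_heavy": OcrPreset("Documents with mostly paragraphs", "disabled", 0.7),
    "datasheet": OcrPreset("Technical datasheets with tables/specs", "conservative", 0.65),
    "table_heavy": OcrPreset("Documents with many tables", "full", 0.5),
    "form": OcrPreset("Forms with fields", "conservative", 0.6),
    "mixed": OcrPreset("Mixed content documents", "classification_only", 0.55),
    "custom": OcrPreset("User-defined settings", None, None),
}


def get_preset(name: str) -> OcrPreset:
    """Look up a preset by name, rejecting unknown names explicitly."""
    try:
        return PRESETS[name]
    except KeyError:
        raise ValueError(f"Unknown OCR preset: {name!r}") from None
```

Keeping presets as data rather than branching logic would make it easy to add or adjust a preset without touching the request-handling code.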

### 2. Advanced Parameter Panel (Expert Mode)

Expose all PP-Structure parameters for fine-tuning:

**Table Processing:**
- `table_parsing_mode`: full / conservative / classification_only / disabled
- `table_layout_threshold`: 0.0 - 1.0 (higher = stricter table detection)
- `enable_wired_table`: true / false
- `enable_wireless_table`: true / false
- `wired_table_model`: model selection
- `wireless_table_model`: model selection

**Layout Detection:**
- `layout_detection_model`: model selection
- `layout_threshold`: 0.0 - 1.0
- `layout_nms_threshold`: 0.0 - 1.0
- `layout_merge_mode`: large / small / union

**Preprocessing:**
- `use_doc_orientation_classify`: true / false
- `use_doc_unwarping`: true / false
- `use_textline_orientation`: true / false

**Other Recognition:**
- `enable_chart_recognition`: true / false
- `enable_formula_recognition`: true / false
- `enable_seal_recognition`: true / false
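
The parameter list above maps naturally onto a typed configuration object with a validation step. The sketch below is one possible shape, not the actual backend code: the `OcrConfig` class and its `validate` method are hypothetical, every default value is a placeholder chosen for illustration, and the three model-selection fields are omitted because the available model names are not specified here.

```python
from dataclasses import dataclass
from typing import List

# Allowed values taken from the parameter list above.
TABLE_PARSING_MODES = ("full", "conservative", "classification_only", "disabled")
LAYOUT_MERGE_MODES = ("large", "small", "union")
THRESHOLD_FIELDS = ("table_layout_threshold", "layout_threshold", "layout_nms_threshold")


@dataclass
class OcrConfig:
    # NOTE: all defaults below are placeholders, not values from the proposal.
    table_parsing_mode: str = "conservative"
    table_layout_threshold: float = 0.65
    enable_wired_table: bool = True
    enable_wireless_table: bool = True
    layout_threshold: float = 0.5
    layout_nms_threshold: float = 0.5
    layout_merge_mode: str = "small"
    use_doc_orientation_classify: bool = False
    use_doc_unwarping: bool = False
    use_textline_orientation: bool = False
    enable_chart_recognition: bool = False
    enable_formula_recognition: bool = False
    enable_seal_recognition: bool = False

    def validate(self) -> List[str]:
        """Return a list of human-readable errors (empty if the config is valid)."""
        errors: List[str] = []
        if self.table_parsing_mode not in TABLE_PARSING_MODES:
            errors.append(f"table_parsing_mode must be one of {TABLE_PARSING_MODES}")
        if self.layout_merge_mode not in LAYOUT_MERGE_MODES:
            errors.append(f"layout_merge_mode must be one of {LAYOUT_MERGE_MODES}")
        for name in THRESHOLD_FIELDS:
            value = getattr(self, name)
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if not is_number or not 0.0 <= value <= 1.0:
                errors.append(f"{name} must be a number between 0.0 and 1.0")
        return errors
```

Returning a list of errors instead of raising on the first one lets the frontend highlight every invalid field in the advanced panel at once.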

### 3. API Endpoint

Add an endpoint that accepts the processing configuration:

```
POST /api/v2/tasks
{
  "file": ...,
  "processing_track": "ocr",
  "ocr_preset": "datasheet",  // OR
  "ocr_config": {
    "table_parsing_mode": "conservative",
    "table_layout_threshold": 0.65,
    ...
  }
}
```
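
One way the backend might turn such a request body into effective settings is sketched below. Several things here are assumptions rather than part of the proposal: the `resolve_ocr_settings` helper, the `DEFAULT_SETTINGS` values, the mapping of the preset table's "Layout Threshold" column onto `table_layout_threshold`, and the decision to reject requests that send both `ocr_preset` and `ocr_config`. Only two presets and two parameters are shown to keep the example short.

```python
from typing import Any, Dict

# Hypothetical fallback used when a request carries neither field,
# so that existing clients keep working unchanged.
DEFAULT_SETTINGS: Dict[str, Any] = {
    "table_parsing_mode": "full",
    "table_layout_threshold": 0.5,
}

# Subset of the preset table, expressed as overrides on the defaults.
PRESET_OVERRIDES: Dict[str, Dict[str, Any]] = {
    "datasheet": {"table_parsing_mode": "conservative", "table_layout_threshold": 0.65},
    "table_heavy": {"table_parsing_mode": "full", "table_layout_threshold": 0.5},
}


def resolve_ocr_settings(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Merge defaults with either a named preset or an explicit config."""
    preset = payload.get("ocr_preset")
    config = payload.get("ocr_config")
    if preset is not None and config is not None:
        raise ValueError("Provide either 'ocr_preset' or 'ocr_config', not both")

    settings = dict(DEFAULT_SETTINGS)  # copy so the defaults are never mutated
    if preset is not None:
        if preset not in PRESET_OVERRIDES:
            raise ValueError(f"Unknown OCR preset: {preset!r}")
        settings.update(PRESET_OVERRIDES[preset])
    elif config is not None:
        unknown = sorted(set(config) - set(DEFAULT_SETTINGS))
        if unknown:
            raise ValueError(f"Unknown OCR parameters: {unknown}")
        settings.update(config)
    return settings
```

For example, `resolve_ocr_settings({"ocr_preset": "datasheet"})` would yield the conservative table settings, while an empty payload falls back to the defaults.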

### 4. Frontend UI Components

1. **Preset Selector**: Dropdown with document type icons and descriptions
2. **Advanced Toggle**: Expand/collapse for parameter panel
3. **Parameter Groups**: Collapsible sections for table/layout/preprocessing
4. **Real-time Preview**: Show expected behavior based on settings

## Benefits

1. **Root cause fix**: Address table over-detection at the source
2. **User empowerment**: Users can optimize for their specific documents
3. **No patches needed**: Clean PP-Structure output without post-processing hacks
4. **Iterative improvement**: Users can fine-tune and share working configurations

## Scope

- Backend: API endpoint, preset definitions, parameter validation
- Frontend: UI components for preset selection and parameter tuning
- No changes to PP-Structure core - only configuration

## Success Criteria

1. Users can select an appropriate preset for their document type
2. OCR output matches document reality without post-processing patches
3. Advanced users can fine-tune all PP-Structure parameters
4. Configuration can be saved and reused
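
For the last criterion, a saved configuration could be as simple as a JSON document that round-trips cleanly. The sketch below assumes JSON as the storage format and uses hypothetical helper names (`serialize_config`, `deserialize_config`); where the text is actually stored (file, database, browser storage) is left open.

```python
import json
from typing import Any, Dict


def serialize_config(config: Dict[str, Any]) -> str:
    """Turn a configuration dict into stable, diff-friendly JSON text."""
    return json.dumps(config, indent=2, sort_keys=True)


def deserialize_config(text: str) -> Dict[str, Any]:
    """Parse saved JSON text back into a configuration dict."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("Saved OCR configuration must be a JSON object")
    return data
```

Sorting keys keeps the saved text stable across runs, which helps when users share or compare working configurations.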

## Risks & Mitigations

| Risk | Mitigation |
|------|------------|
| Users overwhelmed by parameters | Default to presets, hide advanced panel |
| Wrong preset selection | Provide visual examples for each preset |
| Breaking changes | Keep backward compatibility with defaults |

## Timeline

- Phase 1: Backend API and presets (2-3 days)
- Phase 2: Frontend preset selector (1-2 days)
- Phase 3: Advanced parameter panel (2-3 days)
- Phase 4: Documentation and testing (1 day)