chore: backup before code cleanup

Backup commit before executing remove-unused-code proposal. This includes all pending changes and new features.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -0,0 +1,116 @@
# Proposal: Add OCR Processing Presets and Parameter Configuration

## Summary

Add frontend UI for configuring PP-Structure OCR processing parameters with document-type presets and advanced parameter tuning. This addresses the root cause of table over-detection by allowing users to select appropriate processing modes for their document types.

## Problem Statement

Currently, PP-Structure's table parsing is too aggressive for many document types:

1. **Layout detection** misclassifies structured text (e.g., datasheet right columns) as tables
2. **Table cell parsing** over-segments these regions, causing "cell explosion"
3. **Post-processing patches** (cell validation, gap filling, table rebuilder) try to fix symptoms but don't address the root cause
4. **No user control** - all settings are hardcoded in the backend `config.py`

## Proposed Solution

### 1. Document Type Presets (Simple Mode)

Provide predefined configurations for common document types:

| Preset | Description | Table Parsing | Layout Threshold | Use Case |
|--------|-------------|---------------|------------------|----------|
| `text_heavy` | Documents with mostly paragraphs | disabled | 0.7 | Reports, articles, manuals |
| `datasheet` | Technical datasheets with tables/specs | conservative | 0.65 | Product specs, TDS |
| `table_heavy` | Documents with many tables | full | 0.5 | Financial reports, spreadsheets |
| `form` | Forms with fields | conservative | 0.6 | Applications, surveys |
| `mixed` | Mixed content documents | classification_only | 0.55 | General documents |
| `custom` | User-defined settings | user-defined | user-defined | Advanced users |
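
The preset table could be held on the backend as plain data. The following is a minimal sketch, not the project's implementation: the `Preset` class, the `PRESETS` dict, and `get_preset` are hypothetical names, and mapping the "Layout Threshold" column to `table_layout_threshold` is an assumption based on the API example in this proposal.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Preset:
    """One row of the preset table (class and field names are illustrative)."""
    description: str
    table_parsing_mode: str
    # Assumption: the "Layout Threshold" column corresponds to table_layout_threshold.
    table_layout_threshold: float


# Values copied from the preset table; "custom" is omitted because it has
# no predefined values.
PRESETS: Dict[str, Preset] = {
    "text_heavy": Preset("Documents with mostly paragraphs", "disabled", 0.7),
    "datasheet": Preset("Technical datasheets with tables/specs", "conservative", 0.65),
    "table_heavy": Preset("Documents with many tables", "full", 0.5),
    "form": Preset("Forms with fields", "conservative", 0.6),
    "mixed": Preset("Mixed content documents", "classification_only", 0.55),
}


def get_preset(name: str) -> Preset:
    """Return the named preset, or raise ValueError for 'custom' and unknown names."""
    if name == "custom":
        raise ValueError("'custom' has no predefined values; supply an explicit config")
    try:
        return PRESETS[name]
    except KeyError:
        raise ValueError(f"unknown preset {name!r}; expected one of {sorted(PRESETS)}") from None
```

Keeping presets as immutable data rather than branching logic would make it straightforward to expose the same table to the frontend.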

### 2. Advanced Parameter Panel (Expert Mode)

Expose all PP-Structure parameters for fine-tuning:

**Table Processing:**
- `table_parsing_mode`: full / conservative / classification_only / disabled
- `table_layout_threshold`: 0.0 - 1.0 (higher = stricter table detection)
- `enable_wired_table`: true / false
- `enable_wireless_table`: true / false
- `wired_table_model`: model selection
- `wireless_table_model`: model selection

**Layout Detection:**
- `layout_detection_model`: model selection
- `layout_threshold`: 0.0 - 1.0
- `layout_nms_threshold`: 0.0 - 1.0
- `layout_merge_mode`: large / small / union

**Preprocessing:**
- `use_doc_orientation_classify`: true / false
- `use_doc_unwarping`: true / false
- `use_textline_orientation`: true / false

**Other Recognition:**
- `enable_chart_recognition`: true / false
- `enable_formula_recognition`: true / false
- `enable_seal_recognition`: true / false
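
The ranges and choices listed above suggest what backend parameter validation might look like. This is a sketch under assumptions: `validate_ocr_config` and the constant names are hypothetical, and model fields are only checked to be strings because the proposal does not list the valid model names.

```python
from typing import Any, Dict, List

# Allowed values taken from the parameter list above.
TABLE_PARSING_MODES = {"full", "conservative", "classification_only", "disabled"}
LAYOUT_MERGE_MODES = {"large", "small", "union"}
THRESHOLD_FIELDS = ("table_layout_threshold", "layout_threshold", "layout_nms_threshold")
BOOLEAN_FIELDS = (
    "enable_wired_table", "enable_wireless_table",
    "use_doc_orientation_classify", "use_doc_unwarping", "use_textline_orientation",
    "enable_chart_recognition", "enable_formula_recognition", "enable_seal_recognition",
)
MODEL_FIELDS = ("wired_table_model", "wireless_table_model", "layout_detection_model")
CHOICE_FIELDS = {
    "table_parsing_mode": TABLE_PARSING_MODES,
    "layout_merge_mode": LAYOUT_MERGE_MODES,
}
KNOWN_FIELDS = {*CHOICE_FIELDS, *THRESHOLD_FIELDS, *BOOLEAN_FIELDS, *MODEL_FIELDS}


def validate_ocr_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the config is valid."""
    errors: List[str] = []
    for name in sorted(set(config) - KNOWN_FIELDS):
        errors.append(f"{name}: unknown parameter")
    for name, allowed in CHOICE_FIELDS.items():
        if name in config:
            value = config[name]
            if not (isinstance(value, str) and value in allowed):
                errors.append(f"{name}: must be one of {sorted(allowed)}")
    for name in THRESHOLD_FIELDS:
        if name in config:
            value = config[name]
            # bool is a subclass of int, so reject it explicitly.
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if not (is_number and 0.0 <= value <= 1.0):
                errors.append(f"{name}: must be a number between 0.0 and 1.0")
    for name in BOOLEAN_FIELDS:
        if name in config and not isinstance(config[name], bool):
            errors.append(f"{name}: must be true or false")
    for name in MODEL_FIELDS:
        if name in config and not isinstance(config[name], str):
            errors.append(f"{name}: must be a model name string")
    return errors
```

Every parameter is treated as optional here, so a partial config such as `{"table_parsing_mode": "conservative"}` validates cleanly and omitted keys can fall back to backend defaults.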

### 3. API Endpoint

Add endpoint to accept processing configuration:

```
POST /api/v2/tasks
{
  "file": ...,
  "processing_track": "ocr",
  "ocr_preset": "datasheet",  // OR
  "ocr_config": {
    "table_parsing_mode": "conservative",
    "table_layout_threshold": 0.65,
    ...
  }
}
```
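
The `// OR` in the request body reads as "send a preset or an explicit config, but not both". A sketch of how the backend might decide which one applies, assuming that reading; `select_ocr_settings` is a hypothetical helper, and falling back to existing defaults when neither field is present is an assumption taken from the backward-compatibility goal.

```python
from typing import Any, Dict, Tuple


def select_ocr_settings(payload: Dict[str, Any]) -> Tuple[str, Any]:
    """Decide which settings source a task request uses.

    Returns ("preset", name), ("config", dict) or ("default", None).
    Raises ValueError if both fields are supplied (assumed to be ambiguous).
    """
    preset = payload.get("ocr_preset")
    config = payload.get("ocr_config")
    if preset is not None and config is not None:
        raise ValueError("send either 'ocr_preset' or 'ocr_config', not both")
    if preset is not None:
        return ("preset", preset)
    if config is not None:
        return ("config", config)
    # Neither field present: keep the current hardcoded backend behavior.
    return ("default", None)
```

For example, `select_ocr_settings({"processing_track": "ocr", "ocr_preset": "datasheet"})` yields `("preset", "datasheet")`, while a request from an older client with neither field yields `("default", None)`.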

### 4. Frontend UI Components

1. **Preset Selector**: Dropdown with document type icons and descriptions
2. **Advanced Toggle**: Expand/collapse for parameter panel
3. **Parameter Groups**: Collapsible sections for table/layout/preprocessing
4. **Real-time Preview**: Show expected behavior based on settings

## Benefits

1. **Root cause fix**: Address table over-detection at the source
2. **User empowerment**: Users can optimize for their specific documents
3. **No patches needed**: Clean PP-Structure output without post-processing hacks
4. **Iterative improvement**: Users can fine-tune and share working configurations

## Scope

- Backend: API endpoint, preset definitions, parameter validation
- Frontend: UI components for preset selection and parameter tuning
- No changes to PP-Structure core - only configuration

## Success Criteria

1. Users can select an appropriate preset for their document type
2. OCR output matches the document's actual structure without post-processing patches
3. Advanced users can fine-tune all PP-Structure parameters
4. Configuration can be saved and reused

## Risks & Mitigations

| Risk | Mitigation |
|------|------------|
| Users overwhelmed by parameters | Default to presets, hide advanced panel |
| Wrong preset selection | Provide visual examples for each preset |
| Breaking changes | Keep backward compatibility with defaults |
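
One way to keep backward compatibility is to layer settings so that anything the client omits falls through to the existing backend defaults. The sketch below assumes that layering order (defaults, then preset, then explicit overrides); `effective_config` is a hypothetical name, and the defaults are passed in rather than hardcoded because the proposal does not state the current `config.py` values.

```python
from typing import Any, Dict, Optional


def effective_config(
    defaults: Dict[str, Any],
    preset: Optional[Dict[str, Any]] = None,
    overrides: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Merge settings with later layers winning: defaults < preset < overrides.

    The inputs are not modified; a new dict is returned.
    """
    merged = dict(defaults)
    merged.update(preset or {})
    merged.update(overrides or {})
    return merged
```

With no preset and no overrides the result equals the defaults, so requests from existing clients would behave exactly as before.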

## Timeline

- Phase 1: Backend API and presets (2-3 days)
- Phase 2: Frontend preset selector (1-2 days)
- Phase 3: Advanced parameter panel (2-3 days)
- Phase 4: Documentation and testing (1 day)