# Proposal: Add OCR Processing Presets and Parameter Configuration

## Summary

Add frontend UI for configuring PP-Structure OCR processing parameters with document-type presets and advanced parameter tuning. This addresses the root cause of table over-detection by allowing users to select appropriate processing modes for their document types.

## Problem Statement

Currently, PP-Structure's table parsing is too aggressive for many document types:

1. **Layout detection** misclassifies structured text (e.g., datasheet right columns) as tables
2. **Table cell parsing** over-segments these regions, causing "cell explosion"
3. **Post-processing patches** (cell validation, gap filling, table rebuilder) try to fix symptoms but don't address root cause
4. **No user control** - all settings are hardcoded in backend `config.py`

## Proposed Solution

### 1. Document Type Presets (Simple Mode)

Provide predefined configurations for common document types:

| Preset | Description | Table Parsing | Layout Threshold | Use Case |
|--------|-------------|---------------|------------------|----------|
| `text_heavy` | Documents with mostly paragraphs | disabled | 0.7 | Reports, articles, manuals |
| `datasheet` | Technical datasheets with tables/specs | conservative | 0.65 | Product specs, TDS |
| `table_heavy` | Documents with many tables | full | 0.5 | Financial reports, spreadsheets |
| `form` | Forms with fields | conservative | 0.6 | Applications, surveys |
| `mixed` | Mixed content documents | classification_only | 0.55 | General documents |
| `custom` | User-defined settings | user-defined | user-defined | Advanced users |

### 2. Advanced Parameter Panel (Expert Mode)

Expose all PP-Structure parameters for fine-tuning:

**Table Processing:**
- `table_parsing_mode`: full / conservative / classification_only / disabled
- `table_layout_threshold`: 0.0 - 1.0 (higher = stricter table detection)
- `enable_wired_table`: true / false
- `enable_wireless_table`: true / false
- `wired_table_model`: model selection
- `wireless_table_model`: model selection

**Layout Detection:**
- `layout_detection_model`: model selection
- `layout_threshold`: 0.0 - 1.0
- `layout_nms_threshold`: 0.0 - 1.0
- `layout_merge_mode`: large / small / union

**Preprocessing:**
- `use_doc_orientation_classify`: true / false
- `use_doc_unwarping`: true / false
- `use_textline_orientation`: true / false

**Other Recognition:**
- `enable_chart_recognition`: true / false
- `enable_formula_recognition`: true / false
- `enable_seal_recognition`: true / false

### 3. API Endpoint

Add endpoint to accept processing configuration:

```
POST /api/v2/tasks
{
  "file": ...,
  "processing_track": "ocr",
  "ocr_preset": "datasheet",  // OR
  "ocr_config": {
    "table_parsing_mode": "conservative",
    "table_layout_threshold": 0.65,
    ...
  }
}
```

### 4. Frontend UI Components

1. **Preset Selector**: Dropdown with document type icons and descriptions
2. **Advanced Toggle**: Expand/collapse for parameter panel
3. **Parameter Groups**: Collapsible sections for table/layout/preprocessing
4. **Real-time Preview**: Show expected behavior based on settings

## Benefits

1. **Root cause fix**: Address table over-detection at the source
2. **User empowerment**: Users can optimize for their specific documents
3. **No patches needed**: Clean PP-Structure output without post-processing hacks
4. **Iterative improvement**: Users can fine-tune and share working configurations

## Scope

- Backend: API endpoint, preset definitions, parameter validation
- Frontend: UI components for preset selection and parameter tuning
- No changes to PP-Structure core - only configuration

## Success Criteria

1. Users can select appropriate preset for document type
2. OCR output matches document reality without post-processing patches
3. Advanced users can fine-tune all PP-Structure parameters
4. Configuration can be saved and reused

## Risks & Mitigations

| Risk | Mitigation |
|------|------------|
| Users overwhelmed by parameters | Default to presets, hide advanced panel |
| Wrong preset selection | Provide visual examples for each preset |
| Breaking changes | Keep backward compatibility with defaults |

## Timeline

- Phase 1: Backend API and presets (2-3 days)
- Phase 2: Frontend preset selector (1-2 days)
- Phase 3: Advanced parameter panel (2-3 days)
- Phase 4: Documentation and testing (1 day)
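
## Appendix: Preset Resolution Sketch

The backend side of the proposal (preset definitions plus parameter validation for the `ocr_preset` / `ocr_config` request fields) could look roughly like the sketch below. It is illustrative only and rests on several assumptions that the proposal does not settle: the names `PRESETS`, `validate_ocr_config`, and `resolve_ocr_config` are hypothetical; the "Layout Threshold" column of the preset table is mapped to `table_layout_threshold` (inferred from the API example); an `ocr_config` sent together with a preset is treated as a set of overrides on top of it; and an empty result means "fall back to the existing backend defaults".

```python
from typing import Any, Dict, Optional

# Allowed values, taken from the Advanced Parameter Panel section.
TABLE_PARSING_MODES = {"full", "conservative", "classification_only", "disabled"}
LAYOUT_MERGE_MODES = {"large", "small", "union"}
THRESHOLD_KEYS = {"table_layout_threshold", "layout_threshold", "layout_nms_threshold"}
BOOLEAN_KEYS = {
    "enable_wired_table", "enable_wireless_table",
    "use_doc_orientation_classify", "use_doc_unwarping", "use_textline_orientation",
    "enable_chart_recognition", "enable_formula_recognition", "enable_seal_recognition",
}
MODEL_KEYS = {"wired_table_model", "wireless_table_model", "layout_detection_model"}

# Preset values from the preset table. Assumption: the "Layout Threshold"
# column corresponds to `table_layout_threshold`.
PRESETS: Dict[str, Dict[str, Any]] = {
    "text_heavy": {"table_parsing_mode": "disabled", "table_layout_threshold": 0.7},
    "datasheet": {"table_parsing_mode": "conservative", "table_layout_threshold": 0.65},
    "table_heavy": {"table_parsing_mode": "full", "table_layout_threshold": 0.5},
    "form": {"table_parsing_mode": "conservative", "table_layout_threshold": 0.6},
    "mixed": {"table_parsing_mode": "classification_only", "table_layout_threshold": 0.55},
    "custom": {},  # everything comes from the user-supplied ocr_config
}


def validate_ocr_config(config: Dict[str, Any]) -> None:
    """Raise ValueError if any parameter is unknown or out of range."""
    for key, value in config.items():
        if key == "table_parsing_mode":
            ok = value in TABLE_PARSING_MODES
        elif key == "layout_merge_mode":
            ok = value in LAYOUT_MERGE_MODES
        elif key in THRESHOLD_KEYS:
            # bool is a subclass of int, so exclude it explicitly.
            ok = (
                isinstance(value, (int, float))
                and not isinstance(value, bool)
                and 0.0 <= value <= 1.0
            )
        elif key in BOOLEAN_KEYS:
            ok = isinstance(value, bool)
        elif key in MODEL_KEYS:
            ok = isinstance(value, str)
        else:
            raise ValueError(f"unknown OCR parameter: {key}")
        if not ok:
            raise ValueError(f"invalid value for {key}: {value!r}")


def resolve_ocr_config(
    ocr_preset: Optional[str] = None,
    ocr_config: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Merge a preset with optional overrides and validate the result.

    Assumption: returning {} means "use the existing backend defaults",
    which keeps requests without either field backward compatible.
    """
    if ocr_preset is not None and ocr_preset not in PRESETS:
        raise ValueError(f"unknown preset: {ocr_preset}")
    resolved = dict(PRESETS.get(ocr_preset, {})) if ocr_preset else {}
    resolved.update(ocr_config or {})
    validate_ocr_config(resolved)
    return resolved
```

For example, `resolve_ocr_config("datasheet")` yields the conservative table mode with a 0.65 threshold, while `resolve_ocr_config("datasheet", {"table_layout_threshold": 0.7})` keeps the mode and tightens only the threshold. Rejecting unknown keys is a deliberate choice in this sketch so that typos in `ocr_config` surface as errors instead of being silently ignored.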