# Proposal: Add OCR Processing Presets and Parameter Configuration

## Summary

Add a frontend UI for configuring PP-Structure OCR processing parameters, offering document-type presets and advanced parameter tuning. This addresses the root cause of table over-detection by letting users choose a processing mode suited to their document type.

## Problem Statement

Currently, PP-Structure's table parsing is too aggressive for many document types:
1. **Layout detection** misclassifies structured text (e.g., datasheet right columns) as tables
2. **Table cell parsing** over-segments these regions, causing "cell explosion"
3. **Post-processing patches** (cell validation, gap filling, table rebuilder) treat the symptoms but do not address the root cause
4. **No user control**: all settings are hardcoded in the backend's `config.py`

## Proposed Solution

### 1. Document Type Presets (Simple Mode)

Provide predefined configurations for common document types:

| Preset | Description | Table Parsing | Layout Threshold | Use Case |
|--------|-------------|---------------|------------------|----------|
| `text_heavy` | Documents with mostly paragraphs | disabled | 0.7 | Reports, articles, manuals |
| `datasheet` | Technical datasheets with tables/specs | conservative | 0.65 | Product specs, TDS |
| `table_heavy` | Documents with many tables | full | 0.5 | Financial reports, spreadsheets |
| `form` | Forms with fields | conservative | 0.6 | Applications, surveys |
| `mixed` | Mixed content documents | classification_only | 0.55 | General documents |
| `custom` | User-defined settings | user-defined | user-defined | Advanced users |
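
A minimal sketch of how the backend might store these presets. It assumes the table's "Layout Threshold" column maps to `table_layout_threshold` (the API example in section 3 pairs `conservative` with `0.65`, matching the `datasheet` row). It also assumes that `custom` has no stored values. `OCRPreset`, `PRESETS`, and `get_preset` are hypothetical names.

```python
from enum import Enum
from typing import Optional


class OCRPreset(str, Enum):
    TEXT_HEAVY = "text_heavy"
    DATASHEET = "datasheet"
    TABLE_HEAVY = "table_heavy"
    FORM = "form"
    MIXED = "mixed"
    CUSTOM = "custom"


# Values copied from the preset table. Mapping the threshold column to
# `table_layout_threshold` is an assumption (see lead-in).
PRESETS: dict[OCRPreset, dict[str, object]] = {
    OCRPreset.TEXT_HEAVY: {"table_parsing_mode": "disabled", "table_layout_threshold": 0.7},
    OCRPreset.DATASHEET: {"table_parsing_mode": "conservative", "table_layout_threshold": 0.65},
    OCRPreset.TABLE_HEAVY: {"table_parsing_mode": "full", "table_layout_threshold": 0.5},
    OCRPreset.FORM: {"table_parsing_mode": "conservative", "table_layout_threshold": 0.6},
    OCRPreset.MIXED: {"table_parsing_mode": "classification_only", "table_layout_threshold": 0.55},
}


def get_preset(name: str) -> Optional[dict[str, object]]:
    """Return a copy of the preset's settings, or None for `custom`."""
    preset = OCRPreset(name)  # raises ValueError for unknown preset names
    if preset is OCRPreset.CUSTOM:
        return None  # custom: the caller must supply explicit parameters
    return dict(PRESETS[preset])  # copy so callers cannot mutate the table
```

Returning a copy lets the caller layer user edits on top of a preset without altering the shared definitions.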

### 2. Advanced Parameter Panel (Expert Mode)

Expose the following PP-Structure parameters for fine-tuning:

**Table Processing:**
- `table_parsing_mode`: full / conservative / classification_only / disabled
- `table_layout_threshold`: 0.0 - 1.0 (higher = stricter table detection)
- `enable_wired_table`: true / false
- `enable_wireless_table`: true / false
- `wired_table_model`: model selection
- `wireless_table_model`: model selection

**Layout Detection:**
- `layout_detection_model`: model selection
- `layout_threshold`: 0.0 - 1.0
- `layout_nms_threshold`: 0.0 - 1.0
- `layout_merge_mode`: large / small / union

**Preprocessing:**
- `use_doc_orientation_classify`: true / false
- `use_doc_unwarping`: true / false
- `use_textline_orientation`: true / false

**Other Recognition:**
- `enable_chart_recognition`: true / false
- `enable_formula_recognition`: true / false
- `enable_seal_recognition`: true / false
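
The parameter list above could be validated with a schema like the following sketch. Every field defaults to `None`, which here means "leave the backend default unchanged". This is an assumption made so that no PP-Structure default values have to be restated. The allowed values come from the list. Model-selection fields accept any string because the available models are not specified. `OCRConfig` and `overrides()` are hypothetical names.

```python
from dataclasses import dataclass, fields
from typing import Optional

TABLE_PARSING_MODES = {"full", "conservative", "classification_only", "disabled"}
LAYOUT_MERGE_MODES = {"large", "small", "union"}


@dataclass
class OCRConfig:
    """Advanced PP-Structure parameters; None means "use the backend default"."""

    # Table processing
    table_parsing_mode: Optional[str] = None
    table_layout_threshold: Optional[float] = None
    enable_wired_table: Optional[bool] = None
    enable_wireless_table: Optional[bool] = None
    wired_table_model: Optional[str] = None
    wireless_table_model: Optional[str] = None
    # Layout detection
    layout_detection_model: Optional[str] = None
    layout_threshold: Optional[float] = None
    layout_nms_threshold: Optional[float] = None
    layout_merge_mode: Optional[str] = None
    # Preprocessing
    use_doc_orientation_classify: Optional[bool] = None
    use_doc_unwarping: Optional[bool] = None
    use_textline_orientation: Optional[bool] = None
    # Other recognition
    enable_chart_recognition: Optional[bool] = None
    enable_formula_recognition: Optional[bool] = None
    enable_seal_recognition: Optional[bool] = None

    def __post_init__(self) -> None:
        # Thresholds must lie in the documented 0.0 - 1.0 range.
        for name in ("table_layout_threshold", "layout_threshold", "layout_nms_threshold"):
            value = getattr(self, name)
            if value is not None and not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
        # Enumerated options must be one of the listed values.
        if self.table_parsing_mode is not None and self.table_parsing_mode not in TABLE_PARSING_MODES:
            raise ValueError(f"invalid table_parsing_mode: {self.table_parsing_mode!r}")
        if self.layout_merge_mode is not None and self.layout_merge_mode not in LAYOUT_MERGE_MODES:
            raise ValueError(f"invalid layout_merge_mode: {self.layout_merge_mode!r}")

    def overrides(self) -> dict:
        """Return only the parameters the user actually set."""
        return {f.name: getattr(self, f.name) for f in fields(self) if getattr(self, f.name) is not None}
```

Because unset fields are dropped by `overrides()`, a partially filled panel only changes what the user touched.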

### 3. API Endpoint

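Add an endpoint that accepts the processing configuration. A request supplies either a named preset (`ocr_preset`) or explicit parameters (`ocr_config`):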

```
POST /api/v2/tasks
{
  "file": ...,
  "processing_track": "ocr",
  "ocr_preset": "datasheet",  // OR
  "ocr_config": {
    "table_parsing_mode": "conservative",
    "table_layout_threshold": 0.65,
    ...
  }
}
```
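
One possible way for the handler to turn these request fields into the settings passed to PP-Structure is shown below. It assumes `ocr_preset` and `ocr_config` are mutually exclusive, as the `// OR` comment suggests. It also assumes `custom` is never sent as a preset name; its values arrive as `ocr_config`. `backend_defaults` stands in for the values currently hardcoded in `config.py`. `resolve_ocr_config` is a hypothetical name.

```python
from typing import Any, Optional

# Preset values from the table in section 1 (threshold assumed to be
# `table_layout_threshold`). `custom` is deliberately absent: it is sent
# as an explicit ocr_config instead.
PRESETS: dict[str, dict[str, Any]] = {
    "text_heavy": {"table_parsing_mode": "disabled", "table_layout_threshold": 0.7},
    "datasheet": {"table_parsing_mode": "conservative", "table_layout_threshold": 0.65},
    "table_heavy": {"table_parsing_mode": "full", "table_layout_threshold": 0.5},
    "form": {"table_parsing_mode": "conservative", "table_layout_threshold": 0.6},
    "mixed": {"table_parsing_mode": "classification_only", "table_layout_threshold": 0.55},
}


def resolve_ocr_config(
    backend_defaults: dict[str, Any],
    ocr_preset: Optional[str] = None,
    ocr_config: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Merge the request's preset or explicit config over the backend defaults."""
    if ocr_preset is not None and ocr_config is not None:
        raise ValueError("send either ocr_preset or ocr_config, not both")
    if ocr_preset is not None:
        if ocr_preset not in PRESETS:
            raise ValueError(f"unknown ocr_preset: {ocr_preset!r}")
        overrides = PRESETS[ocr_preset]
    else:
        # No preset: use explicit overrides, or none at all. A request with
        # neither field therefore behaves exactly as before this change.
        overrides = ocr_config or {}
    # Later keys win, so only the settings named in the request change.
    return {**backend_defaults, **overrides}
```

Rejecting requests that send both fields keeps the precedence rule trivial. An alternative would be to apply `ocr_config` on top of the chosen preset, which would let clients adjust a single parameter without switching to `custom`.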

### 4. Frontend UI Components

1. **Preset Selector**: Dropdown with document type icons and descriptions
2. **Advanced Toggle**: Expand/collapse for parameter panel
3. **Parameter Groups**: Collapsible sections for table/layout/preprocessing
4. **Real-time Preview**: Show expected behavior based on settings
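
The interaction between the preset selector and the advanced panel can be modelled as a small state object. It is shown in Python like the other sketches; the frontend stack is not specified here. The sketch assumes a design choice not stated above: editing any parameter switches the selection to `custom`, starting from the preset's values. `PresetSelectorState` and its methods are hypothetical, and the preset table is abbreviated to two entries.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Abbreviated preset table (two entries from section 1) to keep the sketch short.
PRESETS: dict[str, dict[str, Any]] = {
    "datasheet": {"table_parsing_mode": "conservative", "table_layout_threshold": 0.65},
    "table_heavy": {"table_parsing_mode": "full", "table_layout_threshold": 0.5},
}


@dataclass
class PresetSelectorState:
    """Hypothetical state behind the preset selector and advanced panel."""

    preset: Optional[str] = None  # None: no choice made, backend defaults apply
    params: dict[str, Any] = field(default_factory=dict)
    advanced_open: bool = False  # advanced panel starts collapsed

    def toggle_advanced(self) -> None:
        self.advanced_open = not self.advanced_open

    def select_preset(self, name: str) -> None:
        # Choosing a named preset loads its values into the parameter panel.
        self.preset = name
        self.params = dict(PRESETS[name])

    def set_param(self, key: str, value: Any) -> None:
        # Any manual edit turns the selection into `custom`, keeping the
        # values that were already loaded as the starting point.
        self.params[key] = value
        self.preset = "custom"

    def request_fields(self) -> dict[str, Any]:
        # Named presets go out by name; custom settings as ocr_config.
        if self.preset is None:
            return {}
        if self.preset == "custom":
            return {"ocr_config": dict(self.params)}
        return {"ocr_preset": self.preset}
```

Starting `custom` from the previously selected preset means an expert tweak never silently discards the rest of that preset's settings.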

## Benefits

1. **Root cause fix**: Address table over-detection at the source
2. **User empowerment**: Users can optimize for their specific documents
3. **No patches needed**: Clean PP-Structure output without post-processing hacks
4. **Iterative improvement**: Users can fine-tune and share working configurations

## Scope

- Backend: API endpoint, preset definitions, parameter validation
- Frontend: UI components for preset selection and parameter tuning
- No changes to PP-Structure core; only its configuration is exposed

## Success Criteria

1. Users can select an appropriate preset for their document type
2. OCR output reflects the document's actual structure (e.g., structured text is not misclassified as tables) without post-processing patches
3. Advanced users can fine-tune all exposed PP-Structure parameters
4. Configuration can be saved and reused
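
Saving and reusing a configuration could be as simple as the sketch below. It assumes a JSON file store (a database table or browser storage would work the same way) and stores the parameters in the same shape as the `ocr_config` request field, so a saved file can be resubmitted or shared unchanged. `save_config`, `load_config`, and the file layout are hypothetical.

```python
import json
from pathlib import Path
from typing import Any

# Parameter names from the advanced panel; anything else is rejected.
ALLOWED_KEYS = {
    "table_parsing_mode", "table_layout_threshold", "enable_wired_table",
    "enable_wireless_table", "wired_table_model", "wireless_table_model",
    "layout_detection_model", "layout_threshold", "layout_nms_threshold",
    "layout_merge_mode", "use_doc_orientation_classify", "use_doc_unwarping",
    "use_textline_orientation", "enable_chart_recognition",
    "enable_formula_recognition", "enable_seal_recognition",
}


def _check_keys(config: dict[str, Any]) -> None:
    unknown = set(config) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")


def save_config(path: Path, name: str, config: dict[str, Any]) -> None:
    """Store a named configuration in the same shape as the ocr_config field."""
    _check_keys(config)
    payload = {"name": name, "ocr_config": config}
    path.write_text(json.dumps(payload, indent=2, sort_keys=True), encoding="utf-8")


def load_config(path: Path) -> dict[str, Any]:
    """Read a saved configuration back, ready to send as ocr_config."""
    payload = json.loads(path.read_text(encoding="utf-8"))
    config = payload["ocr_config"]
    _check_keys(config)  # shared files may come from elsewhere, so validate on load too
    return config
```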

## Risks & Mitigations

| Risk | Mitigation |
|------|------------|
| Users overwhelmed by parameters | Default to presets, hide advanced panel |
| Wrong preset selection | Provide visual examples for each preset |
| Breaking changes | Keep backward compatibility: requests without `ocr_preset` or `ocr_config` use the current defaults |

## Timeline

- Phase 1: Backend API and presets (2-3 days)
- Phase 2: Frontend preset selector (1-2 days)
- Phase 3: Advanced parameter panel (2-3 days)
- Phase 4: Documentation and testing (1 day)