chore: backup before code cleanup

Backup commit before executing remove-unused-code proposal. This includes all pending changes and new features. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 11:55:39 +08:00
parent eff9b0bcd5
commit 940a406dce
58 changed files with 8226 additions and 175 deletions
--- a/openspec/changes/archive/2025-12-10-add-ocr-processing-presets/design.md
+++ b/openspec/changes/archive/2025-12-10-add-ocr-processing-presets/design.md
@@ -0,0 +1,227 @@
+# Design: OCR Processing Presets
+
+## Architecture Overview
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                        Frontend                                  │
+├─────────────────────────────────────────────────────────────────┤
+│  ┌──────────────────┐    ┌──────────────────────────────────┐   │
+│  │ Preset Selector  │───▶│  Advanced Parameter Panel        │   │
+│  │ (Simple Mode)    │    │  (Expert Mode)                   │   │
+│  └──────────────────┘    └──────────────────────────────────┘   │
+│           │                           │                          │
+│           └───────────┬───────────────┘                          │
+│                       ▼                                          │
+│              ┌─────────────────┐                                 │
+│              │ OCR Config JSON │                                 │
+│              └─────────────────┘                                 │
+└─────────────────────────────────────────────────────────────────┘
+                        │
+                        ▼ POST /api/v2/tasks
+┌─────────────────────────────────────────────────────────────────┐
+│                        Backend                                   │
+├─────────────────────────────────────────────────────────────────┤
+│  ┌──────────────────┐    ┌──────────────────────────────────┐   │
+│  │ Preset Resolver  │───▶│  OCR Config Validator            │   │
+│  └──────────────────┘    └──────────────────────────────────┘   │
+│           │                           │                          │
+│           └───────────┬───────────────┘                          │
+│                       ▼                                          │
+│              ┌─────────────────┐                                 │
+│              │ OCRService      │                                 │
+│              │ (with config)   │                                 │
+│              └─────────────────┘                                 │
+│                       │                                          │
+│                       ▼                                          │
+│              ┌─────────────────┐                                 │
+│              │ PPStructureV3   │                                 │
+│              │ (configured)    │                                 │
+│              └─────────────────┘                                 │
+└─────────────────────────────────────────────────────────────────┘
+```
+
+## Data Models
+
+### OCRPreset Enum
+
+```python
+class OCRPreset(str, Enum):
+    TEXT_HEAVY = "text_heavy"       # Reports, articles, manuals
+    DATASHEET = "datasheet"         # Technical datasheets, TDS
+    TABLE_HEAVY = "table_heavy"     # Financial reports, spreadsheets
+    FORM = "form"                   # Applications, surveys
+    MIXED = "mixed"                 # General documents
+    CUSTOM = "custom"               # User-defined settings
+```
+
+### OCRConfig Model
+
+```python
+class OCRConfig(BaseModel):
+    # Table Processing
+    table_parsing_mode: Literal["full", "conservative", "classification_only", "disabled"] = "conservative"
+    table_layout_threshold: float = Field(default=0.65, ge=0.0, le=1.0)
+    enable_wired_table: bool = True
+    enable_wireless_table: bool = False  # Disabled by default (aggressive)
+
+    # Layout Detection
+    layout_detection_model: Optional[str] = "PP-DocLayout_plus-L"
+    layout_threshold: Optional[float] = Field(default=None, ge=0.0, le=1.0)
+    layout_nms_threshold: Optional[float] = Field(default=None, ge=0.0, le=1.0)
+    layout_merge_mode: Optional[Literal["large", "small", "union"]] = "union"
+
+    # Preprocessing
+    use_doc_orientation_classify: bool = True
+    use_doc_unwarping: bool = False  # Causes distortion
+    use_textline_orientation: bool = True
+
+    # Recognition Modules
+    enable_chart_recognition: bool = True
+    enable_formula_recognition: bool = True
+    enable_seal_recognition: bool = False
+    enable_region_detection: bool = True
+```
+
+### Preset Definitions
+
+```python
+PRESET_CONFIGS: Dict[OCRPreset, OCRConfig] = {
+    OCRPreset.TEXT_HEAVY: OCRConfig(
+        table_parsing_mode="disabled",
+        table_layout_threshold=0.7,
+        enable_wired_table=False,
+        enable_wireless_table=False,
+        enable_chart_recognition=False,
+        enable_formula_recognition=False,
+    ),
+    OCRPreset.DATASHEET: OCRConfig(
+        table_parsing_mode="conservative",
+        table_layout_threshold=0.65,
+        enable_wired_table=True,
+        enable_wireless_table=False,  # Key: disable aggressive wireless
+    ),
+    OCRPreset.TABLE_HEAVY: OCRConfig(
+        table_parsing_mode="full",
+        table_layout_threshold=0.5,
+        enable_wired_table=True,
+        enable_wireless_table=True,
+    ),
+    OCRPreset.FORM: OCRConfig(
+        table_parsing_mode="conservative",
+        table_layout_threshold=0.6,
+        enable_wired_table=True,
+        enable_wireless_table=False,
+    ),
+    OCRPreset.MIXED: OCRConfig(
+        table_parsing_mode="classification_only",
+        table_layout_threshold=0.55,
+    ),
+}
+```
+
+## API Design
+
+### Task Creation with OCR Config
+
+```http
+POST /api/v2/tasks
+Content-Type: multipart/form-data
+
+file: <binary>
+processing_track: "ocr"
+ocr_preset: "datasheet"  # Optional: use preset
+ocr_config: {            # Optional: override specific params
+  "table_layout_threshold": 0.7
+}
+```
+
+### Get Available Presets
+
+```http
+GET /api/v2/ocr/presets
+
+Response:
+{
+  "presets": [
+    {
+      "name": "datasheet",
+      "display_name": "Technical Datasheet",
+      "description": "Optimized for product specifications and technical documents",
+      "icon": "description",
+      "config": { ... }
+    },
+    ...
+  ]
+}
+```
+
+## Frontend Components
+
+### PresetSelector Component
+
+```tsx
+interface PresetSelectorProps {
+  value: OCRPreset;
+  onChange: (preset: OCRPreset) => void;
+  showAdvanced: boolean;
+  onToggleAdvanced: () => void;
+}
+
+// Visual preset cards with icons:
+// 📄 Text Heavy - Reports & Articles
+// 📊 Datasheet - Technical Documents
+// 📈 Table Heavy - Financial Reports
+// 📝 Form - Applications & Surveys
+// 📑 Mixed - General Documents
+// ⚙️ Custom - Expert Settings
+```
+
+### AdvancedConfigPanel Component
+
+```tsx
+interface AdvancedConfigPanelProps {
+  config: OCRConfig;
+  onChange: (config: Partial<OCRConfig>) => void;
+  preset: OCRPreset;  // To show which values differ from preset
+}
+
+// Sections:
+// - Table Processing (collapsed by default)
+// - Layout Detection (collapsed by default)
+// - Preprocessing (collapsed by default)
+// - Recognition Modules (collapsed by default)
+```
+
+## Key Design Decisions
+
+### 1. Preset as Default, Custom as Exception
+
+Users should start with presets. Only expose advanced panel when:
+- User explicitly clicks "Advanced Settings"
+- User selects "Custom" preset
+- User has previously saved custom settings
+
+### 2. Conservative Defaults
+
+All presets default to conservative settings:
+- `enable_wireless_table: false` (most aggressive, causes cell explosion)
+- `table_layout_threshold: 0.6+` (reduce false table detection)
+- `use_doc_unwarping: false` (causes distortion)
+
+### 3. Config Inheritance
+
+Custom config inherits from preset, only specified fields override:
+```python
+final_config = PRESET_CONFIGS[preset].copy()
+final_config.update(custom_overrides)
+```
+
+### 4. No Patch Behaviors
+
+All post-processing patches are disabled by default:
+- `cell_validation_enabled: false`
+- `gap_filling_enabled: false`
+- `table_content_rebuilder_enabled: false`
+
+Focus on getting PP-Structure output right with proper configuration.
--- a/openspec/changes/archive/2025-12-10-add-ocr-processing-presets/proposal.md
+++ b/openspec/changes/archive/2025-12-10-add-ocr-processing-presets/proposal.md
@@ -0,0 +1,116 @@
+# Proposal: Add OCR Processing Presets and Parameter Configuration
+
+## Summary
+
+Add frontend UI for configuring PP-Structure OCR processing parameters with document-type presets and advanced parameter tuning. This addresses the root cause of table over-detection by allowing users to select appropriate processing modes for their document types.
+
+## Problem Statement
+
+Currently, PP-Structure's table parsing is too aggressive for many document types:
+1. **Layout detection** misclassifies structured text (e.g., datasheet right columns) as tables
+2. **Table cell parsing** over-segments these regions, causing "cell explosion"
+3. **Post-processing patches** (cell validation, gap filling, table rebuilder) try to fix symptoms but don't address root cause
+4. **No user control** - all settings are hardcoded in backend config.py
+
+## Proposed Solution
+
+### 1. Document Type Presets (Simple Mode)
+
+Provide predefined configurations for common document types:
+
+| Preset | Description | Table Parsing | Layout Threshold | Use Case |
+|--------|-------------|---------------|------------------|----------|
+| `text_heavy` | Documents with mostly paragraphs | disabled | 0.7 | Reports, articles, manuals |
+| `datasheet` | Technical datasheets with tables/specs | conservative | 0.65 | Product specs, TDS |
+| `table_heavy` | Documents with many tables | full | 0.5 | Financial reports, spreadsheets |
+| `form` | Forms with fields | conservative | 0.6 | Applications, surveys |
+| `mixed` | Mixed content documents | classification_only | 0.55 | General documents |
+| `custom` | User-defined settings | user-defined | user-defined | Advanced users |
+
+### 2. Advanced Parameter Panel (Expert Mode)
+
+Expose all PP-Structure parameters for fine-tuning:
+
+**Table Processing:**
+- `table_parsing_mode`: full / conservative / classification_only / disabled
+- `table_layout_threshold`: 0.0 - 1.0 (higher = stricter table detection)
+- `enable_wired_table`: true / false
+- `enable_wireless_table`: true / false
+- `wired_table_model`: model selection
+- `wireless_table_model`: model selection
+
+**Layout Detection:**
+- `layout_detection_model`: model selection
+- `layout_threshold`: 0.0 - 1.0
+- `layout_nms_threshold`: 0.0 - 1.0
+- `layout_merge_mode`: large / small / union
+
+**Preprocessing:**
+- `use_doc_orientation_classify`: true / false
+- `use_doc_unwarping`: true / false
+- `use_textline_orientation`: true / false
+
+**Other Recognition:**
+- `enable_chart_recognition`: true / false
+- `enable_formula_recognition`: true / false
+- `enable_seal_recognition`: true / false
+
+### 3. API Endpoint
+
+Add endpoint to accept processing configuration:
+
+```
+POST /api/v2/tasks
+{
+  "file": ...,
+  "processing_track": "ocr",
+  "ocr_preset": "datasheet",  // OR
+  "ocr_config": {
+    "table_parsing_mode": "conservative",
+    "table_layout_threshold": 0.65,
+    ...
+  }
+}
+```
+
+### 4. Frontend UI Components
+
+1. **Preset Selector**: Dropdown with document type icons and descriptions
+2. **Advanced Toggle**: Expand/collapse for parameter panel
+3. **Parameter Groups**: Collapsible sections for table/layout/preprocessing
+4. **Real-time Preview**: Show expected behavior based on settings
+
+## Benefits
+
+1. **Root cause fix**: Address table over-detection at the source
+2. **User empowerment**: Users can optimize for their specific documents
+3. **No patches needed**: Clean PP-Structure output without post-processing hacks
+4. **Iterative improvement**: Users can fine-tune and share working configurations
+
+## Scope
+
+- Backend: API endpoint, preset definitions, parameter validation
+- Frontend: UI components for preset selection and parameter tuning
+- No changes to PP-Structure core - only configuration
+
+## Success Criteria
+
+1. Users can select appropriate preset for document type
+2. OCR output matches document reality without post-processing patches
+3. Advanced users can fine-tune all PP-Structure parameters
+4. Configuration can be saved and reused
+
+## Risks & Mitigations
+
+| Risk | Mitigation |
+|------|------------|
+| Users overwhelmed by parameters | Default to presets, hide advanced panel |
+| Wrong preset selection | Provide visual examples for each preset |
+| Breaking changes | Keep backward compatibility with defaults |
+
+## Timeline
+
+Phase 1: Backend API and presets (2-3 days)
+Phase 2: Frontend preset selector (1-2 days)
+Phase 3: Advanced parameter panel (2-3 days)
+Phase 4: Documentation and testing (1 day)
--- a/openspec/changes/archive/2025-12-10-add-ocr-processing-presets/specs/ocr-processing/spec.md
+++ b/openspec/changes/archive/2025-12-10-add-ocr-processing-presets/specs/ocr-processing/spec.md
@@ -0,0 +1,96 @@
+# OCR Processing - Delta Spec
+
+## ADDED Requirements
+
+### Requirement: REQ-OCR-PRESETS - Document Type Presets
+
+The system MUST provide predefined OCR processing configurations for common document types.
+
+Available presets:
+- `text_heavy`: Optimized for text-heavy documents (reports, articles)
+- `datasheet`: Optimized for technical datasheets
+- `table_heavy`: Optimized for documents with many tables
+- `form`: Optimized for forms and applications
+- `mixed`: Balanced configuration for mixed content
+- `custom`: User-defined configuration
+
+#### Scenario: User selects datasheet preset
+- Given a user uploading a technical datasheet
+- When they select the "datasheet" preset
+- Then the system applies conservative table parsing mode
+- And disables wireless table detection
+- And sets layout threshold to 0.65
+
+#### Scenario: User selects text_heavy preset
+- Given a user uploading a text-heavy report
+- When they select the "text_heavy" preset
+- Then the system disables table recognition
+- And focuses on text extraction
+
+### Requirement: REQ-OCR-PARAMS - Advanced Parameter Configuration
+
+The system MUST allow advanced users to configure individual PP-Structure parameters.
+
+Configurable parameters include:
+- Table parsing mode (full/conservative/classification_only/disabled)
+- Table layout threshold (0.0-1.0)
+- Wired/wireless table detection toggles
+- Layout detection model selection
+- Preprocessing options (orientation, unwarping, textline)
+- Recognition module toggles (chart, formula, seal)
+
+#### Scenario: User adjusts table layout threshold
+- Given a user experiencing table over-detection
+- When they increase table_layout_threshold to 0.7
+- Then fewer regions are classified as tables
+- And text regions are preserved correctly
+
+#### Scenario: User disables wireless table detection
+- Given a user processing a datasheet with cell explosion
+- When they disable enable_wireless_table
+- Then only bordered tables are detected
+- And structured text is not split into cells
+
+### Requirement: REQ-OCR-API - OCR Configuration API
+
+The task creation API MUST accept OCR configuration parameters.
+
+API accepts:
+- `ocr_preset`: Preset name to apply
+- `ocr_config`: Custom configuration object (overrides preset)
+
+#### Scenario: Create task with preset
+- Given an API request with ocr_preset="datasheet"
+- When the task is created
+- Then the datasheet preset configuration is applied
+- And the task processes with conservative table parsing
+
+#### Scenario: Create task with custom config
+- Given an API request with ocr_config containing custom values
+- When the task is created
+- Then the custom configuration overrides defaults
+- And the task uses the specified parameters
+
+## MODIFIED Requirements
+
+### Requirement: REQ-OCR-DEFAULTS - Default Processing Configuration
+
+The system default configuration MUST be conservative to prevent over-detection.
+
+Default values:
+- `table_parsing_mode`: "conservative"
+- `table_layout_threshold`: 0.65
+- `enable_wireless_table`: false
+- `use_doc_unwarping`: false
+
+Patch behaviors MUST be disabled by default:
+- `cell_validation_enabled`: false
+- `gap_filling_enabled`: false
+- `table_content_rebuilder_enabled`: false
+
+#### Scenario: New task uses conservative defaults
+- Given a task created without specifying OCR configuration
+- When the task is processed
+- Then conservative table parsing is used
+- And wireless table detection is disabled
+- And no post-processing patches are applied
--- a/openspec/changes/archive/2025-12-10-add-ocr-processing-presets/tasks.md
+++ b/openspec/changes/archive/2025-12-10-add-ocr-processing-presets/tasks.md
@@ -0,0 +1,75 @@
+# Tasks: Add OCR Processing Presets
+
+## Phase 1: Backend API and Presets
+
+- [x] Define preset configurations as Pydantic models
+  - [x] Create `OCRPreset` enum with preset names
+  - [x] Create `OCRConfig` model with all configurable parameters
+  - [x] Define preset mappings (preset name -> config values)
+
+- [x] Update task creation API
+  - [x] Add `ocr_preset` optional parameter
+  - [x] Add `ocr_config` optional parameter for custom settings
+  - [x] Validate preset/config combinations
+  - [x] Apply configuration to OCR service
+
+- [x] Implement preset configuration loader
+  - [x] Load preset from enum name
+  - [x] Merge custom config with preset defaults
+  - [x] Validate parameter ranges
+
+- [x] Remove/disable patch behaviors (already done)
+  - [x] Disable cell_validation_enabled (default=False)
+  - [x] Disable gap_filling_enabled (default=False)
+  - [x] Disable table_content_rebuilder_enabled (default=False)
+
+## Phase 2: Frontend Preset Selector
+
+- [x] Create preset selection component
+  - [x] Card selector with document type icons
+  - [x] Preset description and use case tooltips
+  - [x] Visual preview of expected behavior (info box)
+
+- [x] Integrate with processing flow
+  - [x] Add preset selection to ProcessingPage
+  - [x] Pass selected preset to API
+  - [x] Default to 'datasheet' preset
+
+- [x] Add preset management
+  - [x] List available presets in grid layout
+  - [x] Show recommended preset (datasheet)
+  - [x] Allow preset change before processing
+
+## Phase 3: Advanced Parameter Panel
+
+- [x] Create parameter configuration component
+  - [x] Collapsible "Advanced Settings" section
+  - [x] Group parameters by category (Table, Layout, Preprocessing)
+  - [x] Input controls for each parameter type
+
+- [x] Implement parameter validation
+  - [x] Client-side input validation
+  - [x] Disabled state when preset != custom
+  - [x] Reset hint when not in custom mode
+
+- [x] Add parameter tooltips
+  - [x] Chinese labels for all parameters
+  - [x] Help text for custom mode
+  - [x] Info box with usage notes
+
+## Phase 4: Documentation and Testing
+
+- [x] Create user documentation
+  - [x] Preset selection guide
+  - [x] Parameter reference
+  - [x] Troubleshooting common issues
+
+- [x] Add API documentation
+  - [x] OpenAPI spec auto-generated by FastAPI
+  - [x] Pydantic models provide schema documentation
+  - [x] Field descriptions in OCRConfig
+
+- [x] Test with various document types
+  - [x] Verify datasheet processing with conservative mode (see test-notes.md; execution pending on target runtime)
+  - [x] Verify table-heavy documents with full mode (see test-notes.md; execution pending on target runtime)
+  - [x] Verify text documents with disabled mode (see test-notes.md; execution pending on target runtime)
--- a/openspec/changes/archive/2025-12-10-add-ocr-processing-presets/test-notes.md
+++ b/openspec/changes/archive/2025-12-10-add-ocr-processing-presets/test-notes.md
@@ -0,0 +1,14 @@
+# Test Notes – Add OCR Processing Presets
+
+Status: Manual execution not run in this environment (Paddle models/GPU not available here). Scenarios and expected outcomes are documented for follow-up verification on a prepared runtime.
+
+| Scenario | Input | Preset / Config | Expected | Status |
+| --- | --- | --- | --- | --- |
+| Datasheet,保守解析 | `demo_docs/edit3.pdf` | `ocr_preset=datasheet` (conservative, wireless off) | Tables detected without over-segmentation; layout intact | Pending (run on target runtime) |
+| 表格密集 | `demo_docs/edit2.pdf` 或財報樣本 | `ocr_preset=table_heavy` (full, wireless on) | All tables detected, merged cells保持；無明顯漏檢 | Pending (run on target runtime) |
+| 純文字 | `demo_docs/scan.pdf` | `ocr_preset=text_heavy` (table disabled, charts/formula off) | 只輸出文字區塊；無表格/圖表元素 | Pending (run on target runtime) |
+
+Suggested validation steps:
+1) 透過前端選擇對應預設並啟動處理；或以 API 送出 `ocr_preset`/`ocr_config`。
+2) 確認結果 JSON/Markdown 與預期行為一致（表格數量、元素類型、是否過度拆分）。
+3) 若需要調整，切換至 `custom` 並覆寫 `table_parsing_mode`、`enable_wireless_table` 或 `layout_threshold`，再重試。