proposal: add hybrid control mode with auto-detection and preview

Updates add-layout-preprocessing proposal:
- Auto mode: analyze image quality, auto-select parameters
- Manual mode: user override with specific settings
- Preview API: compare original vs preprocessed before processing
- Frontend UI: mode selection, manual controls, preview button

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 14:31:09 +08:00
parent c12ea0b9f6
commit 06a5973f2e
4 changed files with 244 additions and 20 deletions
--- a/openspec/changes/add-layout-preprocessing/tasks.md
+++ b/openspec/changes/add-layout-preprocessing/tasks.md
@@ -3,11 +3,15 @@
 ## 1. Configuration

 - [ ] 1.1 Add preprocessing configuration to `backend/app/core/config.py`
-  - `layout_preprocessing_enabled: bool = True` - Enable/disable preprocessing
+  - `layout_preprocessing_mode: str = "auto"` - Options: auto, manual, disabled
  - `layout_preprocessing_contrast: str = "clahe"` - Options: none, histogram, clahe
  - `layout_preprocessing_sharpen: bool = True` - Enable sharpening for faint lines
  - `layout_preprocessing_binarize: bool = False` - Optional binarization (aggressive)

+- [ ] 1.2 Add preprocessing schema to `backend/app/schemas/task.py`
+  - `PreprocessingMode` enum: auto, manual, disabled
+  - `PreprocessingConfig` schema for API request/response
+
 ## 2. Preprocessing Service

 - [ ] 2.1 Create `backend/app/services/preprocessing_service.py`
@@ -18,15 +22,26 @@
  - Return preprocessed image as numpy array or PIL Image

 - [ ] 2.2 Implement `enhance_for_layout_detection()` function
-  - Input: Original image path or PIL Image
+  - Input: Original image path or PIL Image + config
  - Output: Preprocessed image (same format as input)
  - Steps: contrast → sharpen → (optional) binarize

+- [ ] 2.3 Implement `analyze_image_quality()` function (Auto mode)
+  - Calculate contrast level (standard deviation of grayscale)
+  - Detect edge clarity (Sobel/Canny edge strength)
+  - Return recommended `PreprocessingConfig` based on analysis
+  - Thresholds:
+    - Low contrast < 40: Apply CLAHE
+    - Faint edges < 0.1: Apply sharpen
+    - Very low contrast < 20: Consider binarize
+
 ## 3. Integration with OCR Service

 - [ ] 3.1 Update `backend/app/services/ocr_service.py`
  - Import preprocessing service
-  - Before `_run_ppstructure()`, preprocess image if enabled
+  - Check preprocessing mode (auto/manual/disabled)
+  - If auto: call `analyze_image_quality()` first
+  - Before `_run_ppstructure()`, preprocess image based on config
  - Pass preprocessed image to PP-Structure for layout detection
  - Keep original image reference for image extraction

@@ -34,21 +49,77 @@
  - Verify `saved_path` and `img_path` in elements reference original
  - Bbox coordinates from preprocessed detection applied to original crop

-## 4. Testing
+- [ ] 3.3 Update task start API to accept preprocessing options
+  - Add `preprocessing_mode` parameter to start request
+  - Add `preprocessing_config` for manual mode overrides

-- [ ] 4.1 Unit tests for preprocessing_service
+## 4. Preview API
+
+- [ ] 4.1 Create `backend/app/api/v2/endpoints/preview.py`
+  - `POST /api/v2/tasks/{task_id}/preview/preprocessing`
+  - Input: page number, preprocessing config (optional)
+  - Output:
+    - Original image (base64 or URL)
+    - Preprocessed image (base64 or URL)
+    - Auto-detected config (if mode=auto)
+    - Image quality metrics (contrast, edge_strength)
+
+- [ ] 4.2 Add preview router to API
+  - Register in `backend/app/api/v2/api.py`
+  - Add appropriate authentication/authorization
+
+## 5. Frontend UI
+
+- [ ] 5.1 Create `frontend/src/components/PreprocessingSettings.tsx`
+  - Radio buttons: Auto / Manual / Disabled
+  - Manual mode shows:
+    - Contrast dropdown: None / Histogram / CLAHE
+    - Sharpen checkbox
+    - Binarize checkbox
+  - Preview button to trigger comparison view
+
+- [ ] 5.2 Create `frontend/src/components/PreprocessingPreview.tsx`
+  - Side-by-side image comparison (original vs preprocessed)
+  - Display detected quality metrics
+  - Show which auto settings would be applied
+  - Slider or toggle to switch between views
+
+- [ ] 5.3 Integrate with task start flow
+  - Add PreprocessingSettings to OCR track options
+  - Pass selected config to task start API
+  - Store user preference in localStorage
+
+- [ ] 5.4 Add i18n translations
+  - `frontend/src/i18n/locales/zh-TW.json` - Traditional Chinese
+  - `frontend/src/i18n/locales/en.json` - English (if exists)
+
+## 6. Testing
+
+- [ ] 6.1 Unit tests for preprocessing_service
  - Test contrast enhancement methods
  - Test sharpening filter
  - Test binarization
+  - Test `analyze_image_quality()` with various images
  - Test with various image formats (PNG, JPEG)

-- [ ] 4.2 Integration tests
-  - Test OCR track with preprocessing enabled/disabled
+- [ ] 6.2 Unit tests for preview API
+  - Test preview endpoint returns correct images
+  - Test auto-detection returns sensible config
+
+- [ ] 6.3 Integration tests
+  - Test OCR track with preprocessing modes (auto/manual/disabled)
  - Verify image element quality is preserved
  - Test with known problematic documents (faint table borders)
+  - Verify auto mode improves detection for low-quality images

-## 5. Documentation
+## 7. Documentation

-- [ ] 5.1 Update API documentation
+- [ ] 7.1 Update API documentation
  - Document new configuration options
-  - Explain preprocessing behavior
+  - Document preview endpoint
+  - Explain preprocessing behavior and modes
+
+- [ ] 7.2 Add user guide section
+  - When to use auto vs manual
+  - How to interpret quality metrics
+  - Troubleshooting tips
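
Note on task 2.3: the auto-mode thresholds listed there (contrast < 40, edge strength < 0.1, contrast < 20) can be read as a small decision function. The sketch below is only an illustration of that mapping, not the proposal's implementation. The names `ContrastMethod`, `recommend_config`, and `grayscale_contrast`, and the field names on `PreprocessingConfig`, are assumptions modelled on the `layout_preprocessing_*` settings. The edge-strength value is taken as an already computed input, because the tasks do not say how the Sobel/Canny result is normalized. Treating "consider binarize" as "enable binarize" is also an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from statistics import pstdev
from typing import Sequence


class ContrastMethod(str, Enum):
    # Hypothetical enum; the tasks only list the option strings.
    NONE = "none"
    HISTOGRAM = "histogram"
    CLAHE = "clahe"


@dataclass
class PreprocessingConfig:
    # Field names are assumptions, not the schema from the proposal.
    contrast: ContrastMethod = ContrastMethod.NONE
    sharpen: bool = False
    binarize: bool = False


# Threshold values as listed under task 2.3.
LOW_CONTRAST = 40.0
VERY_LOW_CONTRAST = 20.0
FAINT_EDGES = 0.1


def grayscale_contrast(pixels: Sequence[Sequence[int]]) -> float:
    """Contrast as the population standard deviation of 0-255 grayscale values."""
    flat = [value for row in pixels for value in row]
    return pstdev(flat)


def recommend_config(contrast: float, edge_strength: float) -> PreprocessingConfig:
    """Map quality metrics to a recommended config using the task 2.3 thresholds."""
    return PreprocessingConfig(
        contrast=ContrastMethod.CLAHE if contrast < LOW_CONTRAST else ContrastMethod.NONE,
        sharpen=edge_strength < FAINT_EDGES,
        # Assumption: "consider binarize" is treated here as "enable binarize".
        binarize=contrast < VERY_LOW_CONTRAST,
    )
```

For example, `recommend_config(contrast=35.0, edge_strength=0.05)` would select CLAHE and sharpening but leave binarization off, since 35 is below the low-contrast threshold but above the very-low one.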