proposal: add hybrid control mode with auto-detection and preview

Updates add-layout-preprocessing proposal:
- Auto mode: analyze image quality, auto-select parameters
- Manual mode: user override with specific settings
- Preview API: compare original vs preprocessed before processing
- Frontend UI: mode selection, manual controls, preview button

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 14:31:09 +08:00
parent c12ea0b9f6
commit 06a5973f2e
4 changed files with 244 additions and 20 deletions
--- a/openspec/changes/add-layout-preprocessing/design.md
+++ b/openspec/changes/add-layout-preprocessing/design.md
@@ -27,13 +27,15 @@ Original Image ← ← ← ← Image extraction crops from original (NOT preproc
 ### Goals
 - Improve table detection for documents with faint lines
 - Preserve original image quality for element extraction
-- Make preprocessing configurable (enable/disable, intensity)
+- **Hybrid control**: Auto mode by default, manual override available
+- **Preview capability**: Users can verify preprocessing before processing
 - Minimal performance impact

 ### Non-Goals
 - Preprocessing for text recognition (Raw OCR handles this separately)
 - Modifying how PP-Structure internally processes images
 - General image quality improvement (out of scope)
+- Real-time preview during processing (preview is pre-processing only)

 ## Decisions

@@ -65,6 +67,72 @@ Original Image ← ← ← ← Image extraction crops from original (NOT preproc
 - Helps make faint table borders more detectable
 - Configurable strength

+### Decision 5: Hybrid Control Mode (Auto + Manual)
+**Rationale**:
+- Auto mode provides a seamless experience for most users
+- Manual mode gives power users fine control
+- Preview allows verification before committing to processing
+
+**Auto-detection algorithm**:
+```python
+def analyze_image_quality(image: np.ndarray) -> dict:
+    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
+
+    # Contrast: standard deviation of pixel values (plain float, so the result is JSON-serializable)
+    contrast = float(np.std(gray))
+
+    # Edge strength: mean of Sobel gradient magnitude
+    sobel_x = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
+    sobel_y = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
+    edge_strength = float(np.mean(np.sqrt(sobel_x**2 + sobel_y**2)))
+
+    return {
+        "contrast": contrast,
+        "edge_strength": edge_strength,
+        "recommended": {
+            "contrast": "clahe" if contrast < 40 else "none",
+            "sharpen": edge_strength < 15,
+            "binarize": contrast < 20
+        }
+    }
+```
+
+### Decision 6: Preview API Design
+**Rationale**:
+- Users should see the preprocessing effect before full processing
+- Reduces trial-and-error cycles
+- Builds user confidence in the system
+
+**API Design**:
+```
+POST /api/v2/tasks/{task_id}/preview/preprocessing
+Request:
+{
+  "page": 1,
+  "mode": "auto",  // or "manual"
+  "config": {      // only for manual mode
+    "contrast": "clahe",
+    "sharpen": true,
+    "binarize": false
+  }
+}
+
+Response:
+{
+  "original_url": "/api/v2/tasks/{id}/pages/1/image",
+  "preprocessed_url": "/api/v2/tasks/{id}/pages/1/image?preprocessed=true",
+  "quality_metrics": {
+    "contrast": 35.2,
+    "edge_strength": 12.8
+  },
+  "auto_config": {
+    "contrast": "clahe",
+    "sharpen": true,
+    "binarize": false
+  }
+}
+```
+
 ## Implementation Details

 ### Preprocessing Pipeline
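
The threshold rules in Decision 5 can be kept separate from the OpenCV measurement, which makes them testable without image fixtures. The following is a minimal sketch under that assumption: `recommend_config` and the three constant names are hypothetical and not part of the proposal, while the cut-offs 40, 15, and 20 are the ones from the `analyze_image_quality()` listing.

```python
from typing import TypedDict


class Recommended(TypedDict):
    contrast: str   # "clahe" or "none"
    sharpen: bool
    binarize: bool


# Cut-offs copied from Decision 5; treat them as tunable starting points.
LOW_CONTRAST = 40.0
FAINT_EDGES = 15.0
VERY_LOW_CONTRAST = 20.0


def recommend_config(contrast: float, edge_strength: float) -> Recommended:
    """Hypothetical helper: map quality metrics to preprocessing settings."""
    return {
        "contrast": "clahe" if contrast < LOW_CONTRAST else "none",
        "sharpen": edge_strength < FAINT_EDGES,
        "binarize": contrast < VERY_LOW_CONTRAST,
    }


if __name__ == "__main__":
    # Metrics taken from the sample preview response in Decision 6.
    print(recommend_config(35.2, 12.8))
```

With the sample metrics from the preview response (contrast 35.2, edge strength 12.8), this selects CLAHE plus sharpening and no binarization, which agrees with the `auto_config` shown in that response.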
--- a/openspec/changes/add-layout-preprocessing/proposal.md
+++ b/openspec/changes/add-layout-preprocessing/proposal.md
@@ -18,9 +18,19 @@ The root cause is that layout detection happens **before** table structure recog
  - Image element extraction continues to use original (preserves quality)
  - Raw OCR continues to use original image

-- **Configurable preprocessing options**
-  - Enable/disable preprocessing per track
-  - Adjustable preprocessing intensity
+- **Hybrid control mode** (Auto + Manual)
+  - **Auto mode (default)**: Analyze image quality and auto-select parameters
+    - Calculate contrast level (standard deviation)
+    - Detect edge clarity for faint lines
+    - Apply appropriate preprocessing based on analysis
+  - **Manual mode**: User can override with specific settings
+    - Contrast: none / histogram / clahe
+    - Sharpen: on/off
+    - Binarize: on/off
+
+- **Frontend preview API**
+  - Preview endpoint to show original vs preprocessed comparison
+  - Users can verify settings before processing

 ## Impact

@@ -31,6 +41,8 @@ The root cause is that layout detection happens **before** table structure recog
 - `backend/app/services/ocr_service.py` - Add preprocessing before PP-Structure
 - `backend/app/core/config.py` - New preprocessing configuration options
 - `backend/app/services/preprocessing_service.py` - New service (to be created)
+- `backend/app/api/v2/endpoints/preview.py` - New preview API endpoint
+- `frontend/src/components/PreprocessingSettings.tsx` - New UI component

 ### Track Impact Analysis

--- a/openspec/changes/add-layout-preprocessing/specs/ocr-processing/spec.md
+++ b/openspec/changes/add-layout-preprocessing/specs/ocr-processing/spec.md
@@ -17,12 +17,6 @@ The system SHALL provide optional image preprocessing to enhance layout detectio
 - **THEN** the system SHALL crop from the ORIGINAL image, not the preprocessed version
 - **AND** the extracted image SHALL maintain original quality and colors

-#### Scenario: Preprocessing can be disabled
-- **GIVEN** `layout_preprocessing_enabled` is set to false in configuration
-- **WHEN** OCR track processing runs
-- **THEN** the system SHALL skip preprocessing
-- **AND** PP-Structure SHALL receive the original image directly
-
 #### Scenario: CLAHE contrast enhancement
 - **WHEN** `layout_preprocessing_contrast` is set to "clahe"
 - **THEN** the system SHALL apply Contrast Limited Adaptive Histogram Equalization
@@ -38,6 +32,61 @@ The system SHALL provide optional image preprocessing to enhance layout detectio
 - **THEN** the system SHALL apply adaptive thresholding
 - **AND** this SHALL be used only for documents with very poor contrast

+### Requirement: Preprocessing Hybrid Control Mode
+
+The system SHALL support three preprocessing modes: automatic, manual, and disabled, with automatic as the default.
+
+#### Scenario: Auto mode analyzes image quality
+- **GIVEN** preprocessing mode is set to "auto"
+- **WHEN** processing begins for a page
+- **THEN** the system SHALL analyze image quality metrics (contrast, edge strength)
+- **AND** automatically determine optimal preprocessing parameters
+- **AND** apply recommended settings without user intervention
+
+#### Scenario: Auto mode detects low contrast
+- **GIVEN** preprocessing mode is "auto"
+- **WHEN** image contrast (standard deviation) is below 40
+- **THEN** the system SHALL automatically enable CLAHE contrast enhancement
+
+#### Scenario: Auto mode detects faint edges
+- **GIVEN** preprocessing mode is "auto"
+- **WHEN** image edge strength (Sobel gradient mean) is below 15
+- **THEN** the system SHALL automatically enable sharpening
+
+#### Scenario: Manual mode uses user-specified settings
+- **GIVEN** preprocessing mode is set to "manual"
+- **WHEN** processing begins
+- **THEN** the system SHALL use the user-provided preprocessing configuration
+- **AND** ignore automatic quality analysis
+
+#### Scenario: Disabled mode skips preprocessing
+- **GIVEN** preprocessing mode is set to "disabled"
+- **WHEN** processing begins
+- **THEN** the system SHALL skip all preprocessing
+- **AND** PP-Structure SHALL receive the original image directly
+
+### Requirement: Preprocessing Preview API
+
+The system SHALL provide a preview endpoint that allows users to compare original and preprocessed images before processing.
+
+#### Scenario: Preview returns comparison images
+- **GIVEN** a task with an uploaded document
+- **WHEN** the user requests a preprocessing preview for a specific page
+- **THEN** the system SHALL return URLs or data for both original and preprocessed images
+- **AND** the user SHALL be able to visually compare the difference
+
+#### Scenario: Preview shows auto-detected settings
+- **GIVEN** preview is requested with mode "auto"
+- **WHEN** the system analyzes the page
+- **THEN** the response SHALL include the auto-detected preprocessing configuration
+- **AND** include quality metrics (contrast, edge_strength)
+
+#### Scenario: Preview accepts manual configuration
+- **GIVEN** preview is requested with mode "manual"
+- **WHEN** the user provides specific preprocessing settings
+- **THEN** the system SHALL apply those settings to generate the preview
+- **AND** return the preprocessed result for user verification
+
 ### Requirement: Preprocessing Track Isolation

 The layout preprocessing feature SHALL only affect layout detection input without impacting other processing components.
@@ -53,3 +102,27 @@ The layout preprocessing feature SHALL only affect layout detection input withou
 - **WHEN** layout detection completes
 - **THEN** the preprocessed image SHALL NOT be persisted to storage
 - **AND** only the original image and element crops SHALL be saved
+
+### Requirement: Preprocessing Frontend UI
+
+The frontend SHALL provide a user interface for configuring and previewing preprocessing settings.
+
+#### Scenario: Mode selection is available
+- **GIVEN** the user is configuring OCR track processing
+- **WHEN** the preprocessing settings panel is displayed
+- **THEN** the user SHALL be able to select mode: Auto (default), Manual, or Disabled
+- **AND** Auto mode SHALL be pre-selected
+
+#### Scenario: Manual mode shows configuration options
+- **GIVEN** the user selects Manual mode
+- **WHEN** the settings panel updates
+- **THEN** the user SHALL see options for:
+  - Contrast enhancement (None / Histogram / CLAHE)
+  - Sharpen toggle
+  - Binarize toggle
+
+#### Scenario: Preview button triggers comparison view
+- **GIVEN** preprocessing settings are configured
+- **WHEN** the user clicks the Preview button
+- **THEN** the system SHALL display a side-by-side comparison of original and preprocessed images
+- **AND** show detected quality metrics
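
The three mode scenarios in the Hybrid Control Mode requirement reduce to one small dispatch step. The sketch below uses stdlib stand-ins for the `PreprocessingMode` enum and `PreprocessingConfig` schema named in tasks.md. The `resolve_config` function, its `analyze` callback, and the `ValueError` for a missing manual config are assumptions made for illustration, not behavior the spec defines.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class PreprocessingMode(str, Enum):
    AUTO = "auto"
    MANUAL = "manual"
    DISABLED = "disabled"


@dataclass(frozen=True)
class PreprocessingConfig:
    contrast: str = "clahe"   # none / histogram / clahe
    sharpen: bool = True
    binarize: bool = False


def resolve_config(
    mode: PreprocessingMode,
    manual: Optional[PreprocessingConfig],
    analyze: Callable[[], PreprocessingConfig],
) -> Optional[PreprocessingConfig]:
    """Return the config to apply, or None when preprocessing is skipped."""
    if mode is PreprocessingMode.DISABLED:
        return None                  # PP-Structure receives the original image
    if mode is PreprocessingMode.MANUAL:
        if manual is None:
            # Assumption: a manual request without settings is rejected.
            raise ValueError("manual mode requires a preprocessing config")
        return manual                # quality analysis is not run
    return analyze()                 # auto: analysis picks the settings
```

Passing the analysis step as a callable means manual and disabled modes never pay for the quality analysis, which is what the "ignore automatic quality analysis" scenario asks for.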
--- a/openspec/changes/add-layout-preprocessing/tasks.md
+++ b/openspec/changes/add-layout-preprocessing/tasks.md
@@ -3,11 +3,15 @@
 ## 1. Configuration

 - [ ] 1.1 Add preprocessing configuration to `backend/app/core/config.py`
-  - `layout_preprocessing_enabled: bool = True` - Enable/disable preprocessing
+  - `layout_preprocessing_mode: str = "auto"` - Options: auto, manual, disabled
  - `layout_preprocessing_contrast: str = "clahe"` - Options: none, histogram, clahe
  - `layout_preprocessing_sharpen: bool = True` - Enable sharpening for faint lines
  - `layout_preprocessing_binarize: bool = False` - Optional binarization (aggressive)

+- [ ] 1.2 Add preprocessing schema to `backend/app/schemas/task.py`
+  - `PreprocessingMode` enum: auto, manual, disabled
+  - `PreprocessingConfig` schema for API request/response
+
 ## 2. Preprocessing Service

 - [ ] 2.1 Create `backend/app/services/preprocessing_service.py`
@@ -18,15 +22,26 @@
  - Return preprocessed image as numpy array or PIL Image

 - [ ] 2.2 Implement `enhance_for_layout_detection()` function
-  - Input: Original image path or PIL Image
+  - Input: Original image path or PIL Image + config
  - Output: Preprocessed image (same format as input)
  - Steps: contrast → sharpen → (optional) binarize

+- [ ] 2.3 Implement `analyze_image_quality()` function (Auto mode)
+  - Calculate contrast level (standard deviation of grayscale)
+  - Detect edge clarity (Sobel/Canny edge strength)
+  - Return recommended `PreprocessingConfig` based on analysis
+  - Thresholds:
+    - Low contrast < 40: Apply CLAHE
+    - Faint edges (Sobel gradient mean) < 15: Apply sharpen
+    - Very low contrast < 20: Consider binarize
+
 ## 3. Integration with OCR Service

 - [ ] 3.1 Update `backend/app/services/ocr_service.py`
  - Import preprocessing service
-  - Before `_run_ppstructure()`, preprocess image if enabled
+  - Check preprocessing mode (auto/manual/disabled)
+  - If auto: call `analyze_image_quality()` first
+  - Before `_run_ppstructure()`, preprocess image based on config
  - Pass preprocessed image to PP-Structure for layout detection
  - Keep original image reference for image extraction

@@ -34,21 +49,77 @@
  - Verify `saved_path` and `img_path` in elements reference original
  - Bbox coordinates from preprocessed detection applied to original crop

-## 4. Testing
+- [ ] 3.3 Update task start API to accept preprocessing options
+  - Add `preprocessing_mode` parameter to start request
+  - Add `preprocessing_config` for manual mode overrides

-- [ ] 4.1 Unit tests for preprocessing_service
+## 4. Preview API
+
+- [ ] 4.1 Create `backend/app/api/v2/endpoints/preview.py`
+  - `POST /api/v2/tasks/{task_id}/preview/preprocessing`
+  - Input: page number, preprocessing config (optional)
+  - Output:
+    - Original image (base64 or URL)
+    - Preprocessed image (base64 or URL)
+    - Auto-detected config (if mode=auto)
+    - Image quality metrics (contrast, edge_strength)
+
+- [ ] 4.2 Add preview router to API
+  - Register in `backend/app/api/v2/api.py`
+  - Add appropriate authentication/authorization
+
+## 5. Frontend UI
+
+- [ ] 5.1 Create `frontend/src/components/PreprocessingSettings.tsx`
+  - Radio buttons: Auto / Manual / Disabled
+  - Manual mode shows:
+    - Contrast dropdown: None / Histogram / CLAHE
+    - Sharpen checkbox
+    - Binarize checkbox
+  - Preview button to trigger comparison view
+
+- [ ] 5.2 Create `frontend/src/components/PreprocessingPreview.tsx`
+  - Side-by-side image comparison (original vs preprocessed)
+  - Display detected quality metrics
+  - Show which auto settings would be applied
+  - Slider or toggle to switch between views
+
+- [ ] 5.3 Integrate with task start flow
+  - Add PreprocessingSettings to OCR track options
+  - Pass selected config to task start API
+  - Store user preference in localStorage
+
+- [ ] 5.4 Add i18n translations
+  - `frontend/src/i18n/locales/zh-TW.json` - Traditional Chinese
+  - `frontend/src/i18n/locales/en.json` - English (if exists)
+
+## 6. Testing
+
+- [ ] 6.1 Unit tests for preprocessing_service
  - Test contrast enhancement methods
  - Test sharpening filter
  - Test binarization
+  - Test `analyze_image_quality()` with various images
  - Test with various image formats (PNG, JPEG)

-- [ ] 4.2 Integration tests
-  - Test OCR track with preprocessing enabled/disabled
+- [ ] 6.2 Unit tests for preview API
+  - Test preview endpoint returns correct images
+  - Test auto-detection returns sensible config
+
+- [ ] 6.3 Integration tests
+  - Test OCR track with preprocessing modes (auto/manual/disabled)
  - Verify image element quality is preserved
  - Test with known problematic documents (faint table borders)
+  - Verify auto mode improves detection for low-quality images

-## 5. Documentation
+## 7. Documentation

-- [ ] 5.1 Update API documentation
+- [ ] 7.1 Update API documentation
  - Document new configuration options
-  - Explain preprocessing behavior
+  - Document preview endpoint
+  - Explain preprocessing behavior and modes
+
+- [ ] 7.2 Add user guide section
+  - When to use auto vs manual
+  - How to interpret quality metrics
+  - Troubleshooting tips
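
For the "how to interpret quality metrics" guide item, it may help to show what the contrast number means on toy data. This is an illustrative sketch only: `contrast_of` and the three sample "pages" are made up for the example, and it uses `statistics.pstdev`, which computes the same population standard deviation that `np.std` returns by default.

```python
from statistics import pstdev
from typing import Sequence


def contrast_of(pixels: Sequence[int]) -> float:
    """Population standard deviation of grayscale values (0-255)."""
    return pstdev(pixels)


flat_page = [200] * 8          # uniform gray: no contrast at all
faint_page = [200, 230] * 4    # pale lines on a pale background
crisp_page = [0, 255] * 4      # black ink on white paper

for name, page in [("flat", flat_page), ("faint", faint_page), ("crisp", crisp_page)]:
    print(name, contrast_of(page))
```

The faint page scores 15, which is below both the 40 and 20 cut-offs, so auto mode would pick CLAHE and also flag binarization. The crisp page scores 127.5 and would be left without contrast enhancement.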