From 06a5973f2e0ec10032d52012e0a97668a2626c9b Mon Sep 17 00:00:00 2001 From: egg Date: Thu, 27 Nov 2025 14:31:09 +0800 Subject: [PATCH] proposal: add hybrid control mode with auto-detection and preview MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Updates add-layout-preprocessing proposal: - Auto mode: analyze image quality, auto-select parameters - Manual mode: user override with specific settings - Preview API: compare original vs preprocessed before processing - Frontend UI: mode selection, manual controls, preview button 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude --- .../add-layout-preprocessing/design.md | 70 +++++++++++++- .../add-layout-preprocessing/proposal.md | 18 +++- .../specs/ocr-processing/spec.md | 85 +++++++++++++++-- .../changes/add-layout-preprocessing/tasks.md | 91 +++++++++++++++++-- 4 files changed, 244 insertions(+), 20 deletions(-) diff --git a/openspec/changes/add-layout-preprocessing/design.md b/openspec/changes/add-layout-preprocessing/design.md index 8f8b262..979b62c 100644 --- a/openspec/changes/add-layout-preprocessing/design.md +++ b/openspec/changes/add-layout-preprocessing/design.md @@ -27,13 +27,15 @@ Original Image ← ← ← ← Image extraction crops from original (NOT preproc ### Goals - Improve table detection for documents with faint lines - Preserve original image quality for element extraction -- Make preprocessing configurable (enable/disable, intensity) +- **Hybrid control**: Auto mode by default, manual override available +- **Preview capability**: Users can verify preprocessing before processing - Minimal performance impact ### Non-Goals - Preprocessing for text recognition (Raw OCR handles this separately) - Modifying how PP-Structure internally processes images - General image quality improvement (out of scope) +- Real-time preview during processing (preview is pre-processing only) ## Decisions @@ -65,6 +67,72 @@ Original Image ← ← ← ← 
Image extraction crops from original (NOT preproc - Helps make faint table borders more detectable - Configurable strength +### Decision 5: Hybrid Control Mode (Auto + Manual) +**Rationale**: +- Auto mode provides seamless experience for most users +- Manual mode gives power users fine control +- Preview allows verification before committing to processing + +**Auto-detection algorithm**: +```python +def analyze_image_quality(image: np.ndarray) -> dict: + gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) + + # Contrast: standard deviation of pixel values + contrast = np.std(gray) + + # Edge strength: mean of Sobel gradient magnitude + sobel_x = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3) + sobel_y = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3) + edge_strength = np.mean(np.sqrt(sobel_x**2 + sobel_y**2)) + + return { + "contrast": contrast, + "edge_strength": edge_strength, + "recommended": { + "contrast": "clahe" if contrast < 40 else "none", + "sharpen": edge_strength < 15, + "binarize": contrast < 20 + } + } +``` + +### Decision 6: Preview API Design +**Rationale**: +- Users should see preprocessing effect before full processing +- Reduces trial-and-error cycles +- Builds user confidence in the system + +**API Design**: +``` +POST /api/v2/tasks/{task_id}/preview/preprocessing +Request: +{ + "page": 1, + "mode": "auto", // or "manual" + "config": { // only for manual mode + "contrast": "clahe", + "sharpen": true, + "binarize": false + } +} + +Response: +{ + "original_url": "/api/v2/tasks/{id}/pages/1/image", + "preprocessed_url": "/api/v2/tasks/{id}/pages/1/image?preprocessed=true", + "quality_metrics": { + "contrast": 35.2, + "edge_strength": 12.8 + }, + "auto_config": { + "contrast": "clahe", + "sharpen": true, + "binarize": false + } +} +``` + ## Implementation Details ### Preprocessing Pipeline diff --git a/openspec/changes/add-layout-preprocessing/proposal.md b/openspec/changes/add-layout-preprocessing/proposal.md index ea773eb..b0e459a 100644 --- 
a/openspec/changes/add-layout-preprocessing/proposal.md +++ b/openspec/changes/add-layout-preprocessing/proposal.md @@ -18,9 +18,19 @@ The root cause is that layout detection happens **before** table structure recog - Image element extraction continues to use original (preserves quality) - Raw OCR continues to use original image -- **Configurable preprocessing options** - - Enable/disable preprocessing per track - - Adjustable preprocessing intensity +- **Hybrid control mode** (Auto + Manual) + - **Auto mode (default)**: Analyze image quality and auto-select parameters + - Calculate contrast level (standard deviation) + - Detect edge clarity for faint lines + - Apply appropriate preprocessing based on analysis + - **Manual mode**: User can override with specific settings + - Contrast: none / histogram / clahe + - Sharpen: on/off + - Binarize: on/off + +- **Frontend preview API** + - Preview endpoint to show original vs preprocessed comparison + - Users can verify settings before processing ## Impact @@ -31,6 +41,8 @@ The root cause is that layout detection happens **before** table structure recog - `backend/app/services/ocr_service.py` - Add preprocessing before PP-Structure - `backend/app/core/config.py` - New preprocessing configuration options - `backend/app/services/preprocessing_service.py` - New service (to be created) +- `backend/app/api/v2/endpoints/preview.py` - New preview API endpoint +- `frontend/src/components/PreprocessingSettings.tsx` - New UI component ### Track Impact Analysis diff --git a/openspec/changes/add-layout-preprocessing/specs/ocr-processing/spec.md b/openspec/changes/add-layout-preprocessing/specs/ocr-processing/spec.md index 3109993..143bf26 100644 --- a/openspec/changes/add-layout-preprocessing/specs/ocr-processing/spec.md +++ b/openspec/changes/add-layout-preprocessing/specs/ocr-processing/spec.md @@ -17,12 +17,6 @@ The system SHALL provide optional image preprocessing to enhance layout detectio - **THEN** the system SHALL crop from 
the ORIGINAL image, not the preprocessed version - **AND** the extracted image SHALL maintain original quality and colors -#### Scenario: Preprocessing can be disabled -- **GIVEN** `layout_preprocessing_enabled` is set to false in configuration -- **WHEN** OCR track processing runs -- **THEN** the system SHALL skip preprocessing -- **AND** PP-Structure SHALL receive the original image directly - #### Scenario: CLAHE contrast enhancement - **WHEN** `layout_preprocessing_contrast` is set to "clahe" - **THEN** the system SHALL apply Contrast Limited Adaptive Histogram Equalization @@ -38,6 +32,61 @@ The system SHALL provide optional image preprocessing to enhance layout detectio - **THEN** the system SHALL apply adaptive thresholding - **AND** this SHALL be used only for documents with very poor contrast +### Requirement: Preprocessing Hybrid Control Mode + +The system SHALL support three preprocessing modes: automatic, manual, and disabled, with automatic as the default. + +#### Scenario: Auto mode analyzes image quality +- **GIVEN** preprocessing mode is set to "auto" +- **WHEN** processing begins for a page +- **THEN** the system SHALL analyze image quality metrics (contrast, edge strength) +- **AND** automatically determine optimal preprocessing parameters +- **AND** apply recommended settings without user intervention + +#### Scenario: Auto mode detects low contrast +- **GIVEN** preprocessing mode is "auto" +- **WHEN** image contrast (standard deviation) is below 40 +- **THEN** the system SHALL automatically enable CLAHE contrast enhancement + +#### Scenario: Auto mode detects faint edges +- **GIVEN** preprocessing mode is "auto" +- **WHEN** image edge strength (Sobel gradient mean) is below 15 +- **THEN** the system SHALL automatically enable sharpening + +#### Scenario: Manual mode uses user-specified settings +- **GIVEN** preprocessing mode is set to "manual" +- **WHEN** processing begins +- **THEN** the system SHALL use the user-provided preprocessing 
configuration +- **AND** ignore automatic quality analysis + +#### Scenario: Disabled mode skips preprocessing +- **GIVEN** preprocessing mode is set to "disabled" +- **WHEN** processing begins +- **THEN** the system SHALL skip all preprocessing +- **AND** PP-Structure SHALL receive the original image directly + +### Requirement: Preprocessing Preview API + +The system SHALL provide a preview endpoint that allows users to compare original and preprocessed images before processing. + +#### Scenario: Preview returns comparison images +- **GIVEN** a task with uploaded document +- **WHEN** user requests preprocessing preview for a specific page +- **THEN** the system SHALL return URLs or data for both original and preprocessed images +- **AND** user can visually compare the difference + +#### Scenario: Preview shows auto-detected settings +- **GIVEN** preview is requested with mode "auto" +- **WHEN** the system analyzes the page +- **THEN** the response SHALL include the auto-detected preprocessing configuration +- **AND** include quality metrics (contrast, edge_strength) + +#### Scenario: Preview accepts manual configuration +- **GIVEN** preview is requested with mode "manual" +- **WHEN** user provides specific preprocessing settings +- **THEN** the system SHALL apply those settings to generate preview +- **AND** return the preprocessed result for user verification + ### Requirement: Preprocessing Track Isolation The layout preprocessing feature SHALL only affect layout detection input without impacting other processing components. @@ -53,3 +102,27 @@ The layout preprocessing feature SHALL only affect layout detection input withou - **WHEN** layout detection completes - **THEN** the preprocessed image SHALL NOT be persisted to storage - **AND** only the original image and element crops SHALL be saved + +### Requirement: Preprocessing Frontend UI + +The frontend SHALL provide a user interface for configuring and previewing preprocessing settings. 
+ +#### Scenario: Mode selection is available +- **GIVEN** the user is configuring OCR track processing +- **WHEN** the preprocessing settings panel is displayed +- **THEN** the user SHALL be able to select mode: Auto (default), Manual, or Disabled +- **AND** Auto mode SHALL be pre-selected + +#### Scenario: Manual mode shows configuration options +- **GIVEN** the user selects Manual mode +- **WHEN** the settings panel updates +- **THEN** the user SHALL see options for: + - Contrast enhancement (None / Histogram / CLAHE) + - Sharpen toggle + - Binarize toggle + +#### Scenario: Preview button triggers comparison view +- **GIVEN** preprocessing settings are configured +- **WHEN** the user clicks Preview button +- **THEN** the system SHALL display side-by-side comparison of original and preprocessed images +- **AND** show detected quality metrics diff --git a/openspec/changes/add-layout-preprocessing/tasks.md b/openspec/changes/add-layout-preprocessing/tasks.md index 126f153..5ba7a63 100644 --- a/openspec/changes/add-layout-preprocessing/tasks.md +++ b/openspec/changes/add-layout-preprocessing/tasks.md @@ -3,11 +3,15 @@ ## 1. Configuration - [ ] 1.1 Add preprocessing configuration to `backend/app/core/config.py` - - `layout_preprocessing_enabled: bool = True` - Enable/disable preprocessing + - `layout_preprocessing_mode: str = "auto"` - Options: auto, manual, disabled - `layout_preprocessing_contrast: str = "clahe"` - Options: none, histogram, clahe - `layout_preprocessing_sharpen: bool = True` - Enable sharpening for faint lines - `layout_preprocessing_binarize: bool = False` - Optional binarization (aggressive) +- [ ] 1.2 Add preprocessing schema to `backend/app/schemas/task.py` + - `PreprocessingMode` enum: auto, manual, disabled + - `PreprocessingConfig` schema for API request/response + ## 2. 
Preprocessing Service - [ ] 2.1 Create `backend/app/services/preprocessing_service.py` @@ -18,15 +22,26 @@ - Return preprocessed image as numpy array or PIL Image - [ ] 2.2 Implement `enhance_for_layout_detection()` function - - Input: Original image path or PIL Image + - Input: Original image path or PIL Image + config - Output: Preprocessed image (same format as input) - Steps: contrast → sharpen → (optional) binarize +- [ ] 2.3 Implement `analyze_image_quality()` function (Auto mode) + - Calculate contrast level (standard deviation of grayscale) + - Detect edge clarity (Sobel/Canny edge strength) + - Return recommended `PreprocessingConfig` based on analysis + - Thresholds: + - Low contrast < 40: Apply CLAHE + - Faint edges < 15: Apply sharpen + - Very low contrast < 20: Consider binarize + ## 3. Integration with OCR Service - [ ] 3.1 Update `backend/app/services/ocr_service.py` - Import preprocessing service - - Before `_run_ppstructure()`, preprocess image if enabled + - Check preprocessing mode (auto/manual/disabled) + - If auto: call `analyze_image_quality()` first + - Before `_run_ppstructure()`, preprocess image based on config - Pass preprocessed image to PP-Structure for layout detection - Keep original image reference for image extraction @@ -34,21 +49,77 @@ - Verify `saved_path` and `img_path` in elements reference original - Bbox coordinates from preprocessed detection applied to original crop -## 4. Testing +- [ ] 3.3 Update task start API to accept preprocessing options + - Add `preprocessing_mode` parameter to start request + - Add `preprocessing_config` for manual mode overrides -- [ ] 4.1 Unit tests for preprocessing_service +## 4. 
Preview API + +- [ ] 4.1 Create `backend/app/api/v2/endpoints/preview.py` + - `POST /api/v2/tasks/{task_id}/preview/preprocessing` + - Input: page number, preprocessing config (optional) + - Output: + - Original image (base64 or URL) + - Preprocessed image (base64 or URL) + - Auto-detected config (if mode=auto) + - Image quality metrics (contrast, edge_strength) + +- [ ] 4.2 Add preview router to API + - Register in `backend/app/api/v2/api.py` + - Add appropriate authentication/authorization + +## 5. Frontend UI + +- [ ] 5.1 Create `frontend/src/components/PreprocessingSettings.tsx` + - Radio buttons: Auto / Manual / Disabled + - Manual mode shows: + - Contrast dropdown: None / Histogram / CLAHE + - Sharpen checkbox + - Binarize checkbox + - Preview button to trigger comparison view + +- [ ] 5.2 Create `frontend/src/components/PreprocessingPreview.tsx` + - Side-by-side image comparison (original vs preprocessed) + - Display detected quality metrics + - Show which auto settings would be applied + - Slider or toggle to switch between views + +- [ ] 5.3 Integrate with task start flow + - Add PreprocessingSettings to OCR track options + - Pass selected config to task start API + - Store user preference in localStorage + +- [ ] 5.4 Add i18n translations + - `frontend/src/i18n/locales/zh-TW.json` - Traditional Chinese + - `frontend/src/i18n/locales/en.json` - English (if exists) + +## 6. 
Testing + +- [ ] 6.1 Unit tests for preprocessing_service - Test contrast enhancement methods - Test sharpening filter - Test binarization + - Test `analyze_image_quality()` with various images - Test with various image formats (PNG, JPEG) -- [ ] 4.2 Integration tests - - Test OCR track with preprocessing enabled/disabled +- [ ] 6.2 Unit tests for preview API + - Test preview endpoint returns correct images + - Test auto-detection returns sensible config + +- [ ] 6.3 Integration tests + - Test OCR track with preprocessing modes (auto/manual/disabled) - Verify image element quality is preserved - Test with known problematic documents (faint table borders) + - Verify auto mode improves detection for low-quality images -## 5. Documentation +## 7. Documentation -- [ ] 5.1 Update API documentation +- [ ] 7.1 Update API documentation - Document new configuration options - - Explain preprocessing behavior + - Document preview endpoint + - Explain preprocessing behavior and modes + +- [ ] 7.2 Add user guide section + - When to use auto vs manual + - How to interpret quality metrics + - Troubleshooting tips