proposal: add hybrid control mode with auto-detection and preview

Updates add-layout-preprocessing proposal:
- Auto mode: analyze image quality, auto-select parameters
- Manual mode: user override with specific settings
- Preview API: compare original vs preprocessed before processing
- Frontend UI: mode selection, manual controls, preview button

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
egg
2025-11-27 14:31:09 +08:00
parent c12ea0b9f6
commit 06a5973f2e
4 changed files with 244 additions and 20 deletions


@@ -27,13 +27,15 @@ Original Image ← ← ← ← Image extraction crops from original (NOT preproc
### Goals
- Improve table detection for documents with faint lines
- Preserve original image quality for element extraction
- Make preprocessing configurable (enable/disable, intensity) - **Hybrid control**: Auto mode by default, manual override available
- **Preview capability**: Users can verify preprocessing before processing
- Minimal performance impact
### Non-Goals
- Preprocessing for text recognition (Raw OCR handles this separately)
- Modifying how PP-Structure internally processes images
- General image quality improvement (out of scope)
- Real-time preview during processing (preview is available only before processing starts)
## Decisions
@@ -65,6 +67,72 @@ Original Image ← ← ← ← Image extraction crops from original (NOT preproc
- Helps make faint table borders more detectable
- Configurable strength
### Decision 5: Hybrid Control Mode (Auto + Manual)
**Rationale**:
- Auto mode provides seamless experience for most users
- Manual mode gives power users fine control
- Preview allows verification before committing to processing
**Auto-detection algorithm**:
```python
import cv2
import numpy as np


def analyze_image_quality(image: np.ndarray) -> dict:
    """Measure contrast and edge strength, then recommend preprocessing."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Contrast: standard deviation of pixel values
    # (cast to a plain float so the result is JSON-serializable)
    contrast = float(np.std(gray))

    # Edge strength: mean of Sobel gradient magnitude
    sobel_x = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    sobel_y = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    edge_strength = float(np.mean(np.sqrt(sobel_x**2 + sobel_y**2)))

    return {
        "contrast": contrast,
        "edge_strength": edge_strength,
        "recommended": {
            "contrast": "clahe" if contrast < 40 else "none",
            "sharpen": edge_strength < 15,
            "binarize": contrast < 20,
        },
    }
```
### Decision 6: Preview API Design
**Rationale**:
- Users should see preprocessing effect before full processing
- Reduces trial-and-error cycles
- Builds user confidence in the system
**API Design**:
```
POST /api/v2/tasks/{task_id}/preview/preprocessing

Request:
{
  "page": 1,
  "mode": "auto",          // or "manual"
  "config": {              // only for manual mode
    "contrast": "clahe",
    "sharpen": true,
    "binarize": false
  }
}

Response:
{
  "original_url": "/api/v2/tasks/{task_id}/pages/1/image",
  "preprocessed_url": "/api/v2/tasks/{task_id}/pages/1/image?preprocessed=true",
  "quality_metrics": {
    "contrast": 35.2,
    "edge_strength": 12.8
  },
  "auto_config": {
    "contrast": "clahe",
    "sharpen": true,
    "binarize": false
  }
}
```
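The request and response shapes above could be handled roughly as follows. This is a minimal sketch, not the endpoint implementation: `parse_preview_request` and `build_preview_response` are hypothetical helper names, the 1-based page numbering is an assumption taken from the `"page": 1` example, and ignoring `config` in auto mode is one possible reading of "only for manual mode".

```python
from typing import Any, Dict, Optional, Tuple

ALLOWED_MODES = {"auto", "manual"}
ALLOWED_CONTRAST = {"none", "histogram", "clahe"}


def parse_preview_request(
    payload: Dict[str, Any],
) -> Tuple[int, str, Optional[Dict[str, Any]]]:
    """Validate a preview request body and return (page, mode, config)."""
    page = payload.get("page")
    # bool is a subclass of int, so reject it explicitly.
    if not isinstance(page, int) or isinstance(page, bool) or page < 1:
        raise ValueError("page must be a positive integer")

    mode = payload.get("mode", "auto")
    if mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {sorted(ALLOWED_MODES)}")

    if mode == "auto":
        # Assumption: config is ignored unless the mode is manual.
        return page, mode, None

    config = payload.get("config")
    if not isinstance(config, dict):
        raise ValueError("manual mode requires a config object")
    if config.get("contrast") not in ALLOWED_CONTRAST:
        raise ValueError(f"contrast must be one of {sorted(ALLOWED_CONTRAST)}")
    return page, mode, {
        "contrast": config["contrast"],
        "sharpen": bool(config.get("sharpen", False)),
        "binarize": bool(config.get("binarize", False)),
    }


def build_preview_response(
    task_id: str,
    page: int,
    metrics: Dict[str, float],
    auto_config: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble a response body following the URL layout of the example above."""
    base = f"/api/v2/tasks/{task_id}/pages/{page}/image"
    response: Dict[str, Any] = {
        "original_url": base,
        "preprocessed_url": f"{base}?preprocessed=true",
        "quality_metrics": {
            "contrast": metrics["contrast"],
            "edge_strength": metrics["edge_strength"],
        },
    }
    if auto_config is not None:
        response["auto_config"] = dict(auto_config)
    return response
```

Keeping validation separate from response assembly lets the same checks be reused by the task start API if it accepts the same options.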
## Implementation Details
### Preprocessing Pipeline


@@ -18,9 +18,19 @@ The root cause is that layout detection happens **before** table structure recog
- Image element extraction continues to use original (preserves quality)
- Raw OCR continues to use original image
- **Configurable preprocessing options** - **Hybrid control mode** (Auto + Manual)
- Enable/disable preprocessing per track - **Auto mode (default)**: Analyze image quality and auto-select parameters
- Adjustable preprocessing intensity - Calculate contrast level (standard deviation)
- Detect edge clarity for faint lines
- Apply appropriate preprocessing based on analysis
- **Manual mode**: User can override with specific settings
- Contrast: none / histogram / clahe
- Sharpen: on/off
- Binarize: on/off
- **Frontend preview API**
- Preview endpoint to show original vs preprocessed comparison
- Users can verify settings before processing
## Impact
@@ -31,6 +41,8 @@ The root cause is that layout detection happens **before** table structure recog
- `backend/app/services/ocr_service.py` - Add preprocessing before PP-Structure
- `backend/app/core/config.py` - New preprocessing configuration options
- `backend/app/services/preprocessing_service.py` - New service (to be created)
- `backend/app/api/v2/endpoints/preview.py` - New preview API endpoint
- `frontend/src/components/PreprocessingSettings.tsx` - New UI component
### Track Impact Analysis


@@ -17,12 +17,6 @@ The system SHALL provide optional image preprocessing to enhance layout detectio
- **THEN** the system SHALL crop from the ORIGINAL image, not the preprocessed version
- **AND** the extracted image SHALL maintain original quality and colors
#### Scenario: Preprocessing can be disabled
- **GIVEN** `layout_preprocessing_enabled` is set to false in configuration
- **WHEN** OCR track processing runs
- **THEN** the system SHALL skip preprocessing
- **AND** PP-Structure SHALL receive the original image directly
#### Scenario: CLAHE contrast enhancement
- **WHEN** `layout_preprocessing_contrast` is set to "clahe"
- **THEN** the system SHALL apply Contrast Limited Adaptive Histogram Equalization
@@ -38,6 +32,61 @@ The system SHALL provide optional image preprocessing to enhance layout detectio
- **THEN** the system SHALL apply adaptive thresholding
- **AND** this SHALL be used only for documents with very poor contrast
### Requirement: Preprocessing Hybrid Control Mode
The system SHALL support three preprocessing modes: automatic, manual, and disabled, with automatic as the default.
#### Scenario: Auto mode analyzes image quality
- **GIVEN** preprocessing mode is set to "auto"
- **WHEN** processing begins for a page
- **THEN** the system SHALL analyze image quality metrics (contrast, edge strength)
- **AND** automatically determine optimal preprocessing parameters
- **AND** apply recommended settings without user intervention
#### Scenario: Auto mode detects low contrast
- **GIVEN** preprocessing mode is "auto"
- **WHEN** image contrast (standard deviation) is below 40
- **THEN** the system SHALL automatically enable CLAHE contrast enhancement
#### Scenario: Auto mode detects faint edges
- **GIVEN** preprocessing mode is "auto"
- **WHEN** image edge strength (Sobel gradient mean) is below 15
- **THEN** the system SHALL automatically enable sharpening
#### Scenario: Manual mode uses user-specified settings
- **GIVEN** preprocessing mode is set to "manual"
- **WHEN** processing begins
- **THEN** the system SHALL use the user-provided preprocessing configuration
- **AND** ignore automatic quality analysis
#### Scenario: Disabled mode skips preprocessing
- **GIVEN** preprocessing mode is set to "disabled"
- **WHEN** processing begins
- **THEN** the system SHALL skip all preprocessing
- **AND** PP-Structure SHALL receive the original image directly
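The mode scenarios above can be sketched as a single resolution step. This is an illustrative sketch under stated assumptions: `resolve_config` and `recommend_config` are hypothetical names, the config is represented as a plain dict, and the binarize threshold of 20 comes from the design notes and task list rather than from a scenario in this spec.

```python
from typing import Dict, Optional, Union

ConfigDict = Dict[str, Union[str, bool]]

LOW_CONTRAST = 40.0       # below this, enable CLAHE
VERY_LOW_CONTRAST = 20.0  # below this, recommend binarization (from the design notes)
FAINT_EDGES = 15.0        # below this, enable sharpening


def recommend_config(contrast: float, edge_strength: float) -> ConfigDict:
    """Map quality metrics to preprocessing settings using the stated thresholds."""
    return {
        "contrast": "clahe" if contrast < LOW_CONTRAST else "none",
        "sharpen": edge_strength < FAINT_EDGES,
        "binarize": contrast < VERY_LOW_CONTRAST,
    }


def resolve_config(
    mode: str,
    metrics: Optional[Dict[str, float]] = None,
    manual_config: Optional[ConfigDict] = None,
) -> Optional[ConfigDict]:
    """Return the settings to apply, or None when preprocessing is skipped."""
    if mode == "disabled":
        # PP-Structure receives the original image directly.
        return None
    if mode == "manual":
        if manual_config is None:
            raise ValueError("manual mode requires a user-provided config")
        # Quality analysis is ignored in manual mode.
        return dict(manual_config)
    if mode == "auto":
        if metrics is None:
            raise ValueError("auto mode requires quality metrics")
        return recommend_config(metrics["contrast"], metrics["edge_strength"])
    raise ValueError(f"unknown preprocessing mode: {mode!r}")
```

Returning `None` for disabled mode gives the caller one check to decide whether to pass the original image through untouched.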
### Requirement: Preprocessing Preview API
The system SHALL provide a preview endpoint that allows users to compare original and preprocessed images before processing.
#### Scenario: Preview returns comparison images
- **GIVEN** a task with uploaded document
- **WHEN** user requests preprocessing preview for a specific page
- **THEN** the system SHALL return URLs or data for both original and preprocessed images
- **AND** the user SHALL be able to visually compare the two images
#### Scenario: Preview shows auto-detected settings
- **GIVEN** preview is requested with mode "auto"
- **WHEN** the system analyzes the page
- **THEN** the response SHALL include the auto-detected preprocessing configuration
- **AND** include quality metrics (contrast, edge_strength)
#### Scenario: Preview accepts manual configuration
- **GIVEN** preview is requested with mode "manual"
- **WHEN** user provides specific preprocessing settings
- **THEN** the system SHALL apply those settings to generate preview
- **AND** return the preprocessed result for user verification
### Requirement: Preprocessing Track Isolation
The layout preprocessing feature SHALL only affect layout detection input without impacting other processing components.
@@ -53,3 +102,27 @@ The layout preprocessing feature SHALL only affect layout detection input withou
- **WHEN** layout detection completes
- **THEN** the preprocessed image SHALL NOT be persisted to storage
- **AND** only the original image and element crops SHALL be saved
### Requirement: Preprocessing Frontend UI
The frontend SHALL provide a user interface for configuring and previewing preprocessing settings.
#### Scenario: Mode selection is available
- **GIVEN** the user is configuring OCR track processing
- **WHEN** the preprocessing settings panel is displayed
- **THEN** the user SHALL be able to select mode: Auto (default), Manual, or Disabled
- **AND** Auto mode SHALL be pre-selected
#### Scenario: Manual mode shows configuration options
- **GIVEN** the user selects Manual mode
- **WHEN** the settings panel updates
- **THEN** the user SHALL see options for:
- Contrast enhancement (None / Histogram / CLAHE)
- Sharpen toggle
- Binarize toggle
#### Scenario: Preview button triggers comparison view
- **GIVEN** preprocessing settings are configured
- **WHEN** the user clicks Preview button
- **THEN** the system SHALL display side-by-side comparison of original and preprocessed images
- **AND** show detected quality metrics


@@ -3,11 +3,15 @@
## 1. Configuration
- [ ] 1.1 Add preprocessing configuration to `backend/app/core/config.py`
- `layout_preprocessing_enabled: bool = True` - Enable/disable preprocessing - `layout_preprocessing_mode: str = "auto"` - Options: auto, manual, disabled
- `layout_preprocessing_contrast: str = "clahe"` - Options: none, histogram, clahe
- `layout_preprocessing_sharpen: bool = True` - Enable sharpening for faint lines
- `layout_preprocessing_binarize: bool = False` - Optional binarization (aggressive)
- [ ] 1.2 Add preprocessing schema to `backend/app/schemas/task.py`
- `PreprocessingMode` enum: auto, manual, disabled
- `PreprocessingConfig` schema for API request/response
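One possible shape for the schema in task 1.2, written with the standard library only so it stays dependency-free. This is a sketch, not the project's schema: the project may use a different schema library, `ContrastMethod` is a hypothetical helper enum, and the defaults are copied from the configuration options in task 1.1.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict


class PreprocessingMode(str, Enum):
    AUTO = "auto"
    MANUAL = "manual"
    DISABLED = "disabled"


class ContrastMethod(str, Enum):  # hypothetical name, not from the proposal
    NONE = "none"
    HISTOGRAM = "histogram"
    CLAHE = "clahe"


@dataclass(frozen=True)
class PreprocessingConfig:
    # Defaults mirror the layout_preprocessing_* options in task 1.1.
    contrast: ContrastMethod = ContrastMethod.CLAHE
    sharpen: bool = True
    binarize: bool = False

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "PreprocessingConfig":
        """Build a config from a request body; raises ValueError on a bad contrast value."""
        return cls(
            contrast=ContrastMethod(data.get("contrast", ContrastMethod.CLAHE.value)),
            sharpen=bool(data.get("sharpen", True)),
            binarize=bool(data.get("binarize", False)),
        )

    def to_dict(self) -> Dict[str, Any]:
        """Serialize to the JSON shape used by the preview API examples."""
        return {
            "contrast": self.contrast.value,
            "sharpen": self.sharpen,
            "binarize": self.binarize,
        }
```

Using enums for the mode and contrast method means an unsupported value fails at parse time instead of deep inside the preprocessing pipeline.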
## 2. Preprocessing Service
- [ ] 2.1 Create `backend/app/services/preprocessing_service.py`
@@ -18,15 +22,26 @@
- Return preprocessed image as numpy array or PIL Image
- [ ] 2.2 Implement `enhance_for_layout_detection()` function
- Input: Original image path or PIL Image - Input: Original image path or PIL Image + config
- Output: Preprocessed image (same format as input)
- Steps: contrast → sharpen → (optional) binarize
- [ ] 2.3 Implement `analyze_image_quality()` function (Auto mode)
- Calculate contrast level (standard deviation of grayscale)
- Detect edge clarity (Sobel/Canny edge strength)
- Return recommended `PreprocessingConfig` based on analysis
- Thresholds:
- Low contrast < 40: Apply CLAHE
- Faint edges (Sobel gradient mean) < 15: Apply sharpen
- Very low contrast < 20: Consider binarize
## 3. Integration with OCR Service
- [ ] 3.1 Update `backend/app/services/ocr_service.py`
- Import preprocessing service
- Before `_run_ppstructure()`, preprocess image if enabled - Check preprocessing mode (auto/manual/disabled)
- If auto: call `analyze_image_quality()` first
- Before `_run_ppstructure()`, preprocess image based on config
- Pass preprocessed image to PP-Structure for layout detection
- Keep original image reference for image extraction
@@ -34,21 +49,77 @@
- Verify `saved_path` and `img_path` in elements reference original
- Bbox coordinates from preprocessed detection applied to original crop
## 4. Testing - [ ] 3.3 Update task start API to accept preprocessing options
- Add `preprocessing_mode` parameter to start request
- Add `preprocessing_config` for manual mode overrides
- [ ] 4.1 Unit tests for preprocessing_service ## 4. Preview API
- [ ] 4.1 Create `backend/app/api/v2/endpoints/preview.py`
- `POST /api/v2/tasks/{task_id}/preview/preprocessing`
- Input: page number, preprocessing config (optional)
- Output:
- Original image (base64 or URL)
- Preprocessed image (base64 or URL)
- Auto-detected config (if mode=auto)
- Image quality metrics (contrast, edge_strength)
- [ ] 4.2 Add preview router to API
- Register in `backend/app/api/v2/api.py`
- Add appropriate authentication/authorization
## 5. Frontend UI
- [ ] 5.1 Create `frontend/src/components/PreprocessingSettings.tsx`
- Radio buttons: Auto / Manual / Disabled
- Manual mode shows:
- Contrast dropdown: None / Histogram / CLAHE
- Sharpen checkbox
- Binarize checkbox
- Preview button to trigger comparison view
- [ ] 5.2 Create `frontend/src/components/PreprocessingPreview.tsx`
- Side-by-side image comparison (original vs preprocessed)
- Display detected quality metrics
- Show which auto settings would be applied
- Slider or toggle to switch between views
- [ ] 5.3 Integrate with task start flow
- Add PreprocessingSettings to OCR track options
- Pass selected config to task start API
- Store user preference in localStorage
- [ ] 5.4 Add i18n translations
- `frontend/src/i18n/locales/zh-TW.json` - Traditional Chinese
- `frontend/src/i18n/locales/en.json` - English (if it exists)
## 6. Testing
- [ ] 6.1 Unit tests for preprocessing_service
- Test contrast enhancement methods
- Test sharpening filter
- Test binarization
- Test `analyze_image_quality()` with various images
- Test with various image formats (PNG, JPEG)
- [ ] 4.2 Integration tests - [ ] 6.2 Unit tests for preview API
- Test OCR track with preprocessing enabled/disabled - Test preview endpoint returns correct images
- Test auto-detection returns sensible config
- [ ] 6.3 Integration tests
- Test OCR track with preprocessing modes (auto/manual/disabled)
- Verify image element quality is preserved
- Test with known problematic documents (faint table borders)
- Verify auto mode improves detection for low-quality images
## 5. Documentation ## 7. Documentation
- [ ] 5.1 Update API documentation - [ ] 7.1 Update API documentation
- Document new configuration options
- Explain preprocessing behavior - Document preview endpoint
- Explain preprocessing behavior and modes
- [ ] 7.2 Add user guide section
- When to use auto vs manual
- How to interpret quality metrics
- Troubleshooting tips