proposal: add hybrid control mode with auto-detection and preview
Updates add-layout-preprocessing proposal:
- Auto mode: analyze image quality, auto-select parameters
- Manual mode: user override with specific settings
- Preview API: compare original vs preprocessed before processing
- Frontend UI: mode selection, manual controls, preview button

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@@ -27,13 +27,15 @@ Original Image ← ← ← ← Image extraction crops from original (NOT preproc
 ### Goals
 - Improve table detection for documents with faint lines
 - Preserve original image quality for element extraction
-- Make preprocessing configurable (enable/disable, intensity)
+- **Hybrid control**: Auto mode by default, manual override available
+- **Preview capability**: Users can verify preprocessing before processing
 - Minimal performance impact
 
 ### Non-Goals
 - Preprocessing for text recognition (Raw OCR handles this separately)
 - Modifying how PP-Structure internally processes images
 - General image quality improvement (out of scope)
+- Real-time preview during processing (preview is pre-processing only)
 
 ## Decisions
 
@@ -65,6 +67,72 @@ Original Image ← ← ← ← Image extraction crops from original (NOT preproc
 - Helps make faint table borders more detectable
 - Configurable strength
 
+### Decision 5: Hybrid Control Mode (Auto + Manual)
+**Rationale**:
+- Auto mode provides seamless experience for most users
+- Manual mode gives power users fine control
+- Preview allows verification before committing to processing
+
+**Auto-detection algorithm**:
+```python
+def analyze_image_quality(image: np.ndarray) -> dict:
+    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
+
+    # Contrast: standard deviation of pixel values
+    contrast = np.std(gray)
+
+    # Edge strength: mean of Sobel gradient magnitude
+    sobel_x = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
+    sobel_y = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
+    edge_strength = np.mean(np.sqrt(sobel_x**2 + sobel_y**2))
+
+    return {
+        "contrast": contrast,
+        "edge_strength": edge_strength,
+        "recommended": {
+            "contrast": "clahe" if contrast < 40 else "none",
+            "sharpen": edge_strength < 15,
+            "binarize": contrast < 20
+        }
+    }
+```
+
+### Decision 6: Preview API Design
+**Rationale**:
+- Users should see preprocessing effect before full processing
+- Reduces trial-and-error cycles
+- Builds user confidence in the system
+
+**API Design**:
+```
+POST /api/v2/tasks/{task_id}/preview/preprocessing
+Request:
+{
+  "page": 1,
+  "mode": "auto",  // or "manual"
+  "config": {  // only for manual mode
+    "contrast": "clahe",
+    "sharpen": true,
+    "binarize": false
+  }
+}
+
+Response:
+{
+  "original_url": "/api/v2/tasks/{id}/pages/1/image",
+  "preprocessed_url": "/api/v2/tasks/{id}/pages/1/image?preprocessed=true",
+  "quality_metrics": {
+    "contrast": 35.2,
+    "edge_strength": 12.8
+  },
+  "auto_config": {
+    "contrast": "clahe",
+    "sharpen": true,
+    "binarize": false
+  }
+}
+```
+
 ## Implementation Details
 
 ### Preprocessing Pipeline
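Review note: to make the Decision 6 response shape concrete, here is a minimal sketch of how the payload could be assembled from the dict that `analyze_image_quality()` returns. The helper name `build_preview_response` is hypothetical and not part of this commit. The URL patterns are copied from the example response above; casting NumPy scalars to plain Python types is an assumption about what the JSON layer will accept.

```python
from typing import Any, Dict


def build_preview_response(task_id: str, page: int, analysis: Dict[str, Any]) -> Dict[str, Any]:
    """Assemble the preview payload (hypothetical helper, for illustration only)."""
    # URL layout follows the example response in Decision 6.
    base = f"/api/v2/tasks/{task_id}/pages/{page}/image"
    recommended = analysis["recommended"]
    return {
        "original_url": base,
        "preprocessed_url": f"{base}?preprocessed=true",
        "quality_metrics": {
            # float() turns np.float64 values into plain floats.
            "contrast": float(analysis["contrast"]),
            "edge_strength": float(analysis["edge_strength"]),
        },
        "auto_config": {
            "contrast": str(recommended["contrast"]),
            # bool() turns np.bool_ values into plain booleans.
            "sharpen": bool(recommended["sharpen"]),
            "binarize": bool(recommended["binarize"]),
        },
    }
```

The casts matter because `np.std` and comparisons on NumPy values return NumPy scalar types, which the standard `json` module does not serialize without help.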
@@ -18,9 +18,19 @@ The root cause is that layout detection happens **before** table structure recog
 - Image element extraction continues to use original (preserves quality)
 - Raw OCR continues to use original image
 
-- **Configurable preprocessing options**
-  - Enable/disable preprocessing per track
-  - Adjustable preprocessing intensity
+- **Hybrid control mode** (Auto + Manual)
+  - **Auto mode (default)**: Analyze image quality and auto-select parameters
+    - Calculate contrast level (standard deviation)
+    - Detect edge clarity for faint lines
+    - Apply appropriate preprocessing based on analysis
+  - **Manual mode**: User can override with specific settings
+    - Contrast: none / histogram / clahe
+    - Sharpen: on/off
+    - Binarize: on/off
+
+- **Frontend preview API**
+  - Preview endpoint to show original vs preprocessed comparison
+  - Users can verify settings before processing
 
 ## Impact
 
@@ -31,6 +41,8 @@ The root cause is that layout detection happens **before** table structure recog
 - `backend/app/services/ocr_service.py` - Add preprocessing before PP-Structure
 - `backend/app/core/config.py` - New preprocessing configuration options
 - `backend/app/services/preprocessing_service.py` - New service (to be created)
+- `backend/app/api/v2/endpoints/preview.py` - New preview API endpoint
+- `frontend/src/components/PreprocessingSettings.tsx` - New UI component
 
 ### Track Impact Analysis
 
@@ -17,12 +17,6 @@ The system SHALL provide optional image preprocessing to enhance layout detectio
 - **THEN** the system SHALL crop from the ORIGINAL image, not the preprocessed version
 - **AND** the extracted image SHALL maintain original quality and colors
 
-#### Scenario: Preprocessing can be disabled
-- **GIVEN** `layout_preprocessing_enabled` is set to false in configuration
-- **WHEN** OCR track processing runs
-- **THEN** the system SHALL skip preprocessing
-- **AND** PP-Structure SHALL receive the original image directly
-
 #### Scenario: CLAHE contrast enhancement
 - **WHEN** `layout_preprocessing_contrast` is set to "clahe"
 - **THEN** the system SHALL apply Contrast Limited Adaptive Histogram Equalization
@@ -38,6 +32,61 @@ The system SHALL provide optional image preprocessing to enhance layout detectio
 - **THEN** the system SHALL apply adaptive thresholding
 - **AND** this SHALL be used only for documents with very poor contrast
 
+### Requirement: Preprocessing Hybrid Control Mode
+
+The system SHALL support three preprocessing modes: automatic, manual, and disabled, with automatic as the default.
+
+#### Scenario: Auto mode analyzes image quality
+- **GIVEN** preprocessing mode is set to "auto"
+- **WHEN** processing begins for a page
+- **THEN** the system SHALL analyze image quality metrics (contrast, edge strength)
+- **AND** automatically determine optimal preprocessing parameters
+- **AND** apply recommended settings without user intervention
+
+#### Scenario: Auto mode detects low contrast
+- **GIVEN** preprocessing mode is "auto"
+- **WHEN** image contrast (standard deviation) is below 40
+- **THEN** the system SHALL automatically enable CLAHE contrast enhancement
+
+#### Scenario: Auto mode detects faint edges
+- **GIVEN** preprocessing mode is "auto"
+- **WHEN** image edge strength (Sobel gradient mean) is below 15
+- **THEN** the system SHALL automatically enable sharpening
+
+#### Scenario: Manual mode uses user-specified settings
+- **GIVEN** preprocessing mode is set to "manual"
+- **WHEN** processing begins
+- **THEN** the system SHALL use the user-provided preprocessing configuration
+- **AND** ignore automatic quality analysis
+
+#### Scenario: Disabled mode skips preprocessing
+- **GIVEN** preprocessing mode is set to "disabled"
+- **WHEN** processing begins
+- **THEN** the system SHALL skip all preprocessing
+- **AND** PP-Structure SHALL receive the original image directly
+
+### Requirement: Preprocessing Preview API
+
+The system SHALL provide a preview endpoint that allows users to compare original and preprocessed images before processing.
+
+#### Scenario: Preview returns comparison images
+- **GIVEN** a task with uploaded document
+- **WHEN** user requests preprocessing preview for a specific page
+- **THEN** the system SHALL return URLs or data for both original and preprocessed images
+- **AND** user can visually compare the difference
+
+#### Scenario: Preview shows auto-detected settings
+- **GIVEN** preview is requested with mode "auto"
+- **WHEN** the system analyzes the page
+- **THEN** the response SHALL include the auto-detected preprocessing configuration
+- **AND** include quality metrics (contrast, edge_strength)
+
+#### Scenario: Preview accepts manual configuration
+- **GIVEN** preview is requested with mode "manual"
+- **WHEN** user provides specific preprocessing settings
+- **THEN** the system SHALL apply those settings to generate preview
+- **AND** return the preprocessed result for user verification
+
 ### Requirement: Preprocessing Track Isolation
 
 The layout preprocessing feature SHALL only affect layout detection input without impacting other processing components.
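Review note: the mode scenarios above boil down to one small decision function. The sketch below is one way to express it, not the implementation in this commit. The function name `resolve_preprocessing` and the constant names are hypothetical. The 40 and 15 thresholds come from the scenarios; the binarize threshold of 20 is taken from the design document's auto-detection snippet, since no scenario states it.

```python
from typing import Dict, Optional, Union

ConfigDict = Dict[str, Union[str, bool]]

# Thresholds quoted from the proposal; names are illustrative.
CONTRAST_CLAHE_BELOW = 40.0     # grayscale standard deviation
EDGE_SHARPEN_BELOW = 15.0       # mean Sobel gradient magnitude
CONTRAST_BINARIZE_BELOW = 20.0  # from the design snippet, not a spec scenario


def resolve_preprocessing(
    mode: str,
    contrast: float,
    edge_strength: float,
    manual_config: Optional[ConfigDict] = None,
) -> Optional[ConfigDict]:
    """Return the settings to apply, or None when preprocessing is skipped."""
    if mode == "disabled":
        # PP-Structure receives the original image directly.
        return None
    if mode == "manual":
        if manual_config is None:
            raise ValueError("manual mode requires a configuration")
        # User settings win; the quality metrics are ignored.
        return dict(manual_config)
    if mode == "auto":
        return {
            "contrast": "clahe" if contrast < CONTRAST_CLAHE_BELOW else "none",
            "sharpen": edge_strength < EDGE_SHARPEN_BELOW,
            "binarize": contrast < CONTRAST_BINARIZE_BELOW,
        }
    raise ValueError(f"unknown preprocessing mode: {mode!r}")
```

With the metrics from the preview example (contrast 35.2, edge strength 12.8), auto mode selects CLAHE and sharpening and leaves binarization off, which matches the `auto_config` shown in the design document.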
@@ -53,3 +102,27 @@ The layout preprocessing feature SHALL only affect layout detection input withou
 - **WHEN** layout detection completes
 - **THEN** the preprocessed image SHALL NOT be persisted to storage
 - **AND** only the original image and element crops SHALL be saved
+
+### Requirement: Preprocessing Frontend UI
+
+The frontend SHALL provide a user interface for configuring and previewing preprocessing settings.
+
+#### Scenario: Mode selection is available
+- **GIVEN** the user is configuring OCR track processing
+- **WHEN** the preprocessing settings panel is displayed
+- **THEN** the user SHALL be able to select mode: Auto (default), Manual, or Disabled
+- **AND** Auto mode SHALL be pre-selected
+
+#### Scenario: Manual mode shows configuration options
+- **GIVEN** the user selects Manual mode
+- **WHEN** the settings panel updates
+- **THEN** the user SHALL see options for:
+  - Contrast enhancement (None / Histogram / CLAHE)
+  - Sharpen toggle
+  - Binarize toggle
+
+#### Scenario: Preview button triggers comparison view
+- **GIVEN** preprocessing settings are configured
+- **WHEN** the user clicks Preview button
+- **THEN** the system SHALL display side-by-side comparison of original and preprocessed images
+- **AND** show detected quality metrics
@@ -3,11 +3,15 @@
 ## 1. Configuration
 
 - [ ] 1.1 Add preprocessing configuration to `backend/app/core/config.py`
-  - `layout_preprocessing_enabled: bool = True` - Enable/disable preprocessing
+  - `layout_preprocessing_mode: str = "auto"` - Options: auto, manual, disabled
   - `layout_preprocessing_contrast: str = "clahe"` - Options: none, histogram, clahe
   - `layout_preprocessing_sharpen: bool = True` - Enable sharpening for faint lines
   - `layout_preprocessing_binarize: bool = False` - Optional binarization (aggressive)
 
+- [ ] 1.2 Add preprocessing schema to `backend/app/schemas/task.py`
+  - `PreprocessingMode` enum: auto, manual, disabled
+  - `PreprocessingConfig` schema for API request/response
+
 ## 2. Preprocessing Service
 
 - [ ] 2.1 Create `backend/app/services/preprocessing_service.py`
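Review note: task 1.2 names a `PreprocessingMode` enum and a `PreprocessingConfig` schema without showing them. The sketch below is one possible shape, using only the standard library so it runs on its own; the project may well define these with its existing schema framework instead. `ContrastMethod` and `from_dict` are hypothetical names, and the field defaults mirror the `layout_preprocessing_*` defaults listed in task 1.1.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Mapping


class PreprocessingMode(str, Enum):
    AUTO = "auto"
    MANUAL = "manual"
    DISABLED = "disabled"


class ContrastMethod(str, Enum):  # hypothetical helper enum
    NONE = "none"
    HISTOGRAM = "histogram"
    CLAHE = "clahe"


@dataclass(frozen=True)
class PreprocessingConfig:
    # Defaults mirror the config.py defaults from task 1.1.
    contrast: ContrastMethod = ContrastMethod.CLAHE
    sharpen: bool = True
    binarize: bool = False

    @classmethod
    def from_dict(cls, data: Mapping[str, Any]) -> "PreprocessingConfig":
        """Build a config from request JSON; unknown contrast values raise ValueError."""
        return cls(
            contrast=ContrastMethod(data.get("contrast", "clahe")),
            sharpen=bool(data.get("sharpen", True)),
            binarize=bool(data.get("binarize", False)),
        )
```

Using enums rather than bare strings means a typo such as `"clahee"` fails at the API boundary instead of silently disabling contrast enhancement later in the pipeline.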
@@ -18,15 +22,26 @@
   - Return preprocessed image as numpy array or PIL Image
 
 - [ ] 2.2 Implement `enhance_for_layout_detection()` function
-  - Input: Original image path or PIL Image
+  - Input: Original image path or PIL Image + config
   - Output: Preprocessed image (same format as input)
   - Steps: contrast → sharpen → (optional) binarize
 
+- [ ] 2.3 Implement `analyze_image_quality()` function (Auto mode)
+  - Calculate contrast level (standard deviation of grayscale)
+  - Detect edge clarity (Sobel/Canny edge strength)
+  - Return recommended `PreprocessingConfig` based on analysis
+  - Thresholds:
+    - Low contrast < 40: Apply CLAHE
+    - Faint edges < 15: Apply sharpen
+    - Very low contrast < 20: Consider binarize
+
 ## 3. Integration with OCR Service
 
 - [ ] 3.1 Update `backend/app/services/ocr_service.py`
   - Import preprocessing service
-  - Before `_run_ppstructure()`, preprocess image if enabled
+  - Check preprocessing mode (auto/manual/disabled)
+  - If auto: call `analyze_image_quality()` first
+  - Before `_run_ppstructure()`, preprocess image based on config
   - Pass preprocessed image to PP-Structure for layout detection
   - Keep original image reference for image extraction
 
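Review note: the last two bullets of task 3.1 (detect on the preprocessed image, extract from the original) are the part most likely to go wrong, so here is a small sketch of that flow. Everything in it is an assumption made for illustration: the helper names, the `(x0, y0, x1, y1)` pixel bbox layout, and the premise that preprocessing does not resize the image, so detected coordinates apply to the original unchanged. The `preprocess` and `detect_layout` callables stand in for the real preprocessing service and PP-Structure call.

```python
from typing import Callable, List, Sequence

import numpy as np

BBox = Sequence[float]  # assumed layout: (x0, y0, x1, y1) in pixels


def crop_from_original(original: np.ndarray, bbox: BBox) -> np.ndarray:
    """Cut a detected region out of the untouched original image."""
    height, width = original.shape[:2]
    x0, y0, x1, y1 = bbox
    # Clamp to the image bounds so a box that spills over the edge still crops.
    left = min(max(int(np.floor(x0)), 0), width)
    top = min(max(int(np.floor(y0)), 0), height)
    right = min(max(int(np.ceil(x1)), left), width)
    bottom = min(max(int(np.ceil(y1)), top), height)
    return original[top:bottom, left:right].copy()


def detect_then_crop(
    original: np.ndarray,
    preprocess: Callable[[np.ndarray], np.ndarray],
    detect_layout: Callable[[np.ndarray], List[BBox]],
) -> List[np.ndarray]:
    """Run detection on the preprocessed copy, but crop elements from the original."""
    layout_input = preprocess(original)  # used for detection only, never saved
    boxes = detect_layout(layout_input)
    return [crop_from_original(original, box) for box in boxes]
```

Because `layout_input` never leaves `detect_then_crop`, the crops keep the original colors and quality, and nothing preprocessed is available to persist.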
@@ -34,21 +49,77 @@
   - Verify `saved_path` and `img_path` in elements reference original
   - Bbox coordinates from preprocessed detection applied to original crop
 
-## 4. Testing
+- [ ] 3.3 Update task start API to accept preprocessing options
+  - Add `preprocessing_mode` parameter to start request
+  - Add `preprocessing_config` for manual mode overrides
 
-- [ ] 4.1 Unit tests for preprocessing_service
+## 4. Preview API
+
+- [ ] 4.1 Create `backend/app/api/v2/endpoints/preview.py`
+  - `POST /api/v2/tasks/{task_id}/preview/preprocessing`
+  - Input: page number, preprocessing config (optional)
+  - Output:
+    - Original image (base64 or URL)
+    - Preprocessed image (base64 or URL)
+    - Auto-detected config (if mode=auto)
+    - Image quality metrics (contrast, edge_strength)
+
+- [ ] 4.2 Add preview router to API
+  - Register in `backend/app/api/v2/api.py`
+  - Add appropriate authentication/authorization
+
+## 5. Frontend UI
+
+- [ ] 5.1 Create `frontend/src/components/PreprocessingSettings.tsx`
+  - Radio buttons: Auto / Manual / Disabled
+  - Manual mode shows:
+    - Contrast dropdown: None / Histogram / CLAHE
+    - Sharpen checkbox
+    - Binarize checkbox
+  - Preview button to trigger comparison view
+
+- [ ] 5.2 Create `frontend/src/components/PreprocessingPreview.tsx`
+  - Side-by-side image comparison (original vs preprocessed)
+  - Display detected quality metrics
+  - Show which auto settings would be applied
+  - Slider or toggle to switch between views
+
+- [ ] 5.3 Integrate with task start flow
+  - Add PreprocessingSettings to OCR track options
+  - Pass selected config to task start API
+  - Store user preference in localStorage
+
+- [ ] 5.4 Add i18n translations
+  - `frontend/src/i18n/locales/zh-TW.json` - Traditional Chinese
+  - `frontend/src/i18n/locales/en.json` - English (if exists)
+
+## 6. Testing
+
+- [ ] 6.1 Unit tests for preprocessing_service
   - Test contrast enhancement methods
   - Test sharpening filter
   - Test binarization
+  - Test `analyze_image_quality()` with various images
   - Test with various image formats (PNG, JPEG)
 
-- [ ] 4.2 Integration tests
-  - Test OCR track with preprocessing enabled/disabled
+- [ ] 6.2 Unit tests for preview API
+  - Test preview endpoint returns correct images
+  - Test auto-detection returns sensible config
+
+- [ ] 6.3 Integration tests
+  - Test OCR track with preprocessing modes (auto/manual/disabled)
   - Verify image element quality is preserved
   - Test with known problematic documents (faint table borders)
+  - Verify auto mode improves detection for low-quality images
 
-## 5. Documentation
+## 7. Documentation
 
-- [ ] 5.1 Update API documentation
+- [ ] 7.1 Update API documentation
   - Document new configuration options
-  - Explain preprocessing behavior
+  - Document preview endpoint
+  - Explain preprocessing behavior and modes
+
+- [ ] 7.2 Add user guide section
+  - When to use auto vs manual
+  - How to interpret quality metrics
+  - Troubleshooting tips