proposal: add hybrid control mode with auto-detection and preview
Updates add-layout-preprocessing proposal:
- Auto mode: analyze image quality, auto-select parameters
- Manual mode: user override with specific settings
- Preview API: compare original vs preprocessed before processing
- Frontend UI: mode selection, manual controls, preview button

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

@@ -3,11 +3,15 @@
## 1. Configuration

- [ ] 1.1 Add preprocessing configuration to `backend/app/core/config.py`
  - `layout_preprocessing_enabled: bool = True` - Enable/disable preprocessing
  - `layout_preprocessing_mode: str = "auto"` - Options: auto, manual, disabled
  - `layout_preprocessing_contrast: str = "clahe"` - Options: none, histogram, clahe
  - `layout_preprocessing_sharpen: bool = True` - Enable sharpening for faint lines
  - `layout_preprocessing_binarize: bool = False` - Optional binarization (aggressive)

- [ ] 1.2 Add preprocessing schema to `backend/app/schemas/task.py`
  - `PreprocessingMode` enum: auto, manual, disabled
  - `PreprocessingConfig` schema for API request/response

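A minimal sketch of what the schema from tasks 1.1 and 1.2 might look like. It is illustrative only: the backend may well use Pydantic models in `backend/app/schemas/task.py`, whereas this version sticks to standard-library dataclasses so it runs on its own. `ContrastMethod`, `from_dict`, and `to_dict` are hypothetical names; the defaults mirror the settings listed above.

```python
from dataclasses import dataclass
from enum import Enum


class PreprocessingMode(str, Enum):
    """Modes listed in task 1.2."""
    AUTO = "auto"
    MANUAL = "manual"
    DISABLED = "disabled"


class ContrastMethod(str, Enum):
    """Hypothetical enum for the `layout_preprocessing_contrast` options."""
    NONE = "none"
    HISTOGRAM = "histogram"
    CLAHE = "clahe"


@dataclass(frozen=True)
class PreprocessingConfig:
    """Manual-mode settings; defaults follow the config values in task 1.1."""
    contrast: ContrastMethod = ContrastMethod.CLAHE
    sharpen: bool = True
    binarize: bool = False

    @classmethod
    def from_dict(cls, data: dict) -> "PreprocessingConfig":
        # Enum construction raises ValueError on an unknown contrast option.
        return cls(
            contrast=ContrastMethod(data.get("contrast", "clahe")),
            sharpen=bool(data.get("sharpen", True)),
            binarize=bool(data.get("binarize", False)),
        )

    def to_dict(self) -> dict:
        return {
            "contrast": self.contrast.value,
            "sharpen": self.sharpen,
            "binarize": self.binarize,
        }
```

Keeping the mode separate from the per-step settings means `auto` and `disabled` requests do not need to carry a config at all.
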
## 2. Preprocessing Service

- [ ] 2.1 Create `backend/app/services/preprocessing_service.py`
@@ -18,15 +22,26 @@
  - Return preprocessed image as numpy array or PIL Image

- [ ] 2.2 Implement `enhance_for_layout_detection()` function
  - Input: Original image path or PIL Image
  - Input: Original image path or PIL Image + config
  - Output: Preprocessed image (same format as input)
  - Steps: contrast → sharpen → (optional) binarize

- [ ] 2.3 Implement `analyze_image_quality()` function (Auto mode)
  - Calculate contrast level (standard deviation of grayscale)
  - Detect edge clarity (Sobel/Canny edge strength)
  - Return recommended `PreprocessingConfig` based on analysis
  - Thresholds:
    - Low contrast < 40: Apply CLAHE
    - Faint edges < 0.1: Apply sharpen
    - Very low contrast < 20: Consider binarize

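The two functions in tasks 2.2 and 2.3 could be prototyped roughly as follows. This is a NumPy-only sketch under several assumptions, not the planned implementation: it takes an 8-bit grayscale array rather than a path or PIL Image, returns a plain dict instead of `PreprocessingConfig`, and uses global histogram equalization as a stand-in for both the `histogram` and `clahe` options (a real CLAHE step would more likely use OpenCV's `cv2.createCLAHE`). The proposal does not say how edge strength is scaled, so the normalization here (mean Sobel magnitude divided by 255, capped at 1.0) is a guess. Only the three thresholds come from the task list.

```python
import numpy as np

# Thresholds quoted from task 2.3; the edge-strength scale is an assumption.
LOW_CONTRAST, VERY_LOW_CONTRAST, FAINT_EDGES = 40.0, 20.0, 0.1

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SHARPEN = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=np.float64)


def _filter3x3(gray: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Correlate a 2-D float image with a 3x3 kernel, replicating the border."""
    height, width = gray.shape
    padded = np.pad(gray, 1, mode="edge")
    out = np.zeros((height, width), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + height, dx:dx + width]
    return out


def analyze_image_quality(gray: np.ndarray) -> dict:
    """Measure contrast and edge strength, then recommend settings."""
    g = gray.astype(np.float64)
    contrast = float(g.std())  # standard deviation of grayscale values
    magnitude = np.hypot(_filter3x3(g, SOBEL_X), _filter3x3(g, SOBEL_X.T))
    edge_strength = float(min(1.0, magnitude.mean() / 255.0))  # assumed scale
    return {
        "contrast": contrast,
        "edge_strength": edge_strength,
        "config": {
            "contrast": "clahe" if contrast < LOW_CONTRAST else "none",
            "sharpen": edge_strength < FAINT_EDGES,
            "binarize": contrast < VERY_LOW_CONTRAST,
        },
    }


def _equalize(gray_u8: np.ndarray) -> np.ndarray:
    """Global histogram equalization (stand-in for CLAHE in this sketch)."""
    cdf = np.bincount(gray_u8.ravel(), minlength=256).cumsum()
    cdf_min = cdf[cdf > 0][0]
    denom = cdf[-1] - cdf_min
    if denom == 0:  # flat image: nothing to stretch
        return gray_u8.copy()
    lut = np.clip(np.round((cdf - cdf_min) / denom * 255), 0, 255)
    return lut.astype(np.uint8)[gray_u8]


def enhance_for_layout_detection(gray: np.ndarray, config: dict) -> np.ndarray:
    """Apply contrast, then sharpen, then optional binarize, in that order."""
    out = gray.astype(np.uint8)
    if config["contrast"] != "none":
        out = _equalize(out)
    if config["sharpen"]:
        sharpened = _filter3x3(out.astype(np.float64), SHARPEN)
        out = np.clip(sharpened, 0, 255).astype(np.uint8)
    if config["binarize"]:
        # Simple mean threshold; an adaptive or Otsu threshold is another option.
        out = np.where(out > out.mean(), 255, 0).astype(np.uint8)
    return out
```

In auto mode the `config` entry returned by `analyze_image_quality()` would be passed straight to `enhance_for_layout_detection()`; in manual mode the user-supplied settings take its place.
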
## 3. Integration with OCR Service

- [ ] 3.1 Update `backend/app/services/ocr_service.py`
  - Import preprocessing service
  - Before `_run_ppstructure()`, preprocess image if enabled
  - Check preprocessing mode (auto/manual/disabled)
  - If auto: call `analyze_image_quality()` first
  - Before `_run_ppstructure()`, preprocess image based on config
  - Pass preprocessed image to PP-Structure for layout detection
  - Keep original image reference for image extraction

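The mode handling described in task 3.1 might be wired up along these lines. This is a hedged sketch: `resolve_config` and `detect_layout` are hypothetical helpers, and the analysis, enhancement, and layout-detection steps are passed in as callables so the example does not depend on the real `ocr_service.py` or on PP-Structure.

```python
from typing import Any, Callable, Optional


def resolve_config(mode: str, manual_config: Optional[dict], image: Any,
                   analyze: Callable[[Any], dict]) -> Optional[dict]:
    """Return the config to apply, or None when preprocessing is skipped."""
    if mode == "disabled":
        return None
    if mode == "auto":
        # Auto mode: let the quality analysis pick the settings.
        return analyze(image)
    if mode == "manual":
        if manual_config is None:
            raise ValueError("manual mode requires an explicit config")
        return manual_config
    raise ValueError(f"unknown preprocessing mode: {mode!r}")


def detect_layout(image: Any, mode: str, manual_config: Optional[dict], *,
                  analyze: Callable[[Any], dict],
                  enhance: Callable[[Any, dict], Any],
                  run_layout: Callable[[Any], list]) -> dict:
    """Run layout detection on the preprocessed image, keeping the original."""
    config = resolve_config(mode, manual_config, image, analyze)
    detection_input = image if config is None else enhance(image, config)
    return {
        "elements": run_layout(detection_input),
        "original": image,          # still used later for image extraction
        "applied_config": config,
    }
```

Returning the untouched original alongside the detected elements keeps the later extraction step independent of whichever enhancement was applied.
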
@@ -34,21 +49,77 @@
  - Verify `saved_path` and `img_path` in elements reference original
  - Bbox coordinates from preprocessed detection applied to original crop

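Applying detected bounding boxes to the original image can be as simple as the sketch below. It assumes the preprocessing step preserves image dimensions, so coordinates carry over unchanged, and that boxes are `(x0, y0, x1, y1)` in pixels. Neither detail is stated in the task list, and `crop_from_original` is a hypothetical helper.

```python
import numpy as np


def crop_from_original(original: np.ndarray, bbox) -> np.ndarray:
    """Crop the unprocessed image using a bbox found on the preprocessed one."""
    height, width = original.shape[:2]
    x0, y0, x1, y1 = (int(round(v)) for v in bbox)
    # Clamp to the image bounds so a slightly oversized box cannot wrap around.
    x0, x1 = max(0, min(x0, width)), max(0, min(x1, width))
    y0, y1 = max(0, min(y0, height)), max(0, min(y1, height))
    if x1 <= x0 or y1 <= y0:
        raise ValueError("bounding box is empty after clamping")
    return original[y0:y1, x0:x1].copy()
```

Because the crop is taken from the original array, contrast enhancement or binarization used for detection never leaks into the saved image elements.
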
## 4. Testing
- [ ] 3.3 Update task start API to accept preprocessing options
  - Add `preprocessing_mode` parameter to start request
  - Add `preprocessing_config` for manual mode overrides

- [ ] 4.1 Unit tests for preprocessing_service

## 4. Preview API

- [ ] 4.1 Create `backend/app/api/v2/endpoints/preview.py`
  - `POST /api/v2/tasks/{task_id}/preview/preprocessing`
  - Input: page number, preprocessing config (optional)
  - Output:
    - Original image (base64 or URL)
    - Preprocessed image (base64 or URL)
    - Auto-detected config (if mode=auto)
    - Image quality metrics (contrast, edge_strength)

- [ ] 4.2 Add preview router to API
  - Register in `backend/app/api/v2/api.py`
  - Add appropriate authentication/authorization

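One possible shape for the preview response body, using the base64 option. The field names (`original_image`, `preprocessed_image`, `quality_metrics`, `auto_config`) and the helper `build_preview_response` are assumptions made for illustration; the proposal only lists what the output should contain, and routing and authentication are left out.

```python
import base64
from typing import Optional


def build_preview_response(original: bytes, preprocessed: bytes, metrics: dict,
                           auto_config: Optional[dict] = None) -> dict:
    """Assemble a JSON-serializable preview payload from encoded image bytes."""

    def encode(data: bytes) -> str:
        return base64.b64encode(data).decode("ascii")

    body = {
        "original_image": encode(original),
        "preprocessed_image": encode(preprocessed),
        "quality_metrics": {
            "contrast": float(metrics["contrast"]),
            "edge_strength": float(metrics["edge_strength"]),
        },
    }
    # Only auto mode reports which settings were detected.
    if auto_config is not None:
        body["auto_config"] = auto_config
    return body
```

Returning URLs instead of base64 strings would keep responses small for large pages, at the cost of serving the temporary files somewhere.
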
## 5. Frontend UI

- [ ] 5.1 Create `frontend/src/components/PreprocessingSettings.tsx`
  - Radio buttons: Auto / Manual / Disabled
  - Manual mode shows:
    - Contrast dropdown: None / Histogram / CLAHE
    - Sharpen checkbox
    - Binarize checkbox
  - Preview button to trigger comparison view

- [ ] 5.2 Create `frontend/src/components/PreprocessingPreview.tsx`
  - Side-by-side image comparison (original vs preprocessed)
  - Display detected quality metrics
  - Show which auto settings would be applied
  - Slider or toggle to switch between views

- [ ] 5.3 Integrate with task start flow
  - Add PreprocessingSettings to OCR track options
  - Pass selected config to task start API
  - Store user preference in localStorage

- [ ] 5.4 Add i18n translations
  - `frontend/src/i18n/locales/zh-TW.json` - Traditional Chinese
  - `frontend/src/i18n/locales/en.json` - English (if exists)

## 6. Testing

- [ ] 6.1 Unit tests for preprocessing_service
  - Test contrast enhancement methods
  - Test sharpening filter
  - Test binarization
  - Test `analyze_image_quality()` with various images
  - Test with various image formats (PNG, JPEG)

- [ ] 4.2 Integration tests
  - Test OCR track with preprocessing enabled/disabled
- [ ] 6.2 Unit tests for preview API
  - Test preview endpoint returns correct images
  - Test auto-detection returns sensible config

- [ ] 6.3 Integration tests
  - Test OCR track with preprocessing modes (auto/manual/disabled)
  - Verify image element quality is preserved
  - Test with known problematic documents (faint table borders)
  - Verify auto mode improves detection for low-quality images

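For the "faint table borders" case, a synthetic fixture may be easier to keep in the repository than scanned documents. The generator below is a sketch with made-up defaults (a 245 background with 225 grid lines); `make_faint_table` is a hypothetical helper and real test documents would still be needed for integration tests.

```python
import numpy as np


def make_faint_table(height: int = 120, width: int = 160, rows: int = 3,
                     cols: int = 4, background: int = 245,
                     line: int = 225) -> np.ndarray:
    """Return a grayscale page with a barely visible table grid."""
    img = np.full((height, width), background, dtype=np.uint8)
    # One-pixel horizontal rules, evenly spaced from the top to the bottom edge.
    for r in range(rows + 1):
        img[r * (height - 1) // rows, :] = line
    # One-pixel vertical rules, evenly spaced from the left to the right edge.
    for c in range(cols + 1):
        img[:, c * (width - 1) // cols] = line
    return img


def test_fixture_is_very_low_contrast() -> None:
    """The fixture should fall under the 'very low contrast < 20' threshold."""
    img = make_faint_table()
    assert float(img.std()) < 20.0
```

A unit test for `analyze_image_quality()` could feed this fixture in and check that the recommended settings enable contrast enhancement.
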
## 5. Documentation
## 7. Documentation

- [ ] 5.1 Update API documentation
- [ ] 7.1 Update API documentation
  - Document new configuration options
  - Explain preprocessing behavior
  - Document preview endpoint
  - Explain preprocessing behavior and modes

- [ ] 7.2 Add user guide section
  - When to use auto vs manual
  - How to interpret quality metrics
  - Troubleshooting tips