Updates add-layout-preprocessing proposal:
- Auto mode: analyze image quality, auto-select parameters
- Manual mode: user override with specific settings
- Preview API: compare original vs preprocessed before processing
- Frontend UI: mode selection, manual controls, preview button

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Tasks: Add Image Preprocessing for Layout Detection
1. Configuration
- 1.1 Add preprocessing configuration to backend/app/core/config.py
  - layout_preprocessing_mode: str = "auto" - Options: auto, manual, disabled
  - layout_preprocessing_contrast: str = "clahe" - Options: none, histogram, clahe
  - layout_preprocessing_sharpen: bool = True - Enable sharpening for faint lines
  - layout_preprocessing_binarize: bool = False - Optional binarization (aggressive)
- 1.2 Add preprocessing schema to backend/app/schemas/task.py
  - PreprocessingMode enum: auto, manual, disabled
  - PreprocessingConfig schema for API request/response
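
The schema in task 1.2 could look roughly like the following. This is a minimal sketch using only the standard library, so that it runs on its own; the real project may well use a different schema library. `ContrastMethod` and `from_dict` are hypothetical names introduced for illustration, and the field defaults simply mirror the configuration values listed in task 1.1.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict


class PreprocessingMode(str, Enum):
    """Modes listed in task 1.1."""
    AUTO = "auto"
    MANUAL = "manual"
    DISABLED = "disabled"


class ContrastMethod(str, Enum):
    """Hypothetical enum for the contrast options in task 1.1."""
    NONE = "none"
    HISTOGRAM = "histogram"
    CLAHE = "clahe"


@dataclass(frozen=True)
class PreprocessingConfig:
    """Manual-mode settings; defaults mirror the config values in task 1.1."""
    contrast: ContrastMethod = ContrastMethod.CLAHE
    sharpen: bool = True
    binarize: bool = False

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "PreprocessingConfig":
        # Unknown contrast names raise ValueError via the enum constructor.
        return cls(
            contrast=ContrastMethod(data.get("contrast", "clahe")),
            sharpen=bool(data.get("sharpen", True)),
            binarize=bool(data.get("binarize", False)),
        )
```

Subclassing `str` keeps the enum members directly JSON-serializable, which is convenient for an API request/response schema.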
2. Preprocessing Service
- 2.1 Create backend/app/services/preprocessing_service.py
  - Image loading utility (supports PIL, OpenCV)
  - Contrast enhancement methods (histogram equalization, CLAHE)
  - Sharpening filter for line enhancement
  - Optional adaptive binarization
  - Return preprocessed image as numpy array or PIL Image
- 2.2 Implement enhance_for_layout_detection() function
  - Input: Original image path or PIL Image + config
  - Output: Preprocessed image (same format as input)
  - Steps: contrast → sharpen → (optional) binarize
- 2.3 Implement analyze_image_quality() function (Auto mode)
  - Calculate contrast level (standard deviation of grayscale)
  - Detect edge clarity (Sobel/Canny edge strength)
  - Return recommended PreprocessingConfig based on analysis
  - Thresholds:
    - Low contrast < 40: Apply CLAHE
    - Faint edges < 0.1: Apply sharpen
    - Very low contrast < 20: Consider binarize
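
The step order in task 2.2 (contrast → sharpen → optional binarize) can be sketched without any imaging dependency. The code below is an illustration only, under several assumptions: images are plain lists of grayscale rows rather than PIL or numpy objects, global histogram equalization stands in for both the "histogram" and "clahe" options, a mean threshold stands in for adaptive binarization, and the parameter names `sharpen_lines` and `binarize_output` are hypothetical. A real service would more likely delegate these steps to OpenCV or PIL.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, values 0-255 (assumed representation)


def equalize_histogram(img: Image) -> Image:
    """Global histogram equalization (stand-in for CLAHE in this sketch)."""
    pixels = [p for row in img for p in row]
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:  # flat image: nothing to stretch
        return [row[:] for row in img]
    lut = [max(0, round((cdf[v] - cdf_min) / (total - cdf_min) * 255))
           for v in range(256)]
    return [[lut[p] for p in row] for row in img]


def sharpen(img: Image) -> Image:
    """3x3 sharpening kernel (center 5, four neighbors -1), edges replicated."""
    h, w = len(img), len(img[0])

    def at(y: int, x: int) -> int:
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    out = []
    for y in range(h):
        row = []
        for x in range(w):
            v = (5 * at(y, x) - at(y - 1, x) - at(y + 1, x)
                 - at(y, x - 1) - at(y, x + 1))
            row.append(min(max(v, 0), 255))
        out.append(row)
    return out


def binarize(img: Image) -> Image:
    """Mean-threshold binarization (simplification of adaptive thresholding)."""
    pixels = [p for row in img for p in row]
    threshold = sum(pixels) / len(pixels)
    return [[255 if p > threshold else 0 for p in row] for row in img]


def enhance_for_layout_detection(img: Image, contrast: str = "clahe",
                                 sharpen_lines: bool = True,
                                 binarize_output: bool = False) -> Image:
    """Apply contrast → sharpen → (optional) binarize; the input is not mutated."""
    out = [row[:] for row in img]
    if contrast in ("histogram", "clahe"):
        out = equalize_histogram(out)
    if sharpen_lines:
        out = sharpen(out)
    if binarize_output:
        out = binarize(out)
    return out
```

Running contrast enhancement first means the sharpening kernel operates on a stretched intensity range, and binarization, being the most aggressive step, sees the already enhanced image.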
3. Integration with OCR Service
- 3.1 Update backend/app/services/ocr_service.py
  - Import preprocessing service
  - Check preprocessing mode (auto/manual/disabled)
  - If auto: call analyze_image_quality() first
  - Before _run_ppstructure(), preprocess image based on config
  - Pass preprocessed image to PP-Structure for layout detection
  - Keep original image reference for image extraction
- 3.2 Ensure image element extraction uses original
  - Verify saved_path and img_path in elements reference original
  - Bbox coordinates from preprocessed detection applied to original crop
- 3.3 Update task start API to accept preprocessing options
  - Add preprocessing_mode parameter to start request
  - Add preprocessing_config for manual mode overrides
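
The control flow described in tasks 3.1 to 3.3 can be sketched as follows: resolve a config from the mode, run detection on the preprocessed image, then crop elements from the untouched original. This is a dependency-free illustration under stated assumptions, not the service's actual code. Images are lists of grayscale rows; edge strength is approximated by the mean absolute neighbor difference scaled to 0..1 instead of Sobel/Canny; "Consider binarize" is treated as "enable binarize"; and `resolve_config`, `detect_and_crop`, the bbox layout, and the injected `preprocess` and `detect` callables are hypothetical stand-ins for the preprocessing step and `_run_ppstructure()`.

```python
from statistics import pstdev
from typing import Callable, Dict, List, Optional, Tuple

Image = List[List[int]]            # grayscale rows, values 0-255 (assumed)
BBox = Tuple[int, int, int, int]   # (x0, y0, x1, y1), hypothetical layout
Config = Dict[str, object]

DEFAULTS: Config = {"contrast": "clahe", "sharpen": True, "binarize": False}


def analyze_image_quality(img: Image) -> Dict[str, float]:
    """Contrast = std dev of grayscale; edge strength normalized to 0..1."""
    contrast = pstdev([p for row in img for p in row])
    diffs = []
    for y in range(len(img) - 1):
        for x in range(len(img[0]) - 1):
            gx = abs(img[y][x + 1] - img[y][x])
            gy = abs(img[y + 1][x] - img[y][x])
            diffs.append((gx + gy) / 2)
    edge_strength = sum(diffs) / (len(diffs) * 255) if diffs else 0.0
    return {"contrast": contrast, "edge_strength": edge_strength}


def recommend_config(metrics: Dict[str, float]) -> Config:
    """Apply the thresholds from task 2.3."""
    return {
        "contrast": "clahe" if metrics["contrast"] < 40 else "none",
        "sharpen": metrics["edge_strength"] < 0.1,
        "binarize": metrics["contrast"] < 20,  # assumption: "consider" = enable
    }


def resolve_config(mode: str, img: Image,
                   overrides: Optional[Config] = None) -> Optional[Config]:
    """Return the config to apply, or None when preprocessing is disabled."""
    if mode == "disabled":
        return None
    if mode == "manual":
        return {**DEFAULTS, **(overrides or {})}
    if mode == "auto":
        return recommend_config(analyze_image_quality(img))
    raise ValueError(f"unknown preprocessing mode: {mode!r}")


def detect_and_crop(original: Image, mode: str,
                    preprocess: Callable[[Image, Config], Image],
                    detect: Callable[[Image], List[BBox]],
                    overrides: Optional[Config] = None) -> List[Image]:
    """Detect layout on the preprocessed image, crop from the original."""
    config = resolve_config(mode, original, overrides)
    detection_input = preprocess(original, config) if config else original
    boxes = detect(detection_input)
    # Crops always come from `original`, so extracted images keep their quality.
    return [[row[x0:x1] for row in original[y0:y1]]
            for (x0, y0, x1, y1) in boxes]
```

Because preprocessing here does not resize the image, bounding boxes found on the preprocessed copy map directly onto the original; any step that changed dimensions would need a coordinate transform before cropping.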
4. Preview API
- 4.1 Create backend/app/api/v2/endpoints/preview.py
  - POST /api/v2/tasks/{task_id}/preview/preprocessing
  - Input: page number, preprocessing config (optional)
  - Output:
    - Original image (base64 or URL)
    - Preprocessed image (base64 or URL)
    - Auto-detected config (if mode=auto)
    - Image quality metrics (contrast, edge_strength)
- 4.2 Add preview router to API
  - Register in backend/app/api/v2/api.py
  - Add appropriate authentication/authorization
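
One possible shape for the preview response in task 4.1, taking the base64 option, is sketched below. The function name `build_preview_response` and the keys `original_image`, `preprocessed_image`, `auto_config`, and `quality_metrics` are assumptions made for illustration; only `contrast` and `edge_strength` are named in the task list. The sketch covers the payload only and leaves routing and authentication to the web framework.

```python
import base64
from typing import Any, Dict, Optional


def build_preview_response(original_bytes: bytes,
                           preprocessed_bytes: bytes,
                           metrics: Dict[str, float],
                           auto_config: Optional[Dict[str, Any]] = None
                           ) -> Dict[str, Any]:
    """Assemble a JSON-serializable preview payload (hypothetical field names)."""

    def encode(data: bytes) -> str:
        # base64 text is safe to embed in a JSON response body.
        return base64.b64encode(data).decode("ascii")

    response: Dict[str, Any] = {
        "original_image": encode(original_bytes),
        "preprocessed_image": encode(preprocessed_bytes),
        "quality_metrics": {
            "contrast": metrics["contrast"],
            "edge_strength": metrics["edge_strength"],
        },
    }
    # Only include the auto-detected config when mode=auto produced one.
    if auto_config is not None:
        response["auto_config"] = auto_config
    return response
```

Returning URLs instead of base64 would keep responses smaller for large pages, at the cost of serving the temporary preview files somewhere.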
5. Frontend UI
- 5.1 Create frontend/src/components/PreprocessingSettings.tsx
  - Radio buttons: Auto / Manual / Disabled
  - Manual mode shows:
    - Contrast dropdown: None / Histogram / CLAHE
    - Sharpen checkbox
    - Binarize checkbox
  - Preview button to trigger comparison view
- 5.2 Create frontend/src/components/PreprocessingPreview.tsx
  - Side-by-side image comparison (original vs preprocessed)
  - Display detected quality metrics
  - Show which auto settings would be applied
  - Slider or toggle to switch between views
- 5.3 Integrate with task start flow
  - Add PreprocessingSettings to OCR track options
  - Pass selected config to task start API
  - Store user preference in localStorage
- 5.4 Add i18n translations
  - frontend/src/i18n/locales/zh-TW.json - Traditional Chinese
  - frontend/src/i18n/locales/en.json - English (if exists)
6. Testing
- 6.1 Unit tests for preprocessing_service
  - Test contrast enhancement methods
  - Test sharpening filter
  - Test binarization
  - Test analyze_image_quality() with various images
  - Test with various image formats (PNG, JPEG)
- 6.2 Unit tests for preview API
  - Test preview endpoint returns correct images
  - Test auto-detection returns sensible config
- 6.3 Integration tests
  - Test OCR track with preprocessing modes (auto/manual/disabled)
  - Verify image element quality is preserved
  - Test with known problematic documents (faint table borders)
  - Verify auto mode improves detection for low-quality images
7. Documentation
- 7.1 Update API documentation
  - Document new configuration options
  - Document preview endpoint
  - Explain preprocessing behavior and modes
- 7.2 Add user guide section
  - When to use auto vs manual
  - How to interpret quality metrics
  - Troubleshooting tips