Mark implemented tasks as complete and add implementation summary:
- Configuration: config.py and schema additions ✓
- Preprocessing service: layout_preprocessing_service.py ✓
- OCR integration: ocr_service.py and pp_structure_enhanced.py ✓
- Preview API endpoints ✓
- Frontend UI: PreprocessingSettings component ✓
- i18n translations (zh-TW) ✓
- Basic unit testing ✓

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Tasks: Add Image Preprocessing for Layout Detection
1. Configuration
- 1.1 Add preprocessing configuration to backend/app/core/config.py
  - layout_preprocessing_mode: str = "auto" - Options: auto, manual, disabled
  - layout_preprocessing_contrast: str = "clahe" - Options: none, histogram, clahe
  - layout_preprocessing_sharpen: bool = True - Enable sharpening for faint lines
  - layout_preprocessing_binarize: bool = False - Optional binarization (aggressive)
- 1.2 Add preprocessing schema to backend/app/schemas/task.py
  - PreprocessingMode enum: auto, manual, disabled
  - PreprocessingConfig schema for API request/response
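The settings and schema listed above could look roughly like the following. This is a minimal, dependency-free sketch using standard-library dataclasses and enums; the actual config.py and task.py may well use a different base class. The names `ContrastMethod`, `LayoutPreprocessingSettings`, and `default_config()` are hypothetical and introduced only for illustration, as are the field names on `PreprocessingConfig`.

```python
from dataclasses import dataclass
from enum import Enum


class PreprocessingMode(str, Enum):
    """Modes named in task 1.2."""
    AUTO = "auto"
    MANUAL = "manual"
    DISABLED = "disabled"


class ContrastMethod(str, Enum):
    """Hypothetical enum for the contrast options listed in task 1.1."""
    NONE = "none"
    HISTOGRAM = "histogram"
    CLAHE = "clahe"


@dataclass
class PreprocessingConfig:
    """Per-request options; field names are assumptions."""
    contrast: ContrastMethod = ContrastMethod.CLAHE
    sharpen: bool = True
    binarize: bool = False


@dataclass
class LayoutPreprocessingSettings:
    """Hypothetical stand-in for the settings added to config.py."""
    layout_preprocessing_mode: str = "auto"
    layout_preprocessing_contrast: str = "clahe"
    layout_preprocessing_sharpen: bool = True
    layout_preprocessing_binarize: bool = False

    def __post_init__(self) -> None:
        # Reject values outside the documented option lists (raises ValueError).
        PreprocessingMode(self.layout_preprocessing_mode)
        ContrastMethod(self.layout_preprocessing_contrast)

    def default_config(self) -> PreprocessingConfig:
        # Translate the flat settings into a request-level config object.
        return PreprocessingConfig(
            contrast=ContrastMethod(self.layout_preprocessing_contrast),
            sharpen=self.layout_preprocessing_sharpen,
            binarize=self.layout_preprocessing_binarize,
        )
```

Validating the string settings against the enums at construction time means a typo such as "clahee" fails at startup rather than during OCR.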
2. Preprocessing Service
- 2.1 Create backend/app/services/layout_preprocessing_service.py
  - Image loading utility (supports PIL, OpenCV)
  - Contrast enhancement methods (histogram equalization, CLAHE)
  - Sharpening filter for line enhancement
  - Optional adaptive binarization
  - Return preprocessed image as numpy array or PIL Image
- 2.2 Implement preprocess() and preprocess_to_pil() functions
  - Input: Original image path or PIL Image + config
  - Output: Preprocessed image (same format as input) + PreprocessingResult
  - Steps: contrast → sharpen → (optional) binarize
- 2.3 Implement analyze_image_quality() function (Auto mode)
  - Calculate contrast level (standard deviation of grayscale)
  - Detect edge clarity (Sobel gradient mean)
  - Return ImageQualityMetrics based on analysis
  - get_auto_config() returns PreprocessingConfig based on thresholds:
    - Low contrast < 40: Apply CLAHE
    - Faint edges < 15: Apply sharpen
    - Very low contrast < 20: Consider binarize
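The auto-mode analysis and thresholds in 2.3 can be sketched as follows. This is an illustration in pure Python on a nested-list grayscale image so that it runs without dependencies; the real service presumably works on numpy arrays with OpenCV. Several points are assumptions: the edge metric is taken to be the mean Sobel gradient magnitude over interior pixels, "Consider binarize" is treated as enabling binarization, contrast falls back to "none" above the threshold, and the field names on both dataclasses are guesses.

```python
import math
from dataclasses import dataclass
from statistics import pstdev
from typing import Sequence

# Thresholds taken from task 2.3.
LOW_CONTRAST = 40.0
VERY_LOW_CONTRAST = 20.0
FAINT_EDGES = 15.0


@dataclass
class ImageQualityMetrics:
    contrast: float       # standard deviation of grayscale values
    edge_strength: float  # mean Sobel gradient magnitude (assumed definition)


@dataclass
class PreprocessingConfig:
    contrast: str  # "none", "histogram", or "clahe"
    sharpen: bool
    binarize: bool


def analyze_image_quality(gray: Sequence[Sequence[int]]) -> ImageQualityMetrics:
    """Compute contrast and edge strength for a 2D grayscale image (0-255)."""
    pixels = [p for row in gray for p in row]
    contrast = float(pstdev(pixels))

    height, width = len(gray), len(gray[0])
    magnitudes = []
    # Apply the 3x3 Sobel kernels to every interior pixel.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            magnitudes.append(math.hypot(gx, gy))
    edge_strength = sum(magnitudes) / len(magnitudes) if magnitudes else 0.0
    return ImageQualityMetrics(contrast=contrast, edge_strength=edge_strength)


def get_auto_config(metrics: ImageQualityMetrics) -> PreprocessingConfig:
    """Map quality metrics to a config using the documented thresholds."""
    return PreprocessingConfig(
        contrast="clahe" if metrics.contrast < LOW_CONTRAST else "none",
        sharpen=metrics.edge_strength < FAINT_EDGES,
        binarize=metrics.contrast < VERY_LOW_CONTRAST,
    )
```

Under these assumptions a uniformly gray page enables all three steps, while a page with a crisp black/white boundary leaves preprocessing effectively off.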
3. Integration with OCR Service
- 3.1 Update backend/app/services/ocr_service.py
  - Import preprocessing service
  - Check preprocessing mode (auto/manual/disabled)
  - If auto: call analyze_image_quality() first
  - Before PP-Structure prediction, preprocess image based on config
  - Pass preprocessed PIL Image to PP-Structure for layout detection
  - Keep original image reference for image extraction
- 3.2 Update backend/app/services/pp_structure_enhanced.py
  - Add preprocessed_image parameter to analyze_with_full_structure()
  - When preprocessed_image is provided, convert to BGR numpy array and pass to PP-Structure
  - Bbox coordinates from preprocessed detection applied to original image crop
- 3.3 Update task start API to accept preprocessing options
  - Add preprocessing_mode parameter to ProcessingOptions
  - Add preprocessing_config for manual mode overrides
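The key idea in 3.1 and 3.2 is that layout detection runs on the preprocessed image while crops are taken from the untouched original, which works because preprocessing does not change image dimensions. A minimal sketch of that split, using nested lists instead of numpy arrays: `extract_regions`, `crop`, and the `detect_layout` callable are hypothetical stand-ins (the latter for the PP-Structure call), and the `(x0, y0, x1, y1)` bbox layout is an assumption.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]            # 2D grayscale image as nested lists
BBox = Tuple[int, int, int, int]   # (x0, y0, x1, y1), end-exclusive (assumed)


def crop(image: Image, bbox: BBox) -> Image:
    """Return the sub-image covered by bbox."""
    x0, y0, x1, y1 = bbox
    return [row[x0:x1] for row in image[y0:y1]]


def extract_regions(
    original: Image,
    preprocessed: Image,
    detect_layout: Callable[[Image], List[BBox]],
) -> List[Image]:
    """Detect regions on the preprocessed image, then crop the original."""
    same_shape = (
        len(original) == len(preprocessed)
        and all(len(a) == len(b) for a, b in zip(original, preprocessed))
    )
    if not same_shape:
        # Bboxes are only transferable when both images share a coordinate space.
        raise ValueError("preprocessed image must match original dimensions")
    boxes = detect_layout(preprocessed)
    return [crop(original, box) for box in boxes]
```

Cropping from the original keeps extracted figures free of contrast and binarization artifacts that only exist to help the detector.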
4. Preview API
- 4.1 Create preview endpoints in backend/app/routers/tasks.py
  - POST /api/v2/tasks/{task_id}/preview/preprocessing
    - Input: page number, preprocessing mode/config
    - Output: PreprocessingPreviewResponse with:
      - Original image URL
      - Preprocessed image URL
      - Auto-detected config
      - Image quality metrics (contrast, edge_strength)
  - GET /api/v2/tasks/{task_id}/preview/image - Serve preview images
- 4.2 Add preview router functionality
  - Integrated into tasks router
  - Uses task authentication/authorization
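A client call to the preview endpoint might be assembled as below. Only the URL path comes from the task list; the JSON field names (`page`, `mode`, `config`), the bearer-token header, and the `build_preview_request` helper are assumptions made for illustration, so check the generated /docs schema for the real request body. The sketch builds the request without sending it.

```python
import json
from typing import Optional
from urllib.request import Request


def build_preview_request(
    base_url: str,
    task_id: str,
    page: int,
    mode: str,
    config: Optional[dict] = None,
    token: Optional[str] = None,
) -> Request:
    """Build (but do not send) a POST to the preprocessing preview endpoint."""
    payload = {"page": page, "mode": mode}  # field names are assumed
    if config is not None:
        payload["config"] = config  # manual-mode overrides (assumed field name)

    headers = {"Content-Type": "application/json"}
    if token:
        # The auth scheme is an assumption; the source only says task auth applies.
        headers["Authorization"] = f"Bearer {token}"

    url = f"{base_url.rstrip('/')}/api/v2/tasks/{task_id}/preview/preprocessing"
    return Request(
        url=url,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the response is expected to carry the image URLs, auto-detected config, and quality metrics listed in 4.1.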
5. Frontend UI
- 5.1 Create frontend/src/components/PreprocessingSettings.tsx
  - Radio buttons with icons: Auto / Manual / Disabled
  - Manual mode shows:
    - Contrast dropdown: None / Histogram / CLAHE
    - Sharpen checkbox
    - Binarize checkbox (with warning)
  - Preview button integration (onPreview prop)
- 5.2 Create frontend/src/components/PreprocessingPreview.tsx (optional)
  - Side-by-side image comparison (original vs preprocessed)
  - Display detected quality metrics
  - Note: Preview functionality is available via the API; the UI modal is an optional enhancement
- 5.3 Integrate with task start flow
  - Added PreprocessingSettings to ProcessingPage.tsx
  - Pass selected config to task start API
  - Note: localStorage preference storage is an optional enhancement
- 5.4 Add i18n translations
  - frontend/src/i18n/locales/zh-TW.json - Traditional Chinese
6. Testing
- 6.1 Unit tests for preprocessing_service
  - Validated imports and service creation
  - Tested analyze_image_quality() with test images
  - Tested get_auto_config() returns sensible config
  - Tested preprocess() produces correct output shape
- 6.2 Integration tests for preview API (optional)
  - Manual testing recommended with actual documents
- 6.3 End-to-end testing
  - Test OCR track with preprocessing modes (auto/manual/disabled)
  - Test with known problematic documents (faint table borders)
7. Documentation
- 7.1 Update API documentation
  - Schemas documented in task.py with Field descriptions
  - Preview endpoint accessible via /docs
- 7.2 Add user guide section (optional)
  - When to use auto vs manual
  - How to interpret quality metrics
Implementation Summary
Backend commits:
- feat: implement layout preprocessing backend - Core service, OCR integration, preview API
Frontend commits:
- feat: add preprocessing UI components and integration - PreprocessingSettings, i18n, ProcessingPage integration
Key files created/modified:
- backend/app/services/layout_preprocessing_service.py (new)
- backend/app/core/config.py (updated)
- backend/app/schemas/task.py (updated)
- backend/app/services/ocr_service.py (updated)
- backend/app/services/pp_structure_enhanced.py (updated)
- backend/app/routers/tasks.py (updated)
- frontend/src/components/PreprocessingSettings.tsx (new)
- frontend/src/types/apiV2.ts (updated)
- frontend/src/pages/ProcessingPage.tsx (updated)
- frontend/src/i18n/locales/zh-TW.json (updated)