# Tasks: Add Image Preprocessing for Layout Detection

## 1. Configuration

- [x] 1.1 Add preprocessing configuration to `backend/app/core/config.py`
  - `layout_preprocessing_mode: str = "auto"` - Options: auto, manual, disabled
  - `layout_preprocessing_contrast: str = "clahe"` - Options: none, histogram, clahe
  - `layout_preprocessing_sharpen: bool = True` - Enable sharpening for faint lines
  - `layout_preprocessing_binarize: bool = False` - Optional binarization (aggressive)
- [x] 1.2 Add preprocessing schema to `backend/app/schemas/task.py`
  - `PreprocessingMode` enum: auto, manual, disabled
  - `PreprocessingConfig` schema for API request/response

## 2. Preprocessing Service

- [x] 2.1 Create `backend/app/services/layout_preprocessing_service.py`
  - Image loading utility (supports PIL, OpenCV)
  - Contrast enhancement methods (histogram equalization, CLAHE)
  - Sharpening filter for line enhancement
  - Optional adaptive binarization
  - Return preprocessed image as numpy array or PIL Image
- [x] 2.2 Implement `preprocess()` and `preprocess_to_pil()` functions
  - Input: Original image path or PIL Image + config
  - Output: Preprocessed image (same format as input) + PreprocessingResult
  - Steps: contrast → sharpen → (optional) binarize
- [x] 2.3 Implement `analyze_image_quality()` function (Auto mode)
  - Calculate contrast level (standard deviation of grayscale)
  - Detect edge clarity (Sobel gradient mean)
  - Return ImageQualityMetrics based on analysis
  - `get_auto_config()` returns PreprocessingConfig based on thresholds:
    - Low contrast < 40: Apply CLAHE
    - Faint edges < 15: Apply sharpen
    - Very low contrast < 20: Consider binarize

## 3. Integration with OCR Service

- [x] 3.1 Update `backend/app/services/ocr_service.py`
  - Import preprocessing service
  - Check preprocessing mode (auto/manual/disabled)
  - If auto: call `analyze_image_quality()` first
  - Before PP-Structure prediction, preprocess image based on config
  - Pass preprocessed PIL Image to PP-Structure for layout detection
  - Keep original image reference for image extraction
- [x] 3.2 Update `backend/app/services/pp_structure_enhanced.py`
  - Add `preprocessed_image` parameter to `analyze_with_full_structure()`
  - When preprocessed_image provided, convert to BGR numpy array and pass to PP-Structure
  - Bbox coordinates from preprocessed detection applied to original image crop
- [x] 3.3 Update task start API to accept preprocessing options
  - Add `preprocessing_mode` parameter to ProcessingOptions
  - Add `preprocessing_config` for manual mode overrides

## 4. Preview API

- [x] 4.1 Create preview endpoints in `backend/app/routers/tasks.py`
  - `POST /api/v2/tasks/{task_id}/preview/preprocessing`
    - Input: page number, preprocessing mode/config
    - Output: PreprocessingPreviewResponse with:
      - Original image URL
      - Preprocessed image URL
      - Auto-detected config
      - Image quality metrics (contrast, edge_strength)
  - `GET /api/v2/tasks/{task_id}/preview/image` - Serve preview images
- [x] 4.2 Add preview router functionality
  - Integrated into tasks router
  - Uses task authentication/authorization

## 5. Frontend UI

- [x] 5.1 Create `frontend/src/components/PreprocessingSettings.tsx`
  - Radio buttons with icons: Auto / Manual / Disabled
  - Manual mode shows:
    - Contrast dropdown: None / Histogram / CLAHE
    - Sharpen checkbox
    - Binarize checkbox (with warning)
  - Preview button integration (onPreview prop)
- [ ] 5.2 Create `frontend/src/components/PreprocessingPreview.tsx` (optional)
  - Side-by-side image comparison (original vs preprocessed)
  - Display detected quality metrics
  - Note: Preview functionality available via API, UI modal is optional enhancement
- [x] 5.3 Integrate with task start flow
  - Added PreprocessingSettings to ProcessingPage.tsx
  - Pass selected config to task start API
  - Note: localStorage preference storage is optional enhancement
- [x] 5.4 Add i18n translations
  - `frontend/src/i18n/locales/zh-TW.json` - Traditional Chinese

## 6. Testing

- [x] 6.1 Unit tests for preprocessing_service
  - Validated imports and service creation
  - Tested `analyze_image_quality()` with test images
  - Tested `get_auto_config()` returns sensible config
  - Tested `preprocess()` produces correct output shape
- [ ] 6.2 Integration tests for preview API (optional)
  - Manual testing recommended with actual documents
- [ ] 6.3 End-to-end testing
  - Test OCR track with preprocessing modes (auto/manual/disabled)
  - Test with known problematic documents (faint table borders)

## 7. Documentation

- [x] 7.1 Update API documentation
  - Schemas documented in task.py with Field descriptions
  - Preview endpoint accessible via /docs
- [ ] 7.2 Add user guide section (optional)
  - When to use auto vs manual
  - How to interpret quality metrics

---

## Implementation Summary

**Backend commits:**

1. `feat: implement layout preprocessing backend`
   - Core service, OCR integration, preview API

**Frontend commits:**

1. `feat: add preprocessing UI components and integration`
   - PreprocessingSettings, i18n, ProcessingPage integration

**Key files created/modified:**

- `backend/app/services/layout_preprocessing_service.py` (new)
- `backend/app/core/config.py` (updated)
- `backend/app/schemas/task.py` (updated)
- `backend/app/services/ocr_service.py` (updated)
- `backend/app/services/pp_structure_enhanced.py` (updated)
- `backend/app/routers/tasks.py` (updated)
- `frontend/src/components/PreprocessingSettings.tsx` (new)
- `frontend/src/types/apiV2.ts` (updated)
- `frontend/src/pages/ProcessingPage.tsx` (updated)
- `frontend/src/i18n/locales/zh-TW.json` (updated)
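
**Sketch: auto-mode threshold mapping (task 2.3)**

For reviewers, the threshold rules listed under task 2.3 can be sketched roughly as below. This is a minimal illustration, not the code in `layout_preprocessing_service.py`: the field names on `ImageQualityMetrics` and `PreprocessingConfig`, the constant names, and the choice to turn "Consider binarize" into `binarize=True` are all assumptions made for the example. Only the three threshold values (40, 15, 20) come from the task list.

```python
from dataclasses import dataclass

# Threshold values from task 2.3; the constant names are hypothetical.
LOW_CONTRAST = 40.0       # grayscale standard deviation below this: apply CLAHE
FAINT_EDGES = 15.0        # mean Sobel gradient below this: apply sharpening
VERY_LOW_CONTRAST = 20.0  # grayscale standard deviation below this: binarize


@dataclass
class ImageQualityMetrics:
    """Assumed shape of the metrics returned by analyze_image_quality()."""
    contrast: float       # standard deviation of grayscale pixel values
    edge_strength: float  # mean Sobel gradient magnitude


@dataclass
class PreprocessingConfig:
    """Assumed shape of the config; mirrors the manual-mode options."""
    contrast: str = "none"  # one of: none, histogram, clahe
    sharpen: bool = False
    binarize: bool = False


def get_auto_config(metrics: ImageQualityMetrics) -> PreprocessingConfig:
    """Map quality metrics to a preprocessing config using the 2.3 thresholds."""
    config = PreprocessingConfig()
    if metrics.contrast < LOW_CONTRAST:
        config.contrast = "clahe"
    if metrics.edge_strength < FAINT_EDGES:
        config.sharpen = True
    # The task list only says "consider" binarization; enabling it outright
    # here is an assumption made to keep the sketch simple.
    if metrics.contrast < VERY_LOW_CONTRAST:
        config.binarize = True
    return config
```

With these assumptions, a clean scan (contrast 60, edge strength 30) yields a config with every step disabled, while a faded scan (contrast 35, edge strength 10) enables CLAHE and sharpening but leaves binarization off.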