
# Tasks: Add Image Preprocessing for Layout Detection

## 1. Configuration

- [ ] 1.1 Add preprocessing configuration to `backend/app/core/config.py`
  - `layout_preprocessing_mode: str = "auto"` - Options: auto, manual, disabled
  - `layout_preprocessing_contrast: str = "clahe"` - Options: none, histogram, clahe
  - `layout_preprocessing_sharpen: bool = True` - Enable sharpening for faint lines
  - `layout_preprocessing_binarize: bool = False` - Optional binarization (aggressive)

- [ ] 1.2 Add preprocessing schema to `backend/app/schemas/task.py`
  - `PreprocessingMode` enum: auto, manual, disabled
  - `PreprocessingConfig` schema for API request/response
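
The settings and schema listed above could be modelled roughly as follows. This is a minimal sketch using only the standard library: the real `config.py` and `task.py` may well use a different base class, and `ContrastMethod` and `parse_mode` are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class PreprocessingMode(str, Enum):
    AUTO = "auto"
    MANUAL = "manual"
    DISABLED = "disabled"


class ContrastMethod(str, Enum):  # hypothetical name, not in the task list
    NONE = "none"
    HISTOGRAM = "histogram"
    CLAHE = "clahe"


@dataclass(frozen=True)
class PreprocessingConfig:
    # Defaults mirror the layout_preprocessing_* settings in task 1.1.
    contrast: ContrastMethod = ContrastMethod.CLAHE
    sharpen: bool = True
    binarize: bool = False


def parse_mode(raw: str) -> PreprocessingMode:
    """Validate a mode string such as the `layout_preprocessing_mode` setting."""
    try:
        return PreprocessingMode(raw.strip().lower())
    except ValueError:
        allowed = ", ".join(m.value for m in PreprocessingMode)
        raise ValueError(
            f"Unknown preprocessing mode {raw!r}; expected one of: {allowed}"
        ) from None
```

Using string-valued enums keeps the values identical to the plain strings in the settings, so they serialize cleanly in API requests and responses.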

## 2. Preprocessing Service

- [ ] 2.1 Create `backend/app/services/preprocessing_service.py`
  - Image loading utility (supports PIL, OpenCV)
  - Contrast enhancement methods (histogram equalization, CLAHE)
  - Sharpening filter for line enhancement
  - Optional adaptive binarization
  - Return preprocessed image as numpy array or PIL Image

- [ ] 2.2 Implement `enhance_for_layout_detection()` function
  - Input: Original image path or PIL Image + config
  - Output: Preprocessed image (same format as input)
  - Steps: contrast → sharpen → (optional) binarize

- [ ] 2.3 Implement `analyze_image_quality()` function (Auto mode)
  - Calculate contrast level (standard deviation of grayscale)
  - Detect edge clarity (Sobel/Canny edge strength)
  - Return recommended `PreprocessingConfig` based on analysis
  - Thresholds:
    - Low contrast (grayscale standard deviation < 40): apply CLAHE
    - Faint edges (edge strength < 0.1): apply sharpening
    - Very low contrast (grayscale standard deviation < 20): consider binarization
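
The auto-mode analysis in 2.3 might look something like the sketch below. It is dependency-free for illustration: the image is a list of rows of 0-255 grayscale values, whereas the real service would more likely work on NumPy arrays via OpenCV or PIL. The `QualityReport` container and the normalization of edge strength (mean Sobel magnitude divided by `4 * 255`, the response to a full-contrast step edge) are assumptions, since the task list does not define the scale behind the 0.1 threshold.

```python
import math
import statistics
from dataclasses import dataclass
from typing import List

# Thresholds taken from task 2.3.
LOW_CONTRAST = 40.0
VERY_LOW_CONTRAST = 20.0
FAINT_EDGES = 0.1


@dataclass(frozen=True)
class QualityReport:  # hypothetical container for metrics plus recommendation
    contrast: float
    edge_strength: float
    recommended_contrast: str
    recommended_sharpen: bool
    recommended_binarize: bool


def sobel_edge_strength(gray: List[List[int]]) -> float:
    """Mean Sobel gradient magnitude over interior pixels, scaled by 4 * 255."""
    height, width = len(gray), len(gray[0])
    if height < 3 or width < 3:
        return 0.0
    total = 0.0
    for y in range(1, height - 1):
        up, mid, down = gray[y - 1], gray[y], gray[y + 1]
        for x in range(1, width - 1):
            gx = (up[x + 1] + 2 * mid[x + 1] + down[x + 1]) - (
                up[x - 1] + 2 * mid[x - 1] + down[x - 1]
            )
            gy = (down[x - 1] + 2 * down[x] + down[x + 1]) - (
                up[x - 1] + 2 * up[x] + up[x + 1]
            )
            total += math.hypot(gx, gy)
    interior = (height - 2) * (width - 2)
    return total / interior / (4 * 255)


def analyze_image_quality(gray: List[List[int]]) -> QualityReport:
    """Measure contrast and edge clarity, then map them to recommended steps."""
    pixels = [p for row in gray for p in row]
    contrast = statistics.pstdev(pixels)  # standard deviation of grayscale
    edge_strength = sobel_edge_strength(gray)
    return QualityReport(
        contrast=contrast,
        edge_strength=edge_strength,
        recommended_contrast="clahe" if contrast < LOW_CONTRAST else "none",
        recommended_sharpen=edge_strength < FAINT_EDGES,
        recommended_binarize=contrast < VERY_LOW_CONTRAST,
    )
```

Returning the raw metrics alongside the recommendation lets the preview endpoint show both without recomputing them.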

## 3. Integration with OCR Service

- [ ] 3.1 Update `backend/app/services/ocr_service.py`
  - Import preprocessing service
  - Check preprocessing mode (auto/manual/disabled)
  - If auto: call `analyze_image_quality()` first
  - Before `_run_ppstructure()`, preprocess image based on config
  - Pass preprocessed image to PP-Structure for layout detection
  - Keep original image reference for image extraction

- [ ] 3.2 Ensure image element extraction uses original
  - Verify that `saved_path` and `img_path` in elements reference the original image
  - Apply bbox coordinates detected on the preprocessed image when cropping from the original image

- [ ] 3.3 Update task start API to accept preprocessing options
  - Add `preprocessing_mode` parameter to start request
  - Add `preprocessing_config` for manual mode overrides
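
The integration flow in section 3 can be sketched as below: resolve the config from the mode, run detection on the preprocessed copy, and crop elements from the untouched original. Every name here (`resolve_config`, `extract_elements`, the injected `preprocess`, `detect`, and `analyze` callables) is hypothetical and stands in for the real service and PP-Structure calls. The sketch also assumes preprocessing preserves image dimensions, which is what makes the bboxes reusable on the original.

```python
from typing import Any, Callable, List, Optional, Tuple

Image = List[List[int]]           # grayscale rows; stand-in for a real image type
BBox = Tuple[int, int, int, int]  # (x0, y0, x1, y1), upper bounds exclusive


def crop(image: Image, bbox: BBox) -> Image:
    x0, y0, x1, y1 = bbox
    return [row[x0:x1] for row in image[y0:y1]]


def resolve_config(
    mode: str,
    image: Image,
    manual_config: Optional[Any],
    analyze: Optional[Callable[[Image], Any]],
) -> Optional[Any]:
    """Return the config to apply, or None when preprocessing is disabled."""
    if mode == "disabled":
        return None
    if mode == "auto":
        if analyze is None:
            raise ValueError("auto mode needs an analyze callable")
        return analyze(image)
    if mode == "manual":
        if manual_config is None:
            raise ValueError("manual mode needs an explicit config")
        return manual_config
    raise ValueError(f"unknown preprocessing mode: {mode!r}")


def extract_elements(
    original: Image,
    mode: str,
    *,
    preprocess: Callable[[Image, Any], Image],
    detect: Callable[[Image], List[BBox]],
    analyze: Optional[Callable[[Image], Any]] = None,
    manual_config: Optional[Any] = None,
) -> List[Image]:
    """Detect layout on the preprocessed image, but crop from the original."""
    config = resolve_config(mode, original, manual_config, analyze)
    detection_input = original if config is None else preprocess(original, config)
    same_size = len(detection_input) == len(original) and all(
        len(a) == len(b) for a, b in zip(detection_input, original)
    )
    if not same_size:
        raise ValueError("preprocessing must not change image dimensions")
    return [crop(original, bbox) for bbox in detect(detection_input)]
```

Passing the detector and preprocessor in as callables keeps the sketch testable without PP-Structure installed; the real `ocr_service.py` would call its own methods directly.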

## 4. Preview API

- [ ] 4.1 Create `backend/app/api/v2/endpoints/preview.py`
  - `POST /api/v2/tasks/{task_id}/preview/preprocessing`
  - Input: page number, preprocessing config (optional)
  - Output:
    - Original image (base64 or URL)
    - Preprocessed image (base64 or URL)
    - Auto-detected config (if mode=auto)
    - Image quality metrics (contrast, edge_strength)

- [ ] 4.2 Add preview router to API
  - Register in `backend/app/api/v2/api.py`
  - Add appropriate authentication/authorization
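
As one possible shape for the preview response, the sketch below assembles the payload with base64-encoded images. The field names (`original_image`, `preprocessed_image`, `auto_config`, `quality_metrics`) are assumptions; the task list only names the kinds of data returned and the two metrics, `contrast` and `edge_strength`. Framework wiring (router, authentication) is deliberately left out.

```python
import base64
from typing import Any, Dict, Optional


def _b64(data: bytes) -> str:
    """Encode raw image bytes as an ASCII base64 string for JSON transport."""
    return base64.b64encode(data).decode("ascii")


def build_preview_response(
    original: bytes,
    preprocessed: bytes,
    *,
    mode: str,
    quality_metrics: Dict[str, float],
    auto_config: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble the preview payload; auto_config is included only in auto mode."""
    response: Dict[str, Any] = {
        "original_image": _b64(original),
        "preprocessed_image": _b64(preprocessed),
        "quality_metrics": {
            "contrast": quality_metrics["contrast"],
            "edge_strength": quality_metrics["edge_strength"],
        },
    }
    if mode == "auto":
        response["auto_config"] = auto_config
    return response
```

Base64 keeps the response self-contained at the cost of roughly a third more payload; returning URLs instead, the other option listed above, would suit large pages better.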

## 5. Frontend UI

- [ ] 5.1 Create `frontend/src/components/PreprocessingSettings.tsx`
  - Radio buttons: Auto / Manual / Disabled
  - Manual mode shows:
    - Contrast dropdown: None / Histogram / CLAHE
    - Sharpen checkbox
    - Binarize checkbox
  - Preview button to trigger comparison view

- [ ] 5.2 Create `frontend/src/components/PreprocessingPreview.tsx`
  - Side-by-side image comparison (original vs preprocessed)
  - Display detected quality metrics
  - Show which auto settings would be applied
  - Slider or toggle to switch between views

- [ ] 5.3 Integrate with task start flow
  - Add PreprocessingSettings to OCR track options
  - Pass selected config to task start API
  - Store user preference in localStorage

- [ ] 5.4 Add i18n translations
  - `frontend/src/i18n/locales/zh-TW.json` - Traditional Chinese
  - `frontend/src/i18n/locales/en.json` - English (if the file exists)

## 6. Testing (with env)

- [ ] 6.1 Unit tests for preprocessing_service
  - Test contrast enhancement methods
  - Test sharpening filter
  - Test binarization
  - Test `analyze_image_quality()` with various images
  - Test with various image formats (PNG, JPEG)

- [ ] 6.2 Unit tests for preview API
  - Test preview endpoint returns correct images
  - Test auto-detection returns sensible config

- [ ] 6.3 Integration tests (log in with the test account; supply its credentials through the environment rather than this file)
  - Test OCR track with preprocessing modes (auto/manual/disabled)
  - Verify image element quality is preserved
  - Test with known problematic documents (faint table borders)
  - Verify auto mode improves detection for low-quality images
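
A unit test for a contrast enhancement method (task 6.1) might be structured as in the sketch below. `equalize_histogram` is a stand-in written in pure Python so the example is self-contained; the real tests would import the corresponding function from `preprocessing_service` instead, and its name and signature are not specified in this task list.

```python
import statistics
import unittest


def equalize_histogram(gray):
    """Stand-in global histogram equalization on rows of 0-255 grayscale values."""
    pixels = [p for row in gray for p in row]
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:  # flat image: nothing to stretch
        return [row[:] for row in gray]
    lut = [max(0, round((c - cdf_min) / (total - cdf_min) * 255)) for c in cdf]
    return [[lut[p] for p in row] for row in gray]


class ContrastEnhancementTest(unittest.TestCase):
    def setUp(self):
        # Synthetic low-contrast image: values confined to 100..107.
        self.low_contrast = [[100 + x for x in range(8)] for _ in range(8)]

    def test_preserves_dimensions(self):
        out = equalize_histogram(self.low_contrast)
        self.assertEqual(len(out), 8)
        self.assertTrue(all(len(row) == 8 for row in out))

    def test_stretches_to_full_range(self):
        flat_out = [p for row in equalize_histogram(self.low_contrast) for p in row]
        self.assertEqual(min(flat_out), 0)
        self.assertEqual(max(flat_out), 255)

    def test_increases_contrast(self):
        before = statistics.pstdev(p for row in self.low_contrast for p in row)
        out = equalize_histogram(self.low_contrast)
        after = statistics.pstdev(p for row in out for p in row)
        self.assertGreater(after, before)

    def test_flat_image_is_unchanged(self):
        flat = [[50] * 4 for _ in range(4)]
        self.assertEqual(equalize_histogram(flat), flat)
```

Synthetic images with known statistics keep these tests deterministic; the problematic real documents mentioned in 6.3 are better suited to the integration tests.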

## 7. Documentation

- [ ] 7.1 Update API documentation
  - Document new configuration options
  - Document preview endpoint
  - Explain preprocessing behavior and modes

- [ ] 7.2 Add user guide section
  - When to use auto vs manual
  - How to interpret quality metrics
  - Troubleshooting tips