OCR/openspec/changes/add-layout-preprocessing/tasks.md
egg ea0dd7456c feat: implement layout preprocessing backend
Backend implementation for add-layout-preprocessing proposal:
- Add LayoutPreprocessingService with CLAHE, sharpen, binarize
- Add auto-detection: analyze_image_quality() for contrast/edge metrics
- Integrate preprocessing into OCR pipeline (analyze_layout)
- Add Preview API: POST /api/v2/tasks/{id}/preview/preprocessing
- Add config options: layout_preprocessing_mode, thresholds
- Add schemas: PreprocessingConfig, PreprocessingPreviewResponse

Preprocessing only affects layout detection input.
Original images preserved for element extraction.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 15:17:20 +08:00


Tasks: Add Image Preprocessing for Layout Detection

1. Configuration

  • 1.1 Add preprocessing configuration to backend/app/core/config.py

    • layout_preprocessing_mode: str = "auto" - Options: auto, manual, disabled
    • layout_preprocessing_contrast: str = "clahe" - Options: none, histogram, clahe
    • layout_preprocessing_sharpen: bool = True - Enable sharpening for faint lines
    • layout_preprocessing_binarize: bool = False - Optional binarization (aggressive)
  • 1.2 Add preprocessing schema to backend/app/schemas/task.py

    • PreprocessingMode enum: auto, manual, disabled
    • PreprocessingConfig schema for API request/response
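
The options in 1.1 and the schema in 1.2 could be sketched as follows. This is a minimal, standard-library-only illustration; the real project may well use Pydantic settings and models instead. `ContrastMethod`, `LayoutPreprocessingSettings`, and `manual_config()` are hypothetical names introduced here, and the fields of `PreprocessingConfig` are assumed from the options listed in 1.1.

```python
from dataclasses import dataclass
from enum import Enum


class PreprocessingMode(str, Enum):
    """Modes listed in 1.1 and 1.2."""
    AUTO = "auto"
    MANUAL = "manual"
    DISABLED = "disabled"


class ContrastMethod(str, Enum):
    """Hypothetical enum for the contrast options in 1.1."""
    NONE = "none"
    HISTOGRAM = "histogram"
    CLAHE = "clahe"


@dataclass(frozen=True)
class PreprocessingConfig:
    """Assumed shape of the request/response schema from 1.2."""
    contrast: ContrastMethod = ContrastMethod.CLAHE
    sharpen: bool = True
    binarize: bool = False


@dataclass(frozen=True)
class LayoutPreprocessingSettings:
    """Hypothetical stand-in for the fields added to config.py."""
    layout_preprocessing_mode: PreprocessingMode = PreprocessingMode.AUTO
    layout_preprocessing_contrast: ContrastMethod = ContrastMethod.CLAHE
    layout_preprocessing_sharpen: bool = True
    layout_preprocessing_binarize: bool = False

    def manual_config(self) -> PreprocessingConfig:
        # Collapse the flat settings into the schema used by the API.
        return PreprocessingConfig(
            contrast=self.layout_preprocessing_contrast,
            sharpen=self.layout_preprocessing_sharpen,
            binarize=self.layout_preprocessing_binarize,
        )
```

Using string-valued enums means an unknown mode such as `PreprocessingMode("bogus")` fails fast with a `ValueError` instead of reaching the OCR pipeline.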

2. Preprocessing Service

  • 2.1 Create backend/app/services/preprocessing_service.py

    • Image loading utility (supports PIL, OpenCV)
    • Contrast enhancement methods (histogram equalization, CLAHE)
    • Sharpening filter for line enhancement
    • Optional adaptive binarization
    • Return preprocessed image as numpy array or PIL Image
  • 2.2 Implement enhance_for_layout_detection() function

    • Input: Original image path or PIL Image + config
    • Output: Preprocessed image (same format as input)
    • Steps: contrast → sharpen → (optional) binarize
  • 2.3 Implement analyze_image_quality() function (Auto mode)

    • Calculate contrast level (standard deviation of grayscale)
    • Detect edge clarity (Sobel/Canny edge strength)
    • Return recommended PreprocessingConfig based on analysis
    • Thresholds:
      • Low contrast < 40: Apply CLAHE
      • Faint edges < 0.1: Apply sharpen
      • Very low contrast < 20: Consider binarize
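
The auto-mode thresholds in 2.3 can be illustrated with a small pure-Python sketch. It assumes an 8-bit grayscale image given as a list of rows. Edge strength is approximated here by the mean absolute difference between neighbouring pixels, scaled to 0..1; this is a simplification swapped in for the Sobel/Canny measure named above, and the scaling is an assumption. `QualityMetrics`, `measure_quality`, and `recommend_config` are hypothetical names, and treating "consider binarize" as "enable binarize" is also an assumption.

```python
from dataclasses import dataclass
from statistics import pstdev


@dataclass
class QualityMetrics:
    contrast: float       # standard deviation of grayscale values (0..127.5)
    edge_strength: float  # mean neighbour difference, scaled to 0..1


def measure_quality(gray):
    """Compute contrast and a simplified edge-strength metric."""
    pixels = [p for row in gray for p in row]
    contrast = pstdev(pixels)
    diffs = []
    for y, row in enumerate(gray):
        for x, p in enumerate(row):
            if x + 1 < len(row):       # horizontal neighbour
                diffs.append(abs(row[x + 1] - p))
            if y + 1 < len(gray):      # vertical neighbour
                diffs.append(abs(gray[y + 1][x] - p))
    edge_strength = (sum(diffs) / len(diffs)) / 255 if diffs else 0.0
    return QualityMetrics(contrast, edge_strength)


def recommend_config(metrics):
    """Apply the thresholds from 2.3 to pick preprocessing steps."""
    return {
        "contrast": "clahe" if metrics.contrast < 40 else "none",
        "sharpen": metrics.edge_strength < 0.1,
        "binarize": metrics.contrast < 20,  # assumption: "consider" means enable
    }
```

A uniformly gray page triggers every step, while a sharp black-and-white page triggers none, which is a convenient pair of fixtures for the unit tests in section 6.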

3. Integration with OCR Service

  • 3.1 Update backend/app/services/ocr_service.py

    • Import preprocessing service
    • Check preprocessing mode (auto/manual/disabled)
    • If auto: call analyze_image_quality() first
    • Before _run_ppstructure(), preprocess image based on config
    • Pass preprocessed image to PP-Structure for layout detection
    • Keep original image reference for image extraction
  • 3.2 Ensure image element extraction uses original

    • Verify saved_path and img_path in elements reference original
    • Bbox coordinates detected on the preprocessed image are applied when cropping the original image
  • 3.3 Update task start API to accept preprocessing options

    • Add preprocessing_mode parameter to start request
    • Add preprocessing_config for manual mode overrides
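
The control flow described in 3.1 and 3.2 might look roughly like the sketch below. The function name and its injected callables (`analyze`, `enhance`, `detect`, `crop`) are hypothetical stand-ins for `analyze_image_quality()`, `enhance_for_layout_detection()`, the PP-Structure call, and the element cropper; the actual signatures in `ocr_service.py` are not given in this document. The key point is that only layout detection sees the preprocessed image, while crops come from the original.

```python
from typing import Any, Callable, List, Optional


def detect_layout_with_preprocessing(
    original: Any,
    mode: str,
    manual_config: Optional[Any],
    analyze: Callable[[Any], Any],
    enhance: Callable[[Any, Any], Any],
    detect: Callable[[Any], List[Any]],
    crop: Callable[[Any, Any], Any],
) -> List[Any]:
    """Run layout detection on a preprocessed copy, crop from the original."""
    if mode == "disabled":
        detection_input = original
    elif mode == "auto":
        # Auto mode derives the config from image quality first.
        detection_input = enhance(original, analyze(original))
    elif mode == "manual":
        if manual_config is None:
            raise ValueError("manual mode requires a preprocessing config")
        detection_input = enhance(original, manual_config)
    else:
        raise ValueError(f"unknown preprocessing mode: {mode!r}")

    boxes = detect(detection_input)
    # Bounding boxes come from the preprocessed image, but the pixels
    # of each extracted element are taken from the untouched original.
    return [crop(original, box) for box in boxes]
```

This relies on preprocessing leaving the image dimensions unchanged, so that bounding boxes map one-to-one onto the original.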

4. Preview API

  • 4.1 Create backend/app/api/v2/endpoints/preview.py

    • POST /api/v2/tasks/{task_id}/preview/preprocessing
    • Input: page number, preprocessing config (optional)
    • Output:
      • Original image (base64 or URL)
      • Preprocessed image (base64 or URL)
      • Auto-detected config (if mode=auto)
      • Image quality metrics (contrast, edge_strength)
  • 4.2 Add preview router to API

    • Register in backend/app/api/v2/api.py
    • Add appropriate authentication/authorization
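
One possible shape for the preview response in 4.1 is sketched below, using the base64 option. The field names (`original_image`, `preprocessed_image`, `auto_config`, `quality_metrics`) and the helper `build_preview_response` are assumptions for illustration; only the schema name `PreprocessingPreviewResponse` and the four output items come from this task list.

```python
import base64
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class PreprocessingPreviewResponse:
    original_image: str          # base64-encoded image bytes
    preprocessed_image: str      # base64-encoded image bytes
    auto_config: Optional[dict]  # populated only when mode is auto
    quality_metrics: dict        # e.g. contrast and edge_strength


def build_preview_response(
    original_bytes: bytes,
    preprocessed_bytes: bytes,
    quality_metrics: dict,
    auto_config: Optional[dict] = None,
) -> dict:
    """Assemble a JSON-serializable preview payload."""
    def encode(data: bytes) -> str:
        return base64.b64encode(data).decode("ascii")

    response = PreprocessingPreviewResponse(
        original_image=encode(original_bytes),
        preprocessed_image=encode(preprocessed_bytes),
        auto_config=auto_config,
        quality_metrics=quality_metrics,
    )
    return asdict(response)
```

Base64 keeps the endpoint self-contained at the cost of larger payloads; returning URLs, the other option listed in 4.1, would suit large pages better.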

5. Frontend UI

  • 5.1 Create frontend/src/components/PreprocessingSettings.tsx

    • Radio buttons: Auto / Manual / Disabled
    • Manual mode shows:
      • Contrast dropdown: None / Histogram / CLAHE
      • Sharpen checkbox
      • Binarize checkbox
    • Preview button to trigger comparison view
  • 5.2 Create frontend/src/components/PreprocessingPreview.tsx

    • Side-by-side image comparison (original vs preprocessed)
    • Display detected quality metrics
    • Show which auto settings would be applied
    • Slider or toggle to switch between views
  • 5.3 Integrate with task start flow

    • Add PreprocessingSettings to OCR track options
    • Pass selected config to task start API
    • Store user preference in localStorage
  • 5.4 Add i18n translations

    • frontend/src/i18n/locales/zh-TW.json - Traditional Chinese
    • frontend/src/i18n/locales/en.json - English (if it exists)

6. Testing (with env)

  • 6.1 Unit tests for preprocessing_service

    • Test contrast enhancement methods
    • Test sharpening filter
    • Test binarization
    • Test analyze_image_quality() with various images
    • Test with various image formats (PNG, JPEG)
  • 6.2 Unit tests for preview API

    • Test preview endpoint returns correct images
    • Test auto-detection returns sensible config
  • 6.3 Integration tests (account: ymirliu@panjit.com.tw; password: [redacted])

    • Test OCR track with preprocessing modes (auto/manual/disabled)
    • Verify image element quality is preserved
    • Test with known problematic documents (faint table borders)
    • Verify auto mode improves detection for low-quality images

7. Documentation

  • 7.1 Update API documentation

    • Document new configuration options
    • Document preview endpoint
    • Explain preprocessing behavior and modes
  • 7.2 Add user guide section

    • When to use auto vs manual
    • How to interpret quality metrics
    • Troubleshooting tips