OCR/openspec/changes/add-layout-preprocessing/tasks.md
egg c12ea0b9f6 proposal: add-layout-preprocessing for improved table detection
Problem: PP-Structure misses tables with faint lines/borders
Solution: Preprocess images (contrast, sharpen) for layout detection
- Preprocessed image only used for layout detection
- Original image preserved for element extraction (quality)

Includes: proposal.md, design.md, tasks.md, spec delta

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 14:24:23 +08:00


Tasks: Add Image Preprocessing for Layout Detection

1. Configuration

  • 1.1 Add preprocessing configuration to backend/app/core/config.py
    • layout_preprocessing_enabled: bool = True - Enable/disable preprocessing
    • layout_preprocessing_contrast: str = "clahe" - Options: none, histogram, clahe
    • layout_preprocessing_sharpen: bool = True - Enable sharpening for faint lines
    • layout_preprocessing_binarize: bool = False - Optional binarization (aggressive)
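Below is a minimal sketch of how the four settings could be declared as typed fields. It assumes nothing about the real config.py beyond the field names and defaults listed above. That file most likely declares them on a Pydantic settings class, so a stdlib dataclass is used here only as a dependency-free stand-in. The class name `LayoutPreprocessingSettings` is hypothetical.

```python
from dataclasses import dataclass
from typing import Literal, get_args

# Allowed contrast methods, mirroring the options listed in task 1.1.
ContrastMethod = Literal["none", "histogram", "clahe"]


@dataclass(frozen=True)
class LayoutPreprocessingSettings:
    """Hypothetical stand-in for the layout_preprocessing_* fields in config.py."""

    layout_preprocessing_enabled: bool = True
    layout_preprocessing_contrast: ContrastMethod = "clahe"
    layout_preprocessing_sharpen: bool = True
    layout_preprocessing_binarize: bool = False  # aggressive, so off by default

    def __post_init__(self) -> None:
        # Reject unknown methods early instead of silently skipping contrast.
        allowed = get_args(ContrastMethod)
        if self.layout_preprocessing_contrast not in allowed:
            raise ValueError(
                f"layout_preprocessing_contrast must be one of {allowed}, "
                f"got {self.layout_preprocessing_contrast!r}"
            )
```

With Pydantic, annotating the field as a `Literal` gives equivalent validation without the `__post_init__` hook.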

2. Preprocessing Service

  • 2.1 Create backend/app/services/preprocessing_service.py

    • Image loading utility that accepts both PIL Images and OpenCV (NumPy array) images
    • Contrast enhancement methods (histogram equalization, CLAHE)
    • Sharpening filter for line enhancement
    • Optional adaptive binarization
    • Return preprocessed image as numpy array or PIL Image
  • 2.2 Implement enhance_for_layout_detection() function

    • Input: Original image path or PIL Image
    • Output: Preprocessed image (same format as input)
    • Steps: contrast → sharpen → (optional) binarize
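The contrast → sharpen → (optional) binarize chain could look roughly like the sketch below. It makes some simplifying assumptions: it works on a uint8 grayscale NumPy array only, so path and PIL handling, and converting back to the 3-channel layout a detector may expect, are omitted. Global histogram equalization, a 3x3 sharpening kernel, and a mean-based adaptive threshold (the same rule as OpenCV's `ADAPTIVE_THRESH_MEAN_C`) are written out in NumPy. The "clahe" branch delegates to OpenCV's `cv2.createCLAHE`, and its `clipLimit`/`tileGridSize` values are illustrative, not tuned. Block size, offset, and kernel choice are likewise assumptions.

```python
import numpy as np


def equalize_histogram(gray: np.ndarray) -> np.ndarray:
    """Global histogram equalization for a uint8 grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # cumulative count at the darkest gray level present
    if cdf_min == gray.size:  # single gray level: nothing to stretch
        return gray.copy()
    lut = np.round((cdf - cdf_min) / (gray.size - cdf_min) * 255)
    return np.clip(lut, 0, 255).astype(np.uint8)[gray]


def sharpen(gray: np.ndarray) -> np.ndarray:
    """Apply the 3x3 kernel [[0,-1,0],[-1,5,-1],[0,-1,0]] with edge padding."""
    g = gray.astype(np.int32)
    p = np.pad(g, 1, mode="edge")
    out = 5 * g - p[:-2, 1:-1] - p[2:, 1:-1] - p[1:-1, :-2] - p[1:-1, 2:]
    return np.clip(out, 0, 255).astype(np.uint8)


def adaptive_binarize(gray: np.ndarray, block: int = 15, offset: int = 10) -> np.ndarray:
    """Mean adaptive threshold: 255 where pixel > local mean - offset, else 0."""
    if block % 2 == 0:
        raise ValueError("block must be odd")
    r = block // 2
    p = np.pad(gray.astype(np.int64), r, mode="edge")
    ii = np.pad(p.cumsum(0).cumsum(1), ((1, 0), (1, 0)))  # integral image
    h, w = gray.shape
    window_sum = (ii[block:block + h, block:block + w] - ii[:h, block:block + w]
                  - ii[block:block + h, :w] + ii[:h, :w])
    mean = window_sum / (block * block)
    return np.where(gray > mean - offset, 255, 0).astype(np.uint8)


def enhance_for_layout_detection(gray, contrast="clahe", do_sharpen=True, binarize=False):
    """Simplified sketch of task 2.2: contrast -> sharpen -> (optional) binarize."""
    if contrast == "histogram":
        out = equalize_histogram(gray)
    elif contrast == "clahe":
        import cv2  # assumes opencv-python is installed
        out = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8)).apply(gray)
    elif contrast == "none":
        out = gray.copy()
    else:
        raise ValueError(f"unknown contrast method: {contrast!r}")
    if do_sharpen:
        out = sharpen(out)
    if binarize:
        out = adaptive_binarize(out)
    return out
```

None of these steps resizes the image, which is what allows bboxes found on the enhanced copy to be reused on the original.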

3. Integration with OCR Service

  • 3.1 Update backend/app/services/ocr_service.py

    • Import preprocessing service
    • Before _run_ppstructure(), preprocess image if enabled
    • Pass preprocessed image to PP-Structure for layout detection
    • Keep original image reference for image extraction
  • 3.2 Ensure image element extraction uses original

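    • Verify that saved_path and img_path in extracted elements reference the original image
    • Crop the original image using the bbox coordinates detected on the preprocessed image; preprocessing must not resize the image, so coordinates map 1:1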
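The split between "detect on the enhanced copy" and "crop from the original" can be sketched as follows. `detect_then_crop`, `CroppedElement`, and the `detect_layout` callable are hypothetical names; `detect_layout` stands in for the PP-Structure call and is assumed to return `(x0, y0, x1, y1)` pixel boxes. The shape check guards the 1:1 coordinate mapping.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

import numpy as np

BBox = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels, x1/y1 exclusive


@dataclass
class CroppedElement:
    bbox: BBox
    pixels: np.ndarray  # always cut from the ORIGINAL image


def detect_then_crop(
    original: np.ndarray,
    preprocess: Callable[[np.ndarray], np.ndarray],
    detect_layout: Callable[[np.ndarray], List[BBox]],
) -> List[CroppedElement]:
    """Detect layout on a preprocessed copy, then crop elements from the original."""
    processed = preprocess(original)
    # Bboxes transfer 1:1 only if preprocessing leaves the geometry untouched.
    if processed.shape[:2] != original.shape[:2]:
        raise ValueError("preprocessing must not change image dimensions")
    return [
        CroppedElement(bbox, original[bbox[1]:bbox[3], bbox[0]:bbox[2]].copy())
        for bbox in detect_layout(processed)
    ]
```

Passing the detector in as a callable keeps this step testable without loading PP-Structure models.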

4. Testing

  • 4.1 Unit tests for preprocessing_service

    • Test contrast enhancement methods
    • Test sharpening filter
    • Test binarization
    • Test with various image formats (PNG, JPEG)
  • 4.2 Integration tests

    • Test OCR track with preprocessing enabled/disabled
    • Verify image element quality is preserved
    • Test with known problematic documents (faint table borders)
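Alongside real documents with faint table borders, a synthetic fixture can make the unit tests deterministic. The sketch below is one assumed approach: it draws a table whose 1px borders are only `line_delta` gray levels darker than the background, and measures the background/border gap. Both helper names and all default values are hypothetical.

```python
import numpy as np


def make_faint_table(height=200, width=300, rows=4, cols=3,
                     background=235, line_delta=12):
    """Synthetic grayscale page holding a rows x cols table with faint 1px borders.

    Returns (image, mask) where mask marks the border pixels.
    """
    if not 0 < line_delta <= background <= 255:
        raise ValueError("need 0 < line_delta <= background <= 255")
    mask = np.zeros((height, width), dtype=bool)
    for y in np.linspace(10, height - 11, rows + 1).astype(int):
        mask[y, 10:width - 10] = True  # horizontal borders
    for x in np.linspace(10, width - 11, cols + 1).astype(int):
        mask[10:height - 10, x] = True  # vertical borders
    img = np.full((height, width), background, dtype=np.uint8)
    img[mask] = background - line_delta
    return img, mask


def border_contrast(img, mask):
    """Mean gray-level gap between background pixels and border pixels."""
    return float(img[~mask].mean() - img[mask].mean())
```

A unit test for section 4.1 could then assert that `border_contrast(enhance_for_layout_detection(img), mask)` exceeds `border_contrast(img, mask)`. The same fixture, saved as PNG and as JPEG, would cover the format cases.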

5. Documentation

  • 5.1 Update API documentation
    • Document new configuration options
    • Explain preprocessing behavior