OCR/openspec/changes/add-layout-preprocessing/proposal.md
egg c12ea0b9f6 proposal: add-layout-preprocessing for improved table detection
Problem: PP-Structure misses tables with faint lines/borders
Solution: Preprocess images (contrast, sharpen) for layout detection
- Preprocessed image only used for layout detection
- Original image preserved for element extraction (quality)

Includes: proposal.md, design.md, tasks.md, spec delta

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 14:24:23 +08:00


Change: Add Image Preprocessing for Layout Detection

Why

PP-StructureV3's layout detection (PP-DocLayout_plus-L) sometimes fails to detect tables with faint lines, low-contrast borders, or poor scan quality. As a result, table elements are missing from the output even when the table structure recognition models (SLANeXt) are correctly configured.

The root cause is that layout detection runs before table structure recognition: if a region isn't labeled as a "table" during layout detection, the table recognition models are never invoked for it.
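The dependency described above can be sketched as follows. This is an illustrative model only: `Region`, `recognize_tables`, and the recognizer callback are hypothetical names, not part of the PP-StructureV3 API.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass(frozen=True)
class Region:
    """A layout-detection result (hypothetical shape, for illustration)."""
    label: str                       # e.g. "table", "text", "figure"
    bbox: Tuple[int, int, int, int]  # x0, y0, x1, y1


def recognize_tables(
    regions: List[Region],
    table_recognizer: Callable[[Region], Any],
) -> List[Any]:
    """Run table structure recognition only on regions labeled "table".

    If layout detection missed a table (no region carries the "table"
    label), the recognizer is never called, regardless of how well it
    is configured.
    """
    return [table_recognizer(r) for r in regions if r.label == "table"]
```

If a faint-bordered table is detected as `"text"` or not detected at all, `recognize_tables` returns an empty list for it, which is the failure mode this change targets.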

What Changes

  • Add image preprocessing module for layout detection input

    • Contrast enhancement (histogram equalization, CLAHE)
    • Optional binarization (adaptive thresholding)
    • Sharpening for faint lines
  • Preserve original images for extraction

    • Preprocessing ONLY affects layout detection input
    • Image element extraction continues to use original (preserves quality)
    • Raw OCR continues to use original image
  • Configurable preprocessing options

    • Enable/disable preprocessing per track
    • Adjustable preprocessing intensity
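As a rough sketch of the contrast and sharpening steps listed above, the following uses NumPy only. It assumes an 8-bit grayscale page image, and it substitutes global histogram equalization and a 3x3 unsharp mask for CLAHE and any production sharpening kernel. The function names and the `strength` parameter are hypothetical.

```python
import numpy as np


def equalize_histogram(gray: np.ndarray) -> np.ndarray:
    """Global histogram equalization for an 8-bit grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # CDF value at the darkest level present
    total = cdf[-1]
    if total == cdf_min:       # single gray level: nothing to stretch
        return gray.copy()
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[gray]


def box_blur3(gray: np.ndarray) -> np.ndarray:
    """3x3 mean filter with edge replication; returns float32."""
    h, w = gray.shape
    padded = np.pad(gray.astype(np.float32), 1, mode="edge")
    acc = np.zeros((h, w), dtype=np.float32)
    for dy in range(3):
        for dx in range(3):
            acc += padded[dy:dy + h, dx:dx + w]
    return acc / 9.0


def preprocess_for_layout(gray: np.ndarray, strength: float = 0.5) -> np.ndarray:
    """Return an enhanced copy for layout detection; the input is not modified.

    strength=0.0 returns the image unchanged; 1.0 applies full
    equalization and the strongest sharpening.
    """
    if gray.dtype != np.uint8 or gray.ndim != 2:
        raise ValueError("expected a 2-D uint8 grayscale image")
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    # Blend the equalized image with the original according to strength.
    eq = equalize_histogram(gray).astype(np.float32)
    blended = (1.0 - strength) * gray.astype(np.float32) + strength * eq
    # Unsharp mask: add back the difference from a blurred copy.
    sharpened = blended + strength * (blended - box_blur3(blended))
    return np.clip(np.round(sharpened), 0, 255).astype(np.uint8)
```

Blending by `strength` is one possible way to expose "adjustable intensity". An actual implementation would more likely use OpenCV's CLAHE and adaptive thresholding, which limit noise amplification better than global equalization.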

Impact

Affected Specs

  • ocr-processing - New preprocessing configuration requirements

Affected Code

  • backend/app/services/ocr_service.py - Add preprocessing before PP-Structure
  • backend/app/core/config.py - New preprocessing configuration options
  • backend/app/services/preprocessing_service.py - New service (to be created)
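One possible shape for the new configuration options is sketched below. All field names, defaults, and the `Track` enum are assumptions for illustration; the existing `config.py` may use a different settings mechanism, and the real option names are defined in the spec delta.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Track(str, Enum):
    """Processing tracks named in this proposal."""
    OCR = "ocr"
    HYBRID = "hybrid"
    DIRECT = "direct"


@dataclass(frozen=True)
class LayoutPreprocessingConfig:
    """Hypothetical options for layout-detection preprocessing."""
    # Tracks that run preprocessing; Direct does not use PP-Structure.
    enabled_tracks: FrozenSet[Track] = frozenset({Track.OCR, Track.HYBRID})
    contrast: str = "clahe"   # one of "none", "histogram", "clahe"
    binarize: bool = False    # adaptive thresholding is optional
    sharpen: bool = True
    strength: float = 0.5     # 0.0 = off, 1.0 = strongest enhancement

    def __post_init__(self) -> None:
        if self.contrast not in ("none", "histogram", "clahe"):
            raise ValueError(f"unknown contrast mode: {self.contrast!r}")
        if not 0.0 <= self.strength <= 1.0:
            raise ValueError("strength must be in [0, 1]")

    def applies_to(self, track: Track) -> bool:
        """True if preprocessing should run for the given track."""
        return track in self.enabled_tracks
```

Validating on construction keeps a bad intensity value from reaching the image pipeline, and an empty `enabled_tracks` set turns the feature off entirely.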

Track Impact Analysis

| Track | Impact | Reason |
|---|---|---|
| OCR | Improved layout detection | Preprocessing enhances PP-Structure input |
| Hybrid | Potentially improved | Uses PP-Structure for layout |
| Direct | No impact | Does not use PP-Structure |
| Raw OCR | No impact | Continues using original image |

Quality Impact

| Component | Impact | Reason |
|---|---|---|
| Table detection | Improved | Enhanced contrast reveals faint borders |
| Image extraction | No change | Uses original image for quality |
| Text recognition | No change | Raw OCR uses original image |
| Reading order | Improved | Better element detection → better ordering |

Risks

  1. Performance overhead: Preprocessing adds compute time per page

    • Mitigation: Make preprocessing optional, cache preprocessed images
  2. Over-processing: Strong enhancement may introduce artifacts

    • Mitigation: Configurable intensity levels, default to moderate enhancement
  3. Memory usage: Keeping both original and preprocessed images

    • Mitigation: Preprocessed image is temporary, discarded after layout detection
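The mitigations for risks 1 and 3 imply a data flow roughly like the sketch below: preprocessing is optional, the enhanced copy is seen only by layout detection, and extraction always reads the original. The function and its callback parameters are hypothetical and stand in for the real service methods.

```python
from typing import Any, Callable, List, Tuple

BBox = Tuple[int, int, int, int]


def process_page(
    original: Any,
    preprocess: Callable[[Any], Any],
    detect_layout: Callable[[Any], List[BBox]],
    extract_element: Callable[[Any, BBox], Any],
    enabled: bool = True,
) -> List[Any]:
    """Detect layout on an enhanced copy, then extract from the original."""
    # Only layout detection receives the preprocessed image.
    layout_input = preprocess(original) if enabled else original
    boxes = detect_layout(layout_input)
    # Drop the temporary copy before extraction so both images are not
    # held for the rest of the page's processing.
    del layout_input
    # Extraction crops from the untouched original to preserve quality.
    return [extract_element(original, box) for box in boxes]
```

With `enabled=False` the function skips preprocessing and passes the original straight to layout detection, so the overhead can be avoided per track or per request.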