Problem: PP-Structure misses tables with faint lines/borders

Solution: Preprocess images (contrast, sharpen) for layout detection
- Preprocessed image only used for layout detection
- Original image preserved for element extraction (quality)

Includes: proposal.md, design.md, tasks.md, spec delta

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Tasks: Add Image Preprocessing for Layout Detection
1. Configuration
- 1.1 Add preprocessing configuration to `backend/app/core/config.py`
  - `layout_preprocessing_enabled: bool = True` - Enable/disable preprocessing
  - `layout_preprocessing_contrast: str = "clahe"` - Options: none, histogram, clahe
  - `layout_preprocessing_sharpen: bool = True` - Enable sharpening for faint lines
  - `layout_preprocessing_binarize: bool = False` - Optional binarization (aggressive)
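The four options above could be grouped and validated roughly as follows. This is a minimal sketch only: the `LayoutPreprocessingSettings` class and the `CONTRAST_OPTIONS` tuple are hypothetical names used for illustration, and the real fields would presumably be added to whatever settings class `backend/app/core/config.py` already defines.

```python
from dataclasses import dataclass

# Allowed values for the contrast option, as listed in task 1.1.
CONTRAST_OPTIONS = ("none", "histogram", "clahe")


@dataclass(frozen=True)
class LayoutPreprocessingSettings:
    """Hypothetical container for the layout preprocessing options."""

    layout_preprocessing_enabled: bool = True
    layout_preprocessing_contrast: str = "clahe"
    layout_preprocessing_sharpen: bool = True
    layout_preprocessing_binarize: bool = False

    def __post_init__(self) -> None:
        # Reject a misspelled contrast mode early instead of silently ignoring it.
        if self.layout_preprocessing_contrast not in CONTRAST_OPTIONS:
            raise ValueError(
                "layout_preprocessing_contrast must be one of "
                f"{CONTRAST_OPTIONS}, got {self.layout_preprocessing_contrast!r}"
            )


# Defaults mirror the values given in the task list.
settings = LayoutPreprocessingSettings()
```

Validating the contrast string at construction time is a design choice of this sketch, not something the task list requires.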
2. Preprocessing Service
- 2.1 Create `backend/app/services/preprocessing_service.py`
  - Image loading utility (supports PIL, OpenCV)
  - Contrast enhancement methods (histogram equalization, CLAHE)
  - Sharpening filter for line enhancement
  - Optional adaptive binarization
  - Return preprocessed image as numpy array or PIL Image
- 2.2 Implement `enhance_for_layout_detection()` function
  - Input: Original image path or PIL Image
  - Output: Preprocessed image (same format as input)
  - Steps: contrast → sharpen → (optional) binarize
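The contrast → sharpen → (optional) binarize order can be sketched with NumPy alone. Several things here are assumptions rather than the planned implementation: the function takes an 8-bit grayscale array instead of a path or PIL Image, only the `none` and `histogram` contrast modes are implemented (in an OpenCV-based service, `cv2.createCLAHE` would be the natural choice for `clahe`), and a global mean threshold stands in for adaptive binarization. The helper names and parameters are hypothetical.

```python
import numpy as np


def equalize_histogram(gray: np.ndarray) -> np.ndarray:
    """Global histogram equalization for an 8-bit grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    total = cdf[-1]
    if total == cdf_min:  # single-valued image: nothing to stretch
        return gray.copy()
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[gray]


def sharpen(gray: np.ndarray) -> np.ndarray:
    """Apply the 3x3 kernel [[0,-1,0],[-1,5,-1],[0,-1,0]] with edge replication."""
    p = np.pad(gray.astype(np.int32), 1, mode="edge")
    out = (
        5 * p[1:-1, 1:-1]
        - p[:-2, 1:-1] - p[2:, 1:-1]
        - p[1:-1, :-2] - p[1:-1, 2:]
    )
    return np.clip(out, 0, 255).astype(np.uint8)


def binarize(gray: np.ndarray) -> np.ndarray:
    """Global mean threshold; a simplified stand-in for adaptive binarization."""
    return np.where(gray > gray.mean(), 255, 0).astype(np.uint8)


def enhance_for_layout_detection(
    gray: np.ndarray,
    contrast: str = "histogram",
    sharpen_lines: bool = True,
    binarize_output: bool = False,
) -> np.ndarray:
    """Run contrast -> sharpen -> (optional) binarize on a copy of the input."""
    if gray.dtype != np.uint8 or gray.ndim != 2:
        raise ValueError("expected a 2-D uint8 grayscale array")
    out = gray.copy()  # never modify the caller's image
    if contrast == "histogram":
        out = equalize_histogram(out)
    elif contrast == "clahe":
        raise NotImplementedError("CLAHE is not implemented in this sketch")
    elif contrast != "none":
        raise ValueError(f"unknown contrast mode: {contrast!r}")
    if sharpen_lines:
        out = sharpen(out)
    if binarize_output:
        out = binarize(out)
    return out


# Synthetic page: light background with one faint horizontal "table border".
page = np.full((20, 20), 200, dtype=np.uint8)
page[10, :] = 190
enhanced = enhance_for_layout_detection(page)
```

The output keeps the input's height and width, which matters because bounding boxes detected on the preprocessed image are later applied to the original.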
3. Integration with OCR Service
- 3.1 Update `backend/app/services/ocr_service.py`
  - Import preprocessing service
  - Before `_run_ppstructure()`, preprocess image if enabled
  - Pass preprocessed image to PP-Structure for layout detection
  - Keep original image reference for image extraction
- 3.2 Ensure image element extraction uses original
  - Verify `saved_path` and `img_path` in elements reference original
  - Bbox coordinates from preprocessed detection applied to original crop
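The split between the two images can be illustrated as follows: layout detection sees the preprocessed image, while crops are cut from the untouched original using the detected bounding boxes. This is a sketch under assumptions, not the service's actual code. `extract_elements`, `preprocess`, and `detect_layout` are hypothetical names, `detect_layout` merely stands in for whatever `_run_ppstructure()` returns, and boxes are assumed to be `(x1, y1, x2, y2)` pixel coordinates.

```python
from typing import Callable, Dict, List, Sequence

import numpy as np

BBox = Sequence[int]  # assumed layout: (x1, y1, x2, y2) in pixels


def extract_elements(
    original: np.ndarray,
    preprocess: Callable[[np.ndarray], np.ndarray],
    detect_layout: Callable[[np.ndarray], List[BBox]],
) -> List[Dict[str, object]]:
    """Detect regions on the preprocessed image, crop them from the original."""
    preprocessed = preprocess(original)

    # Boxes only transfer directly if preprocessing kept the geometry intact.
    if preprocessed.shape[:2] != original.shape[:2]:
        raise ValueError("preprocessing must not change image dimensions")

    height, width = original.shape[:2]
    elements: List[Dict[str, object]] = []
    for x1, y1, x2, y2 in detect_layout(preprocessed):
        # Clamp to the image bounds so slicing never wraps around.
        x1, x2 = max(0, x1), min(width, x2)
        y1, y2 = max(0, y1), min(height, y2)
        # The crop comes from the original pixels, not the enhanced ones.
        crop = original[y1:y2, x1:x2].copy()
        elements.append({"bbox": (x1, y1, x2, y2), "crop": crop})
    return elements


# Stand-ins for the real preprocessing step and layout detector.
original = np.arange(100, dtype=np.uint8).reshape(10, 10)
elements = extract_elements(
    original,
    preprocess=lambda img: 255 - img,
    detect_layout=lambda img: [(2, 3, 6, 8)],
)
```

The dimension check encodes the one constraint this approach depends on: if a preprocessing step ever resized or padded the image, the boxes would need to be mapped back before cropping.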
4. Testing
- 4.1 Unit tests for preprocessing_service
  - Test contrast enhancement methods
  - Test sharpening filter
  - Test binarization
  - Test with various image formats (PNG, JPEG)
- 4.2 Integration tests
  - Test OCR track with preprocessing enabled/disabled
  - Verify image element quality is preserved
  - Test with known problematic documents (faint table borders)
5. Documentation
- 5.1 Update API documentation
  - Document new configuration options
  - Explain preprocessing behavior