proposal: add-layout-preprocessing for improved table detection
Problem: PP-Structure misses tables with faint lines/borders.

Solution: Preprocess images (contrast, sharpen) for layout detection.
- Preprocessed image is used only for layout detection
- Original image is preserved for element extraction (quality)

Includes: proposal.md, design.md, tasks.md, spec delta

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
124
openspec/changes/add-layout-preprocessing/design.md
Normal file
@@ -0,0 +1,124 @@
# Design: Layout Detection Image Preprocessing

## Context

PP-StructureV3's layout detection model (PP-DocLayout_plus-L) sometimes fails to detect tables with faint lines or low contrast. This is a preprocessing problem: the model detects tables when their lines are clearly visible, but struggles with poor-quality scans and documents with light-colored borders.

### Current Flow

```
Original Image → PP-Structure (layout detection) → Element Recognition
                          ↓
                Returns element bboxes
                          ↓
          Image extraction crops from original
```

### Proposed Flow

```
Original Image → Preprocess → PP-Structure (layout detection) → Element Recognition
                                        ↓
                              Returns element bboxes
                                        ↓
Original Image ← ← ← ← Image extraction crops from original (NOT preprocessed)
```

## Goals / Non-Goals

### Goals

- Improve table detection for documents with faint lines
- Preserve original image quality for element extraction
- Make preprocessing configurable (enable/disable, intensity)
- Keep the performance impact minimal

### Non-Goals

- Preprocessing for text recognition (Raw OCR handles this separately)
- Modifying how PP-Structure internally processes images
- General image quality improvement

## Decisions

### Decision 1: Preprocess only for layout detection input

**Rationale**:

- Layout detection needs enhanced edges and contrast to identify regions
- Image element extraction needs the original quality for output
- Raw OCR text recognition works independently and doesn't need preprocessing

### Decision 2: Use CLAHE (Contrast Limited Adaptive Histogram Equalization) as default

**Rationale**:

- The clip limit caps contrast amplification, which prevents over-enhancement in already bright, near-uniform areas
- Per-tile (adaptive) equalization handles backgrounds that vary across the page
- Well-supported by OpenCV

**Alternatives considered**:

- Global histogram equalization: too aggressive, causes artifacts
- Manual brightness/contrast: not adaptive to document variations

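The effect of a clip limit can be illustrated without OpenCV. The sketch below is not CLAHE itself: it equalizes a single tile and leaves out the tiling and inter-tile interpolation that make CLAHE adaptive. The function name `clipped_equalize` and the `clip_limit` values are hypothetical, chosen only for illustration; a real implementation would presumably rely on OpenCV's `cv2.createCLAHE`.

```python
from typing import List


def clipped_equalize(tile: List[int], clip_limit: int, levels: int = 256) -> List[int]:
    """Histogram-equalize one tile of 8-bit pixels using a clipped histogram."""
    # Build the intensity histogram of the tile.
    hist = [0] * levels
    for pixel in tile:
        hist[pixel] += 1

    # Clip every bin at clip_limit and collect the excess counts.
    excess = 0
    for i in range(levels):
        if hist[i] > clip_limit:
            excess += hist[i] - clip_limit
            hist[i] = clip_limit

    # Redistribute the excess as evenly as possible across all bins,
    # so the total pixel count is preserved.
    bonus, remainder = divmod(excess, levels)
    for i in range(levels):
        hist[i] += bonus + (1 if i < remainder else 0)

    # Map each intensity through the cumulative distribution.
    total = len(tile)
    running = 0
    mapping = [0] * levels
    for i in range(levels):
        running += hist[i]
        mapping[i] = round((levels - 1) * running / total)
    return [mapping[pixel] for pixel in tile]


# A bright background (200) crossed by a faint line (190).
tile = [200] * 90 + [190] * 10
unclipped = clipped_equalize(tile, clip_limit=len(tile))  # behaves like plain equalization
clipped = clipped_equalize(tile, clip_limit=4)            # strongly limited

spread_unclipped = max(unclipped) - min(unclipped)
spread_clipped = max(clipped) - min(clipped)
```

With no effective clipping the 10-level difference between line and background is stretched across most of the range, which is the aggressive behavior attributed to global equalization above. A low clip limit stretches it far less, so the clip limit is the knob that trades enhancement of faint lines against artifacts.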
### Decision 3: Preprocessing is applied in-memory, not saved to disk

**Rationale**:

- The preprocessed image is only needed during the PP-Structure call
- Saving it would increase storage and I/O overhead
- The original image is already saved and used for extraction

### Decision 4: Sharpening via Unsharp Mask

**Rationale**:

- Enhances edges; noise amplification stays limited as long as the strength is kept modest
- Helps make faint table borders more detectable
- Configurable strength

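The arithmetic behind an unsharp mask is `sharpened = original + amount * (original - blurred)`. The following dependency-free sketch applies it to a small grayscale grid so the effect on a faint line is visible. It uses a 3x3 box blur for brevity where a real implementation would more likely use a Gaussian blur from OpenCV or Pillow; the names `box_blur3`, `unsharp_mask`, and the `amount` parameter are hypothetical.

```python
from typing import List

Gray = List[List[int]]  # rows of 8-bit grayscale pixels


def box_blur3(img: Gray) -> List[List[float]]:
    """3x3 mean filter; border pixels reuse the nearest edge value."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            out[y][x] = total / 9.0
    return out


def unsharp_mask(img: Gray, amount: float = 1.0) -> Gray:
    """Add back `amount` times the difference between the image and its blur."""
    blurred = box_blur3(img)
    return [
        [min(255, max(0, round(p + amount * (p - b)))) for p, b in zip(row, blurred_row)]
        for row, blurred_row in zip(img, blurred)
    ]


# 5x5 background of 200 with a faint vertical line of 190 in the middle column.
page = [[190 if x == 2 else 200 for x in range(5)] for _ in range(5)]
sharpened = unsharp_mask(page, amount=1.0)
```

In this example the line gets darker and its immediate neighbors get slightly brighter, so the local contrast across the border grows, while flat regions away from the line are left unchanged.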
## Implementation Details

### Preprocessing Pipeline

```python
from PIL import Image

# `Settings` and the apply_* helpers are assumed to be defined elsewhere in the project.


def enhance_for_layout_detection(image: Image.Image, config: Settings) -> Image.Image:
    """Enhance image for better layout detection."""

    # Step 1: Contrast enhancement (any other value skips this step)
    if config.layout_preprocessing_contrast == "clahe":
        image = apply_clahe(image)
    elif config.layout_preprocessing_contrast == "histogram":
        image = apply_histogram_equalization(image)

    # Step 2: Sharpening (optional)
    if config.layout_preprocessing_sharpen:
        image = apply_unsharp_mask(image)

    # Step 3: Binarization (optional, aggressive)
    if config.layout_preprocessing_binarize:
        image = apply_adaptive_threshold(image)

    return image
```
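The pipeline reads several `layout_preprocessing_*` fields from the settings object. One way they could be declared is sketched below. The class name, the default values, and the extra `"none"` contrast mode are assumptions made for illustration; only the field names and the `"clahe"` / `"histogram"` values come from this design.

```python
from dataclasses import dataclass

# "clahe" and "histogram" appear in the pipeline; "none" is an assumed way to skip the step.
VALID_CONTRAST_MODES = ("clahe", "histogram", "none")


@dataclass(frozen=True)
class LayoutPreprocessingSettings:
    """Hypothetical container for the layout preprocessing options."""

    layout_preprocessing_enabled: bool = True       # assumed default
    layout_preprocessing_contrast: str = "clahe"    # CLAHE is the default per Decision 2
    layout_preprocessing_sharpen: bool = True       # assumed default
    layout_preprocessing_binarize: bool = False     # assumed default (step is marked aggressive)

    def __post_init__(self) -> None:
        # Fail fast on typos instead of silently skipping contrast enhancement.
        if self.layout_preprocessing_contrast not in VALID_CONTRAST_MODES:
            raise ValueError(
                f"layout_preprocessing_contrast must be one of {VALID_CONTRAST_MODES}, "
                f"got {self.layout_preprocessing_contrast!r}"
            )


defaults = LayoutPreprocessingSettings()
```

Validating the contrast mode up front matters because the pipeline's `if`/`elif` chain would otherwise ignore an unrecognized value without any error.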
### Integration Point

```python
# In ocr_service.py, before calling PP-Structure
# (excerpt from a service method: `self`, `settings`, and `page_image` come from the enclosing scope)
if settings.layout_preprocessing_enabled:
    # Preprocessing must keep the image dimensions unchanged so that the
    # returned bboxes map directly onto the original image.
    preprocessed_image = enhance_for_layout_detection(page_image, settings)
    pp_input = preprocessed_image
else:
    pp_input = page_image

# PP-Structure gets preprocessed (or original if disabled)
layout_results = self.structure_engine(pp_input)

# Image extraction still uses original
for element in layout_results:
    if element.type == "image":
        crop_image_from_original(page_image, element.bbox)  # Use original!
```
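Cropping from the original only works if the bounding boxes found on the preprocessed image refer to the same pixel grid. The sketch below shows that invariant on plain nested lists. The `(left, top, right, bottom)` pixel layout of the bbox and the helper names `same_size` and `crop_from_original` are assumptions for illustration, not the actual PP-Structure output format or the project's helpers.

```python
from typing import List, Tuple

Gray = List[List[int]]             # rows of grayscale pixels
BBox = Tuple[int, int, int, int]   # assumed layout: (left, top, right, bottom) in pixels


def same_size(a: Gray, b: Gray) -> bool:
    """True when both images have the same height and width."""
    return len(a) == len(b) and len(a[0]) == len(b[0])


def crop_from_original(original: Gray, bbox: BBox) -> Gray:
    """Crop the bbox region from the original, clamping it to the image bounds."""
    height, width = len(original), len(original[0])
    left, top, right, bottom = bbox
    left, right = max(0, left), min(width, right)
    top, bottom = max(0, top), min(height, bottom)
    return [row[left:right] for row in original[top:bottom]]


# Original page: pixel value encodes its position (10 * row + column).
original = [[10 * y + x for x in range(6)] for y in range(4)]
# Stand-in for the enhanced copy: same size, different pixel values.
preprocessed = [[0] * 6 for _ in range(4)]

assert same_size(original, preprocessed)  # bboxes are only reusable if this holds
crop = crop_from_original(original, (1, 1, 4, 3))
```

The crop contains the original pixel values, not the enhanced ones, which is the behavior the integration point is meant to guarantee.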
## Risks / Trade-offs

| Risk | Mitigation |
|------|------------|
| Performance overhead | Preprocessing is fast (~50 ms/page) and can be disabled |
| Over-enhancement artifacts | CLAHE clip limit prevents over-saturation; intensity is configurable |
| Memory spike for large images | Process one page at a time; discard the preprocessed image after use |
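The per-page overhead figure in the table can be checked against real pages with a small timing helper. This is a minimal sketch using only the standard library; the helper names and the 50 ms budget default are illustrative, and `sorted` merely stands in for the preprocessing function being measured.

```python
import time
from statistics import median
from typing import Any, Callable


def median_call_ms(fn: Callable[[Any], Any], arg: Any, repeats: int = 5) -> float:
    """Median wall-clock time of fn(arg) in milliseconds over several runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(arg)
        samples.append((time.perf_counter() - start) * 1000.0)
    return median(samples)


def within_budget(elapsed_ms: float, budget_ms: float = 50.0) -> bool:
    """True when the measured time fits the per-page budget."""
    return elapsed_ms <= budget_ms


# Stand-in workload; in practice fn would be the preprocessing function
# and arg a representative page image.
elapsed = median_call_ms(sorted, list(range(1000)))
```

Using the median over a few repeats keeps a single slow run, for example a cold cache, from dominating the result.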
## Open Questions

1. Should binarization be applied before or after CLAHE?
   - Current: After (enhance contrast first, then binarize if needed)

2. Should preprocessing parameters be tunable per-request or only server-wide?
   - Current: Server-wide config only (simpler)
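If per-request tuning is adopted later, one possible shape is to treat the server-wide settings as defaults and apply validated overrides on top. The sketch below is hypothetical: the class, the `resolve_settings` function, and the default values are not part of the current design, which uses server-wide config only.

```python
from dataclasses import dataclass, fields, replace
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class PreprocessingSettings:
    """Hypothetical server-wide defaults (values assumed for illustration)."""

    layout_preprocessing_enabled: bool = True
    layout_preprocessing_contrast: str = "clahe"
    layout_preprocessing_sharpen: bool = True
    layout_preprocessing_binarize: bool = False


def resolve_settings(
    server: PreprocessingSettings,
    overrides: Optional[Mapping[str, Any]] = None,
) -> PreprocessingSettings:
    """Return the server settings with per-request overrides applied."""
    if not overrides:
        return server
    allowed = {f.name for f in fields(server)}
    unknown = set(overrides) - allowed
    if unknown:
        # Reject unknown keys rather than ignoring them silently.
        raise ValueError(f"unknown preprocessing options: {sorted(unknown)}")
    # replace() builds a new frozen instance; the server defaults stay untouched.
    return replace(server, **overrides)


server_settings = PreprocessingSettings()
request_settings = resolve_settings(server_settings, {"layout_preprocessing_binarize": True})
```

Because the settings object is frozen and `replace` returns a copy, one request's overrides cannot leak into the shared server-wide configuration.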