proposal: add-layout-preprocessing for improved table detection

Problem: PP-Structure misses tables with faint lines/borders
Solution: Preprocess images (contrast, sharpen) for layout detection
- Preprocessed image only used for layout detection
- Original image preserved for element extraction (quality)

Includes: proposal.md, design.md, tasks.md, spec delta

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 14:24:23 +08:00
parent 5448a047ff
commit c12ea0b9f6
4 changed files with 295 additions and 0 deletions
--- a/openspec/changes/add-layout-preprocessing/design.md
+++ b/openspec/changes/add-layout-preprocessing/design.md
@@ -0,0 +1,124 @@
+# Design: Layout Detection Image Preprocessing
+
+## Context
+
+PP-StructureV3's layout detection model (PP-DocLayout_plus-L) sometimes fails to detect tables with faint lines or low contrast. This is an input-quality problem rather than a model limitation: the model detects tables when their lines are clearly visible, but struggles with poor-quality scans and documents with light-colored borders.
+
+### Current Flow
+```
+Original Image → PP-Structure (layout detection) → Element Recognition
+                      ↓
+              Returns element bboxes
+                      ↓
+              Image extraction crops from original
+```
+
+### Proposed Flow
+```
+Original Image → Preprocess → PP-Structure (layout detection) → Element Recognition
+                                      ↓
+                              Returns element bboxes
+                                      ↓
+Original Image ← ← ← ← Image extraction crops from original (NOT preprocessed)
+```
+
+## Goals / Non-Goals
+
+### Goals
+- Improve table detection for documents with faint lines
+- Preserve original image quality for element extraction
+- Make preprocessing configurable (enable/disable, intensity)
+- Keep the performance impact minimal (estimated at ~50ms/page)
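The "configurable" goal could be captured in a small settings object. The sketch below is only an illustration: the class name `LayoutPreprocessingSettings`, the `"none"` contrast mode, the `layout_preprocessing_sharpen_strength` field, and all default values are assumptions. Only the four `layout_preprocessing_*` names used elsewhere in this design come from the document itself.

```python
from dataclasses import dataclass

# Hypothetical set of accepted modes; "none" is an assumption for "skip contrast".
VALID_CONTRAST_MODES = ("none", "clahe", "histogram")


@dataclass(frozen=True)
class LayoutPreprocessingSettings:
    """Illustrative container for the preprocessing options named in this design."""

    layout_preprocessing_enabled: bool = True
    layout_preprocessing_contrast: str = "clahe"  # one of VALID_CONTRAST_MODES
    layout_preprocessing_sharpen: bool = True
    layout_preprocessing_sharpen_strength: float = 1.0  # hypothetical intensity knob
    layout_preprocessing_binarize: bool = False  # aggressive, so off by default here

    def __post_init__(self) -> None:
        # Fail fast on a misspelled mode instead of silently skipping the step.
        if self.layout_preprocessing_contrast not in VALID_CONTRAST_MODES:
            raise ValueError(
                f"unknown contrast mode: {self.layout_preprocessing_contrast!r}"
            )
        if self.layout_preprocessing_sharpen_strength < 0:
            raise ValueError("sharpen strength must be non-negative")
```

Validating at construction time means a bad value surfaces at startup rather than as a quietly unprocessed page.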
+
+### Non-Goals
+- Preprocessing for text recognition (Raw OCR handles this separately)
+- Modifying how PP-Structure internally processes images
+- General image quality improvement (out of scope)
+
+## Decisions
+
+### Decision 1: Preprocess only for layout detection input
+**Rationale**:
+- Layout detection needs enhanced edges/contrast to identify regions
+- Image element extraction needs original quality for output
+- Raw OCR text recognition works independently and doesn't need preprocessing
+
+### Decision 2: Use CLAHE (Contrast Limited Adaptive Histogram Equalization) as default
+**Rationale**:
+- CLAHE's clip limit prevents over-amplification of noise in near-uniform regions such as blank page background
+- Its tile-based, adaptive equalization handles uneven backgrounds and lighting across the page
+- Well-supported by OpenCV
+
+**Alternatives considered**:
+- Global histogram equalization: Too aggressive, causes artifacts
+- Manual brightness/contrast: Not adaptive to document variations
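To make the difference between global equalization and a clip limit concrete, here is a single-tile illustration in plain Python. It is not OpenCV's implementation, which also splits the image into tiles and interpolates between them; in practice `cv2.createCLAHE(clipLimit=..., tileGridSize=...)` would be the natural choice. The function name `clipped_equalize` and the sample tile are made up for this sketch.

```python
from typing import List


def clipped_equalize(
    pixels: List[int], clip_limit: float = 2.0, levels: int = 256
) -> List[int]:
    """Histogram-equalize one tile of 8-bit pixels using a clipped histogram."""
    if not pixels:
        return []
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cap every bin at clip_limit times the mean bin height.
    cap = max(1.0, clip_limit * len(pixels) / levels)
    excess = sum(max(0.0, h - cap) for h in hist)
    # Spread the clipped mass evenly so the total pixel count is preserved.
    bonus = excess / levels
    clipped = [min(h, cap) + bonus for h in hist]

    # The cumulative sum of the clipped histogram defines the intensity mapping.
    mapping = []
    running = 0.0
    for h in clipped:
        running += h
        mapping.append(round((levels - 1) * running / len(pixels)))
    return [mapping[p] for p in pixels]


# A mostly uniform light tile with a faint, slightly darker line.
tile = [240] * 1000 + [230] * 24
unlimited = clipped_equalize(tile, clip_limit=1e9)  # behaves like plain equalization
limited = clipped_equalize(tile, clip_limit=2.0)
```

With an effectively unlimited clip value the two gray levels are pushed to opposite ends of the range, which is the "too aggressive" behavior listed above. With a clip limit of 2.0 the gap between the line and the background grows only slightly.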
+
+### Decision 3: Preprocessing is applied in-memory, not saved to disk
+**Rationale**:
+- Preprocessed image is only needed during PP-Structure call
+- Saving would increase storage and I/O overhead
+- Original image is already saved and used for extraction
+
+### Decision 4: Sharpening via Unsharp Mask
+**Rationale**:
+- Enhances edges; at moderate strength, amplification of existing scan noise stays limited
+- Helps make faint table borders more detectable
+- Configurable strength
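The unsharp-mask idea is `output = input + amount * (input - blurred)`. The sketch below applies it to a single row of intensities so the effect on a faint border is easy to follow. It assumes a 3-tap box blur in place of the usual Gaussian blur, and the name `unsharp_mask_1d` and the sample row are illustrative only.

```python
from typing import List


def unsharp_mask_1d(row: List[float], amount: float = 1.0) -> List[float]:
    """Sharpen one row of 8-bit intensities: out = in + amount * (in - blurred)."""
    n = len(row)
    out = []
    for i, value in enumerate(row):
        # 3-tap box blur with edge replication stands in for a Gaussian blur.
        left = row[max(i - 1, 0)]
        right = row[min(i + 1, n - 1)]
        blurred = (left + value + right) / 3.0
        sharpened = value + amount * (value - blurred)
        out.append(min(255.0, max(0.0, sharpened)))  # clamp to the 8-bit range
    return out


# Light background (200) crossed by a faint, one-pixel-wide line (180).
row = [200.0, 200.0, 200.0, 180.0, 200.0, 200.0, 200.0]
sharp = unsharp_mask_1d(row, amount=1.0)
```

The line pixel becomes darker and its immediate neighbors become lighter, so the local contrast around the border increases while flat regions stay unchanged. A larger `amount` strengthens the effect and also amplifies any noise already in the scan, which is why a configurable strength is useful.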
+
+## Implementation Details
+
+### Preprocessing Pipeline
+```python
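+from PIL import Image
+
+# `Settings` and the `apply_*` helpers are assumed to be defined elsewhere in the service.
+def enhance_for_layout_detection(image: Image.Image, config: Settings) -> Image.Image:
+    """Enhance image for better layout detection.
+
+    The result must keep the input's dimensions so that the detected bboxes
+    can be applied to the original image.
+    """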
+
+    # Step 1: Contrast enhancement
+    if config.layout_preprocessing_contrast == "clahe":
+        image = apply_clahe(image)
+    elif config.layout_preprocessing_contrast == "histogram":
+        image = apply_histogram_equalization(image)
+
+    # Step 2: Sharpening (optional)
+    if config.layout_preprocessing_sharpen:
+        image = apply_unsharp_mask(image)
+
+    # Step 3: Binarization (optional, aggressive)
+    if config.layout_preprocessing_binarize:
+        image = apply_adaptive_threshold(image)
+
+    return image
+```
+
+### Integration Point
+```python
+# In ocr_service.py, before calling PP-Structure
+if settings.layout_preprocessing_enabled:
+    preprocessed_image = enhance_for_layout_detection(page_image, settings)
+    pp_input = preprocessed_image
+else:
+    pp_input = page_image
+
+# PP-Structure gets preprocessed (or original if disabled)
+layout_results = self.structure_engine(pp_input)
+
+# Image extraction still uses original
+for element in layout_results:
+    if element.type == "image":
+        crop_image_from_original(page_image, element.bbox)  # Use original!
+```
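The integration above relies on one property: bboxes detected on the preprocessed image are valid for the original only if preprocessing leaves the pixel grid unchanged. A minimal sketch of that contract follows, using nested lists as a stand-in for images. The names `crop` and `detect_on_enhanced_crop_from_original` are hypothetical, and the bbox convention (`x0, y0, x1, y1`, end-exclusive) is an assumption.

```python
from typing import Callable, List, Sequence, Tuple

Grid = List[List[int]]            # stand-in for an image: rows of pixel values
BBox = Tuple[int, int, int, int]  # (x0, y0, x1, y1), end-exclusive (assumed)


def crop(image: Grid, bbox: BBox) -> Grid:
    """Cut the bbox region out of the given image."""
    x0, y0, x1, y1 = bbox
    return [row[x0:x1] for row in image[y0:y1]]


def detect_on_enhanced_crop_from_original(
    original: Grid,
    preprocess: Callable[[Grid], Grid],
    detect: Callable[[Grid], Sequence[BBox]],
) -> List[Grid]:
    """Run detection on the enhanced image, then crop from the original."""
    enhanced = preprocess(original)
    # Bboxes are only reusable if preprocessing kept the pixel grid unchanged.
    same_shape = len(enhanced) == len(original) and all(
        len(a) == len(b) for a, b in zip(enhanced, original)
    )
    if not same_shape:
        raise ValueError("preprocessing must not resize the image")
    return [crop(original, bbox) for bbox in detect(enhanced)]
```

If a later preprocessing step ever rescales or pads the page, the bboxes would need to be mapped back to original coordinates before cropping.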
+
+## Risks / Trade-offs
+
+| Risk | Mitigation |
+|------|------------|
+| Performance overhead | Preprocessing is fast (~50ms/page) and can be disabled via config |
+| Over-enhancement artifacts | The CLAHE clip limit caps contrast amplification; the limit is configurable |
+| Memory spike for large images | Process one page at a time and discard the preprocessed image after use |
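The ~50ms/page figure in the table is the design's estimate. A small timing helper such as the one below could be used to check it on representative pages. The helper name `median_latency_ms` is an assumption and not part of the proposal.

```python
import statistics
import time
from typing import Any, Callable


def median_latency_ms(fn: Callable[..., Any], *args: Any, repeats: int = 5) -> float:
    """Return the median wall-clock time of fn(*args) in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

For example, `median_latency_ms(enhance_for_layout_detection, page_image, settings)` would give a per-page number to compare against the estimate. The median is used so that a single slow run, such as a cold cache, does not skew the result.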
+
+## Open Questions
+
+1. Should binarization be applied before or after CLAHE?
+   - Current: After (enhance contrast first, then binarize if needed; binarization runs last in the pipeline, after sharpening)
+
+2. Should preprocessing parameters be tunable per-request or only server-wide?
+   - Current: Server-wide config only (simpler)