# Design: Layout Detection Image Preprocessing

## Context

PP-StructureV3's layout detection model (PP-DocLayout_plus-L) sometimes fails to detect tables with faint lines or low contrast. This is a preprocessing problem: the model can detect tables when lines are clearly visible, but struggles with poor-quality scans or documents with light-colored borders.

### Current Flow

```
Original Image → PP-Structure (layout detection) → Element Recognition
                        ↓
                 Returns element bboxes
                        ↓
                 Image extraction crops from original
```

### Proposed Flow

```
Original Image → Preprocess → PP-Structure (layout detection) → Element Recognition
      ↑                              ↓
      ↑                       Returns element bboxes
      ↑                              ↓
Original Image ← ← ← ← Image extraction crops from original (NOT preprocessed)
```

## Goals / Non-Goals

### Goals

- Improve table detection for documents with faint lines
- Preserve original image quality for element extraction
- **Hybrid control**: Auto mode by default, manual override available
- **Preview capability**: Users can verify preprocessing before processing
- Minimal performance impact

### Non-Goals

- Preprocessing for text recognition (Raw OCR handles this separately)
- Modifying how PP-Structure internally processes images
- General image quality improvement (out of scope)
- Real-time preview during processing (preview is pre-processing only)

## Decisions

### Decision 1: Preprocess only for layout detection input

**Rationale**:

- Layout detection needs enhanced edges/contrast to identify regions
- Image element extraction needs original quality for output
- Raw OCR text recognition works independently and doesn't need preprocessing

### Decision 2: Use CLAHE (Contrast Limited Adaptive Histogram Equalization) as default

**Rationale**:

- CLAHE prevents over-amplification in already bright areas
- Its adaptive nature handles varying background regions
- Well-supported by OpenCV

**Alternatives considered**:

- Global histogram equalization: too aggressive, causes artifacts
- Manual brightness/contrast: not adaptive to document variations

### Decision 3: Preprocessing is applied in-memory, not saved to disk

**Rationale**:

- The preprocessed image is only needed during the PP-Structure call
- Saving it would increase storage and I/O overhead
- The original image is already saved and used for extraction

### Decision 4: Sharpening via Unsharp Mask

**Rationale**:

- Enhances edges without introducing noise
- Makes faint table borders more detectable
- Configurable strength

### Decision 5: Hybrid Control Mode (Auto + Manual)

**Rationale**:

- Auto mode provides a seamless experience for most users
- Manual mode gives power users fine control
- Preview allows verification before committing to processing

**Auto-detection algorithm**:

```python
import cv2
import numpy as np


def analyze_image_quality(image: np.ndarray) -> dict:
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Contrast: standard deviation of pixel values
    contrast = np.std(gray)

    # Edge strength: mean of Sobel gradient magnitude
    sobel_x = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    sobel_y = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    edge_strength = np.mean(np.sqrt(sobel_x**2 + sobel_y**2))

    return {
        "contrast": contrast,
        "edge_strength": edge_strength,
        "recommended": {
            "contrast": "clahe" if contrast < 40 else "none",
            "sharpen": edge_strength < 15,
            "binarize": contrast < 20,
        },
    }
```

### Decision 6: Preview API Design

**Rationale**:

- Users should see the preprocessing effect before full processing
- Reduces trial-and-error cycles
- Builds user confidence in the system

**API Design**:

```
POST /api/v2/tasks/{task_id}/preview/preprocessing

Request:
{
  "page": 1,
  "mode": "auto",          // or "manual"
  "config": {              // only for manual mode
    "contrast": "clahe",
    "sharpen": true,
    "binarize": false
  }
}

Response:
{
  "original_url": "/api/v2/tasks/{id}/pages/1/image",
  "preprocessed_url": "/api/v2/tasks/{id}/pages/1/image?preprocessed=true",
  "quality_metrics": {
    "contrast": 35.2,
    "edge_strength": 12.8
  },
  "auto_config": {
    "contrast": "clahe",
    "sharpen": true,
    "binarize": false
  }
}
```

## Implementation Details

### Preprocessing Pipeline

```python
def enhance_for_layout_detection(image: Image.Image, config: Settings) -> Image.Image:
    """Enhance image for better layout detection."""
    # Step 1: Contrast enhancement
    if config.layout_preprocessing_contrast == "clahe":
        image = apply_clahe(image)
    elif config.layout_preprocessing_contrast == "histogram":
        image = apply_histogram_equalization(image)

    # Step 2: Sharpening (optional)
    if config.layout_preprocessing_sharpen:
        image = apply_unsharp_mask(image)

    # Step 3: Binarization (optional, aggressive)
    if config.layout_preprocessing_binarize:
        image = apply_adaptive_threshold(image)

    return image
```

### Integration Point

```python
# In ocr_service.py, before calling PP-Structure
if settings.layout_preprocessing_enabled:
    pp_input = enhance_for_layout_detection(page_image, settings)
else:
    pp_input = page_image

# PP-Structure gets the preprocessed image (or the original if disabled)
layout_results = self.structure_engine(pp_input)

# Image extraction still uses the original
for element in layout_results:
    if element.type == "image":
        crop_image_from_original(page_image, element.bbox)  # Use original!
```

## Risks / Trade-offs

| Risk | Mitigation |
|------|------------|
| Performance overhead | Preprocessing is fast (~50 ms/page); enable/disable option |
| Over-enhancement artifacts | CLAHE clip limit prevents over-saturation; configurable |
| Memory spike for large images | Process one page at a time; discard the preprocessed image after use |

## Open Questions

1. Should binarization be applied before or after CLAHE?
   - Current: after (enhance contrast first, then binarize if needed)
2. Should preprocessing parameters be tunable per-request or only server-wide?
   - Current: server-wide config only (simpler)