# Change: Unify Image Scaling Strategy for Optimal Layout Detection

## Why

Currently, the system handles image resolution inconsistently:

1. **PDF conversion**: Always uses 300 DPI, producing ~2480×3508 images for A4
2. **Image downscaling**: Only applied when the image exceeds 2000px (no upscaling)
3. **Small images**: Never scaled up, even if they are below the optimal detection size

This inconsistency causes:

- Wasted processing: PDF → 300 DPI → scale down to 1600px (render large, then resize)
- Suboptimal detection: Small images stay small, so table structures are missed
- Inconsistent behavior: Different source formats get different treatment

PP-Structure's layout detection model (RT-DETR based) works best with images around 1600px on the longest side. Both too-large and too-small images reduce detection accuracy.

## What Changes

- **Bidirectional scaling for PP-Structure**
  - Scale DOWN images larger than the max threshold (2000px) → target (1600px)
  - Scale UP images smaller than the min threshold (1200px) → target (1600px)
  - No change for images in the optimal range (1200-2000px)

- **PDF conversion DPI optimization**
  - Calculate the optimal DPI from the target resolution
  - Avoid double-scaling (converting at high DPI, then scaling down)
  - Option to use adaptive DPI or fixed DPI with post-scaling

- **Unified scaling logic**
  - The same rules apply to all image sources (image files, PDF pages)
  - Scaling happens once, at the preprocessing stage
  - Bbox coordinates are scaled back to the original image for accurate cropping

- **Configuration**
  - Add `layout_image_scaling_min_dimension`: images below this size are upscaled (default: 1200)
  - Keep existing `layout_image_scaling_max_dimension` (2000) and `target_dimension` (1600)

## Impact

### Affected Specs

- `ocr-processing` - Modified scaling requirements

### Affected Code

- `backend/app/core/config.py` - Add min_dimension setting
- `backend/app/services/layout_preprocessing_service.py` - Add upscaling logic
- `backend/app/services/ocr_service.py` - Optional: Adjust PDF DPI handling

### Quality Impact

| Scenario | Before | After |
|----------|--------|-------|
| Large image (3000px) | Scaled to 1600px | Same |
| Optimal image (1500px) | No scaling | Same |
| Small image (800px) | No scaling | Scaled to 1600px |
| PDF at 300 DPI (A4) | 2480×3508 → 1600px on the longest side | Same (or optimized DPI) |

### Raw OCR Impact

- No change: Raw OCR continues to use original/converted images
- Upscaling only affects the PP-Structure layout detection input

## Risks

1. **Upscaling quality**: Enlarging small images may introduce interpolation artifacts
   - Mitigation: Use `INTER_CUBIC` or `INTER_LANCZOS4` for upscaling
   - Note: Layout detection cares about structure, not fine text detail

2. **Memory for large upscaled images**: A small image scaled up uses more memory
   - Mitigation: 800px → 1600px means 4x the pixels, but a 1600px image is still a reasonable size

3. **Breaking existing behavior**: Users may rely on the current behavior
   - Mitigation: Document the change, and add a config toggle if needed
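
## Illustrative Sketch

The scaling rules, the bbox mapping, and the adaptive DPI calculation described under "What Changes" can be sketched as follows. This is a minimal illustration, not the actual service code: `ScalingConfig`, `compute_scale_factor`, `scaled_size`, `bbox_to_original`, and `optimal_pdf_dpi` are hypothetical names, the sketch assumes thresholds are compared against the longest side, and the actual pixel resizing is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingConfig:
    """Hypothetical container mirroring the proposed settings."""
    min_dimension: int = 1200     # below this, scale up
    max_dimension: int = 2000     # above this, scale down
    target_dimension: int = 1600  # longest side after scaling


def compute_scale_factor(width: int, height: int,
                         cfg: ScalingConfig = ScalingConfig()) -> float:
    """Return the factor to apply to both sides; 1.0 means no scaling."""
    longest = max(width, height)
    if longest <= 0:
        raise ValueError("image dimensions must be positive")
    # Assumption: thresholds are checked against the longest side.
    if longest > cfg.max_dimension or longest < cfg.min_dimension:
        return cfg.target_dimension / longest
    return 1.0


def scaled_size(width: int, height: int, scale: float) -> tuple[int, int]:
    """Pixel size after scaling, preserving aspect ratio."""
    return max(1, round(width * scale)), max(1, round(height * scale))


def bbox_to_original(bbox: tuple[float, float, float, float],
                     scale: float) -> tuple[float, float, float, float]:
    """Map an (x0, y0, x1, y1) box detected on the scaled image
    back to the original image's coordinates."""
    x0, y0, x1, y1 = bbox
    return (x0 / scale, y0 / scale, x1 / scale, y1 / scale)


def optimal_pdf_dpi(page_width_pt: float, page_height_pt: float,
                    target_dimension: int = 1600) -> float:
    """DPI at which the page's longest side renders at target_dimension.

    PDF page sizes are expressed in points (1 pt = 1/72 inch).
    """
    longest_inches = max(page_width_pt, page_height_pt) / 72.0
    return target_dimension / longest_inches


if __name__ == "__main__":
    for w, h in [(3000, 2000), (1500, 1000), (800, 600)]:
        s = compute_scale_factor(w, h)
        print((w, h), "->", scaled_size(w, h, s))
    # A4 is roughly 595.276 x 841.89 pt.
    print(optimal_pdf_dpi(595.276, 841.89))
```

For an A4 page this works out to roughly 137 DPI, which would render the page at about 1600px on the longest side directly instead of rendering at 300 DPI and resizing afterwards. Keeping the scale factor alongside the resized image is what lets detected boxes be divided back into original coordinates for cropping.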