OCR/openspec/changes/archive/2025-11-28-unify-image-scaling/tasks.md
egg 801ee9c4b6 feat: create extract-table-cell-boxes proposal and archive old proposal
- Archive unify-image-scaling proposal to archive/2025-11-28
- Create new extract-table-cell-boxes proposal for supplementing PPStructureV3
  with direct SLANeXt model calls to extract table cell bounding boxes
- Add debug logging to pp_structure_enhanced.py for table cell boxes investigation
- Discovered that PPStructureV3 high-level API filters out cell bbox data,
  but paddlex.create_model() can directly invoke underlying models

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 12:15:06 +08:00


Tasks: Unify Image Scaling Strategy

1. Configuration

  • 1.1 Add min_dimension setting to backend/app/core/config.py
    • layout_image_scaling_min_dimension: int = 1200
    • Description: "Min dimension (pixels) before upscaling. Images smaller than this will be scaled up."
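A minimal sketch of how the scaling settings relate to each other. It uses a stdlib dataclass as a stand-in, since the actual declaration style in backend/app/core/config.py is not shown here. Only layout_image_scaling_min_dimension and its default of 1200 are named in the task; the max and target field names are hypothetical, with the values 2000 and 1600 taken from the decision matrix later in this document. The sanity check is likewise an illustrative assumption, not part of the task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayoutScalingSettings:
    """Stand-in for the layout scaling fields in backend/app/core/config.py."""

    # From the task list: images whose longest side is below this are scaled up.
    layout_image_scaling_min_dimension: int = 1200
    # Hypothetical field names; the values come from the decision matrix.
    layout_image_scaling_max_dimension: int = 2000
    layout_image_scaling_target_dimension: int = 1600

    def __post_init__(self) -> None:
        # The target should sit inside the no-scaling band, otherwise a
        # freshly scaled image would immediately qualify for scaling again.
        lo = self.layout_image_scaling_min_dimension
        hi = self.layout_image_scaling_max_dimension
        target = self.layout_image_scaling_target_dimension
        if not lo <= target <= hi:
            raise ValueError(f"target {target} must lie within [{lo}, {hi}]")


settings = LayoutScalingSettings()
```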

2. Bidirectional Scaling Logic

  • 2.1 Update scale_for_layout_detection() in layout_preprocessing_service.py

    • Add upscaling condition: max_dim < min_dimension
    • Use cv2.INTER_CUBIC for upscaling (better quality than INTER_LINEAR)
    • Update docstring to reflect bidirectional behavior
  • 2.2 Update scaling decision logic

    # Current: only downscale
    should_scale = max_dim > max_dimension
    
    # New: bidirectional
    should_downscale = max_dim > max_dimension
    should_upscale = max_dim < min_dimension
    should_scale = should_downscale or should_upscale
    
  • 2.3 Update logging to indicate scale direction

    • "Scaled DOWN for layout detection: 2480x3508 -> 1131x1600"
    • "Scaled UP for layout detection: 800x600 -> 1600x1200"
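Tasks 2.1 to 2.3 can be sketched together as below. The function names scaling_decision and scaling_log_message are hypothetical, and the real scale_for_layout_detection() may be organized differently. Interpolation is returned as a name rather than a cv2 constant so the sketch runs without OpenCV; the thresholds 1200 and 2000 are the defaults listed in this document.

```python
from typing import Optional, Tuple


def scaling_decision(
    width: int,
    height: int,
    min_dimension: int = 1200,
    max_dimension: int = 2000,
) -> Tuple[Optional[str], Optional[str]]:
    """Return (direction, interpolation name), or (None, None) if no scaling."""
    max_dim = max(width, height)
    should_downscale = max_dim > max_dimension
    should_upscale = max_dim < min_dimension
    if should_upscale:
        return "UP", "INTER_CUBIC"   # would map to cv2.INTER_CUBIC
    if should_downscale:
        return "DOWN", "INTER_AREA"  # would map to cv2.INTER_AREA
    return None, None


def scaling_log_message(
    direction: str, original: Tuple[int, int], scaled: Tuple[int, int]
) -> str:
    """Format the direction-aware log line described in task 2.3."""
    return (
        f"Scaled {direction} for layout detection: "
        f"{original[0]}x{original[1]} -> {scaled[0]}x{scaled[1]}"
    )


direction, interpolation = scaling_decision(800, 600)
print(scaling_log_message(direction, (800, 600), (1600, 1200)))
# prints: Scaled UP for layout detection: 800x600 -> 1600x1200
```

Images whose longest side equals exactly 1200 or 2000 fall in the no-scaling band, because both comparisons are strict.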

3. PDF DPI Handling (Optional Optimization)

  • 3.1 Evaluate current PDF conversion impact

    • Decision: Keep 300 DPI, let bidirectional scaling handle it
    • Reason: Raw OCR benefits from high resolution, while bidirectional scaling covers PP-Structure's needs
  • 3.2 Option A: Keep 300 DPI, let scaling handle it ✓

    • Simplest approach, no change needed
    • Raw OCR benefits from high resolution
  • 3.3 Option B: Add configurable PDF DPI (Not needed)

4. Testing

  • 4.1 Test upscaling with small images

    • Small image (800x600): Scaled UP → 1600x1200, scale_factor=0.500
    • Very small (400x300): Scaled UP → 1600x1200, scale_factor=0.250
  • 4.2 Test no scaling for optimal range

    • Optimal image (1500x1000): was_scaled=False, scale_factor=1.000
  • 4.3 Test downscaling (existing behavior)

    • Large image (2480x3508): Scaled DOWN → 1131x1600, scale_factor=2.192
  • 4.4 Test PDF workflow (manual test recommended)

    • PDF page should be detected correctly
    • Scaling should apply after PDF conversion

5. Documentation

  • 5.1 Update config.py Field descriptions

    • Explained bidirectional scaling in the description of the enabled field
    • Updated max/min/target descriptions
  • 5.2 Add logging for scaling decisions

    • Logs direction (UP/DOWN), original size, target size, scale_factor

Implementation Summary

Files Modified:

  • backend/app/core/config.py - Added layout_image_scaling_min_dimension setting
  • backend/app/services/layout_preprocessing_service.py - Updated scaling logic to be bidirectional

Test Results (2025-11-27):

| Test Case | Condition (max dimension) | Result | scale_factor |
|---|---|---|---|
| Small (800×600) | max=800 < 1200 | UP → 1600×1200 | 0.500 |
| Optimal (1500×1000) | 1200 ≤ 1500 ≤ 2000 | No scaling | 1.000 |
| Large (2480×3508) | max=3508 > 2000 | DOWN → 1131×1600 | 2.192 |
| Very small (400×300) | max=400 < 1200 | UP → 1600×1200 | 0.250 |
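The sizes and scale_factor values in these results follow from scaling the longest side to the target of 1600. The sketch below reproduces that arithmetic under two assumptions: the helper name scaled_size is hypothetical, and rounding to the nearest pixel is a guess, since only the final sizes are reported.

```python
from typing import Tuple


def scaled_size(width: int, height: int, target: int = 1600) -> Tuple[int, int, float]:
    """Scale so the longest side equals `target`.

    Returns (new_width, new_height, scale_factor), where scale_factor is
    original/scaled: the multiplier that maps coordinates in the scaled
    image back to the original image.
    """
    max_dim = max(width, height)
    # Rounding mode is an assumption; the source only shows the final sizes.
    new_w = round(width * target / max_dim)
    new_h = round(height * target / max_dim)
    return new_w, new_h, max_dim / target


for w, h in [(800, 600), (2480, 3508), (400, 300)]:
    new_w, new_h, factor = scaled_size(w, h)
    print(f"{w}x{h} -> {new_w}x{new_h}, scale_factor={factor:.3f}")
```

The optimal case (1500×1000) never reaches this calculation, because the scaling decision skips images whose longest side lies between 1200 and 2000.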

Implementation Notes

Scaling Decision Matrix

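| Max dimension | Action | Resize scale | Interpolation |
|---|---|---|---|
| < 1200px | Scale UP | target/max_dim | INTER_CUBIC |
| 1200-2000px | No scaling | 1.0 | N/A |
| > 2000px | Scale DOWN | target/max_dim | INTER_AREA |

Resize scale is the factor applied to the image. The logged scale_factor, used to restore bounding boxes, is its inverse (max_dim/target).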

Example Scenarios

  1. Small scan (800×600)

    • max_dim = 800 < 1200 → Scale UP
    • target = 1600, scale = 1600/800 = 2.0
    • Result: 1600×1200
    • scale_factor (for bbox restore) = 0.5
  2. Optimal image (1400×1000)

    • max_dim = 1400, 1200 <= 1400 <= 2000 → No scaling
    • Result: unchanged
    • scale_factor = 1.0
  3. High-res scan (2480×3508)

    • max_dim = 3508 > 2000 → Scale DOWN
    • target = 1600, scale = 1600/3508 = 0.456
    • Result: 1131×1600
    • scale_factor (for bbox restore) = 2.19
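The "bbox restore" step in these scenarios amounts to multiplying each coordinate detected on the scaled image by scale_factor. A minimal sketch follows, assuming boxes are given as [x1, y1, x2, y2] in pixels; the helper name restore_bbox and the box format are illustrative assumptions, not taken from the service code.

```python
from typing import List, Sequence


def restore_bbox(bbox: Sequence[float], scale_factor: float) -> List[float]:
    """Map [x1, y1, x2, y2] from scaled-image pixels to original-image pixels."""
    return [coord * scale_factor for coord in bbox]


# Box detected on the 1600x1200 upscaled version of an 800x600 scan.
print(restore_bbox([200, 100, 1000, 600], scale_factor=0.5))
# prints: [100.0, 50.0, 500.0, 300.0]
```

With an upscaled image the factor is below 1 and boxes shrink back; with a downscaled image (for example 2.19 for the 2480×3508 scan) they grow back to the original resolution.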