OCR/openspec/changes/archive/2025-11-28-unify-image-scaling/proposal.md

# Change: Unify Image Scaling Strategy for Optimal Layout Detection

## Why

Currently, the system has inconsistent image resolution handling:

1. **PDF conversion**: Always uses 300 DPI, producing ~2480×3508 images for A4
2. **Image downscaling**: Only applied when image > 2000px (no upscaling)
3. **Small images**: Never scaled up, even if they're below optimal detection size

This inconsistency causes:
- Wasted processing: PDF pages are rendered at 300 DPI and then scaled down to 1600px (two resampling passes)
- Suboptimal detection: Small images stay small, missing table structures
- Inconsistent behavior: Different source formats get different treatment

PP-Structure's layout detection model (RT-DETR based) works best with images around 1600px on the longest side. Both too-large and too-small images reduce detection accuracy.

## What Changes

- **Bidirectional scaling for PP-Structure**
  - Scale DOWN images larger than max threshold (2000px) → target (1600px)
  - Scale UP images smaller than min threshold (1200px) → target (1600px)
  - No change for images in optimal range (1200-2000px)

- **PDF conversion DPI optimization**
  - Calculate optimal DPI based on target resolution
  - Avoid double scaling (rendering at high DPI and then scaling the result down)
  - Option to use adaptive DPI or fixed DPI with post-scaling

- **Unified scaling logic**
  - Same rules apply to all image sources (IMG, PDF pages)
  - Scaling happens once at preprocessing stage
  - Bounding-box coordinates are scaled back to the original image for accurate cropping

- **Configuration**
  - `layout_image_scaling_min_dimension`: Minimum size before upscaling (default: 1200)
  - Keep existing `layout_image_scaling_max_dimension` (2000) and `target_dimension` (1600)
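The scaling rules and the coordinate mapping above can be sketched as follows. This is a minimal illustration, not the service's actual code: `ScalingConfig`, `compute_scale`, and `bbox_to_original` are hypothetical names, and the sketch assumes all three thresholds are compared against the longest side of the image.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingConfig:
    # Defaults come from this proposal; the class itself is hypothetical.
    min_dimension: int = 1200
    max_dimension: int = 2000
    target_dimension: int = 1600


def compute_scale(width: int, height: int,
                  cfg: ScalingConfig = ScalingConfig()) -> float:
    """Return the factor to apply to both sides (1.0 means leave unchanged)."""
    longest = max(width, height)  # assumption: thresholds apply to the longest side
    if longest > cfg.max_dimension or longest < cfg.min_dimension:
        return cfg.target_dimension / longest
    return 1.0


def bbox_to_original(bbox, scale: float):
    """Map an (x1, y1, x2, y2) box from scaled-image to original-image coordinates."""
    return tuple(v / scale for v in bbox)
```

With these defaults, an 800×600 image gets a factor of 2.0, a 1500×1000 image is left alone, and a box detected on the scaled image is divided by the same factor before cropping from the original.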

## Impact

### Affected Specs
- `ocr-processing` - Modified scaling requirements

### Affected Code
- `backend/app/core/config.py` - Add min_dimension setting
- `backend/app/services/layout_preprocessing_service.py` - Add upscaling logic
- `backend/app/services/ocr_service.py` - Optional: Adjust PDF DPI handling

### Quality Impact

| Scenario | Before | After |
|----------|--------|-------|
| Large image (3000px) | Scaled to 1600px | Same |
| Optimal image (1500px) | No scaling | Same |
| Small image (800px) | No scaling | Scaled to 1600px |
| PDF at 300 DPI | 3508px → 1600px | Same (or optimized DPI) |
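
The "optimized DPI" option could be derived from the page size instead of being fixed. The sketch below assumes the page dimensions are available in PDF points (72 per inch) and caps the result at the current 300 DPI; `optimal_dpi` and `rendered_size` are hypothetical helpers, not existing functions in `ocr_service.py`.

```python
POINTS_PER_INCH = 72.0  # PDF user-space units per inch


def optimal_dpi(page_width_pt: float, page_height_pt: float,
                target_px: int = 1600, max_dpi: float = 300.0) -> float:
    """DPI at which the page's longest side renders at about target_px pixels."""
    longest_in = max(page_width_pt, page_height_pt) / POINTS_PER_INCH
    # Cap the value so very small pages are not rendered at extreme DPI (assumption).
    return min(target_px / longest_in, max_dpi)


def rendered_size(page_width_pt: float, page_height_pt: float, dpi: float):
    """Pixel size (width, height) of the page when rendered at the given DPI."""
    return (round(page_width_pt / POINTS_PER_INCH * dpi),
            round(page_height_pt / POINTS_PER_INCH * dpi))
```

For an A4 page (about 595 × 842 pt) this gives roughly 137 DPI and a 1131×1600 render, compared with 2480×3508 at 300 DPI, so no second downscaling pass would be needed.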

### Raw OCR Impact
- No change: Raw OCR continues to use original/converted images
- Upscaling only affects PP-Structure layout detection input

## Risks

1. **Upscaling quality**: Enlarging small images may introduce interpolation artifacts
   - Mitigation: Use INTER_CUBIC or INTER_LANCZOS4 for upscaling
   - Note: Layout detection cares about structure, not fine text detail

2. **Memory for large upscaled images**: Small image scaled up uses more memory
   - Mitigation: 800px → 1600px quadruples the pixel count, but a 1600px image is still a reasonable size

3. **Breaking existing behavior**: Users may rely on current behavior
   - Mitigation: Document the change, add config toggle if needed
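
The memory figure in risk 2 can be checked with a short calculation. This sketch assumes uncompressed 8-bit RGB buffers (3 bytes per pixel); the function names are illustrative only.

```python
def image_bytes(width: int, height: int,
                channels: int = 3, bytes_per_channel: int = 1) -> int:
    """Size of an uncompressed image buffer in bytes."""
    return width * height * channels * bytes_per_channel


def upscale_memory_ratio(width: int, height: int, target: int = 1600) -> float:
    """How many times larger the buffer becomes after scaling the longest side to target."""
    scale = target / max(width, height)
    new_w, new_h = round(width * scale), round(height * scale)
    return image_bytes(new_w, new_h) / image_bytes(width, height)
```

Under these assumptions an 800×600 image grows from 1.44 MB to 5.76 MB at 1600×1200, which is the 4x increase noted above.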