OCR/openspec/changes/archive/2025-11-28-unify-image-scaling/proposal.md
egg 801ee9c4b6 feat: create extract-table-cell-boxes proposal and archive old proposal
- Archive unify-image-scaling proposal to archive/2025-11-28
- Create new extract-table-cell-boxes proposal for supplementing PPStructureV3
  with direct SLANeXt model calls to extract table cell bounding boxes
- Add debug logging to pp_structure_enhanced.py for table cell boxes investigation
- Discovered that PPStructureV3 high-level API filters out cell bbox data,
  but paddlex.create_model() can directly invoke underlying models

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 12:15:06 +08:00

# Change: Unify Image Scaling Strategy for Optimal Layout Detection
## Why
Currently, the system has inconsistent image resolution handling:
1. **PDF conversion**: Always uses 300 DPI, producing ~2480×3508 images for A4
2. **Image downscaling**: Only applied when image > 2000px (no upscaling)
3. **Small images**: Never scaled up, even if they're below optimal detection size

This inconsistency causes:
- Wasted processing: PDF→300DPI→scale down to 1600px (double conversion)
- Suboptimal detection: Small images stay small, missing table structures
- Inconsistent behavior: Different source formats get different treatment

PP-Structure's layout detection model (RT-DETR based) works best with images around 1600px on the longest side. Both too-large and too-small images reduce detection accuracy.
## What Changes
- **Bidirectional scaling for PP-Structure**
  - Scale DOWN images larger than max threshold (2000px) → target (1600px)
  - Scale UP images smaller than min threshold (1200px) → target (1600px)
  - No change for images in optimal range (1200-2000px)
- **PDF conversion DPI optimization**
  - Calculate optimal DPI based on target resolution
  - Avoid double-scaling (convert at high DPI then scale down)
  - Option to use adaptive DPI or fixed DPI with post-scaling
- **Unified scaling logic**
  - Same rules apply to all image sources (IMG, PDF pages)
  - Scaling happens once at preprocessing stage
  - Bbox coordinates scaled back to original for accurate cropping
- **Configuration**
  - `layout_image_scaling_min_dimension`: Minimum size before upscaling (default: 1200)
  - Keep existing `layout_image_scaling_max_dimension` (2000) and `target_dimension` (1600)
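
The scaling rules and the bbox mapping above can be sketched as follows. This is a minimal illustration, not the service's actual code: `ScalingConfig`, `compute_scale_factor`, and `bbox_to_original` are hypothetical names, the field names are shortened stand-ins for the `layout_image_scaling_*` settings, and the sketch assumes the thresholds are compared against the longest side of the image.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingConfig:
    # Hypothetical stand-ins for the layout_image_scaling_* settings;
    # the default values are the ones listed in this proposal.
    min_dimension: int = 1200
    max_dimension: int = 2000
    target_dimension: int = 1600


def compute_scale_factor(width: int, height: int,
                         cfg: ScalingConfig = ScalingConfig()) -> float:
    """Return the factor to apply to both sides; 1.0 means no scaling."""
    longest = max(width, height)  # assumption: thresholds apply to the longest side
    if longest > cfg.max_dimension or longest < cfg.min_dimension:
        return cfg.target_dimension / longest
    return 1.0


def bbox_to_original(bbox, scale: float):
    """Map an (x1, y1, x2, y2) box from the scaled image back to the original."""
    return tuple(coord / scale for coord in bbox)


# An 800x600 image is below the minimum, so it is scaled up by 2.0;
# a box detected on the scaled image maps back by dividing by that factor.
scale = compute_scale_factor(800, 600)
original_box = bbox_to_original((200, 100, 400, 300), scale)
```

Returning a single factor keeps the aspect ratio fixed and lets the same value drive both the resize and the inverse bbox mapping, so scaling happens in one place.
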
## Impact
### Affected Specs
- `ocr-processing` - Modified scaling requirements
### Affected Code
- `backend/app/core/config.py` - Add min_dimension setting
- `backend/app/services/layout_preprocessing_service.py` - Add upscaling logic
- `backend/app/services/ocr_service.py` - Optional: Adjust PDF DPI handling
### Quality Impact
| Scenario | Before | After |
|----------|--------|-------|
| Large image (3000px) | Scaled to 1600px | Same |
| Optimal image (1500px) | No scaling | Same |
| Small image (800px) | No scaling | Scaled to 1600px |
| PDF at 300 DPI | 2480px → 1600px | Same (or optimized DPI) |
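
For the PDF row, the "optimized DPI" option amounts to choosing a DPI at which the longest page side renders directly to the target size. The arithmetic can be sketched as below; `optimal_dpi` is a hypothetical helper, and the sketch assumes page dimensions are available in PDF points (72 per inch).

```python
def optimal_dpi(page_width_pt: float, page_height_pt: float,
                target_px: int = 1600) -> float:
    """DPI at which the longest page side renders to target_px pixels."""
    longest_inches = max(page_width_pt, page_height_pt) / 72.0  # 72 points per inch
    return target_px / longest_inches


# A4 is 210 x 297 mm, i.e. about 595.28 x 841.89 points.
A4_PT = (595.276, 841.89)

dpi_for_a4 = optimal_dpi(*A4_PT)                 # roughly 137 DPI
long_side_at_300 = round(A4_PT[1] / 72.0 * 300)  # long side in pixels at 300 DPI
```

For A4 this gives roughly 137 DPI, compared with the 3508px long side produced at the current fixed 300 DPI, which is what makes the post-conversion downscale redundant.
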
### Raw OCR Impact
- No change: Raw OCR continues to use original/converted images
- Upscaling only affects PP-Structure layout detection input
## Risks
1. **Upscaling quality**: Enlarging small images may introduce interpolation artifacts
   - Mitigation: Use INTER_CUBIC or INTER_LANCZOS4 for upscaling
   - Note: Layout detection cares about structure, not fine text detail
2. **Memory for large upscaled images**: Small image scaled up uses more memory
   - Mitigation: 800px → 1600px is 4x pixels, but 1600px is still reasonable
3. **Breaking existing behavior**: Users may rely on current behavior
   - Mitigation: Document the change, add config toggle if needed
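
The first two risks can be made concrete with a small sketch. Both helpers are hypothetical. `interpolation_name` returns the name of an OpenCV interpolation flag rather than calling OpenCV; `INTER_CUBIC` for upscaling follows the mitigation above, while `INTER_AREA` for downscaling is an assumption not stated in this proposal. `scaled_bytes` assumes an uncompressed 8-bit, 3-channel image.

```python
def interpolation_name(scale: float) -> str:
    """Name of the OpenCV interpolation flag to use for a given scale factor."""
    if scale > 1.0:
        return "INTER_CUBIC"  # upscaling, per the mitigation above
    if scale < 1.0:
        return "INTER_AREA"   # assumption: a common choice for shrinking
    return "NONE"             # scale == 1.0, no resize needed


def scaled_bytes(width: int, height: int, scale: float, channels: int = 3) -> int:
    """Approximate in-memory size of the scaled image (8 bits per channel)."""
    return round(width * scale) * round(height * scale) * channels


before = scaled_bytes(800, 800, 1.0)  # 800x800 source
after = scaled_bytes(800, 800, 2.0)   # upscaled to 1600x1600
growth = after / before               # doubling each side quadruples the pixels
```

Under these assumptions an 800×800 image grows from about 1.9 MB to about 7.7 MB after upscaling, which is the 4x factor noted in the mitigation.
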