Backend changes:
- Add image scaling configuration for PP-Structure processing
- Enhance layout preprocessing service with scaling support
- Update OCR service with improved memory management
- Add PP-Structure enhanced processing improvements

Frontend changes:
- Update preprocessing settings UI
- Fix processing page layout and state management
- Update API types for new parameters

Proposals:
- Archive add-layout-preprocessing proposal (completed)
- Add unify-image-scaling proposal for consistent coordinate handling

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

# Change: Unify Image Scaling Strategy for Optimal Layout Detection

## Why

Currently, the system has inconsistent image resolution handling:

1. **PDF conversion**: Always uses 300 DPI, producing ~2480×3508 images for A4
2. **Image downscaling**: Only applied when image > 2000px (no upscaling)
3. **Small images**: Never scaled up, even if they're below optimal detection size

This inconsistency causes:
- Wasted processing: PDF → 300 DPI → scale down to 1600px (double conversion)
- Suboptimal detection: Small images stay small, missing table structures
- Inconsistent behavior: Different source formats get different treatment

PP-Structure's layout detection model (RT-DETR based) works best with images around 1600px on the longest side. Both too-large and too-small images reduce detection accuracy.

## What Changes

- **Bidirectional scaling for PP-Structure**
  - Scale DOWN images larger than max threshold (2000px) → target (1600px)
  - Scale UP images smaller than min threshold (1200px) → target (1600px)
  - No change for images in optimal range (1200-2000px)

- **PDF conversion DPI optimization**
  - Calculate optimal DPI based on target resolution
  - Avoid double-scaling (convert at high DPI then scale down)
  - Option to use adaptive DPI or fixed DPI with post-scaling

- **Unified scaling logic**
  - Same rules apply to all image sources (IMG, PDF pages)
  - Scaling happens once at preprocessing stage
  - Bbox coordinates scaled back to original for accurate cropping

- **Configuration**
  - `layout_image_scaling_min_dimension`: Minimum size before upscaling (default: 1200)
  - Keep existing `layout_image_scaling_max_dimension` (2000) and `target_dimension` (1600)

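The bidirectional rule and the bbox back-scaling described above can be sketched roughly as follows. This is a minimal illustration, not the service's actual code: `ScalingConfig`, `compute_scale_factor`, and `scale_bbox_to_original` are hypothetical names, and it assumes the thresholds are compared against the longest side of the image.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingConfig:
    # Defaults mirror the values quoted in this proposal. The field names are
    # shortened, hypothetical stand-ins for the real settings.
    min_dimension: int = 1200
    max_dimension: int = 2000
    target_dimension: int = 1600


def compute_scale_factor(width: int, height: int,
                         cfg: ScalingConfig = ScalingConfig()) -> float:
    """Return the factor to apply to both sides; 1.0 means leave unchanged."""
    longest = max(width, height)  # assumption: thresholds apply to the longest side
    if longest > cfg.max_dimension or longest < cfg.min_dimension:
        # Too large or too small: bring the longest side to the target.
        return cfg.target_dimension / longest
    # Already within the optimal range (min..max inclusive).
    return 1.0


def scale_bbox_to_original(bbox, scale):
    """Map an (x0, y0, x1, y1) box from the scaled image back to original pixels."""
    return tuple(v / scale for v in bbox)


down = compute_scale_factor(3000, 2000)   # large image, scaled down
up = compute_scale_factor(800, 600)       # small image, scaled up
same = compute_scale_factor(1500, 1000)   # optimal range, untouched
```

Because a single factor is computed once per image, dividing each detected coordinate by that same factor is enough to recover positions in the original image for cropping.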
## Impact

### Affected Specs
- `ocr-processing` - Modified scaling requirements

### Affected Code
- `backend/app/core/config.py` - Add min_dimension setting
- `backend/app/services/layout_preprocessing_service.py` - Add upscaling logic
- `backend/app/services/ocr_service.py` - Optional: Adjust PDF DPI handling

### Quality Impact

| Scenario | Before | After |
|----------|--------|-------|
| Large image (3000px) | Scaled to 1600px | Same |
| Optimal image (1500px) | No scaling | Same |
| Small image (800px) | No scaling | Scaled to 1600px |
| PDF at 300 DPI | 2480×3508px → 1600px on the longest side | Same (or optimized DPI) |

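For the PDF row, the "optimized DPI" option amounts to simple arithmetic: pick the DPI at which the page's longest side already renders at the target size. The sketch below works this out for A4 (210 × 297 mm). The helper names are hypothetical, and how the chosen DPI is passed to the PDF renderer is left out because it depends on the conversion library in use.

```python
MM_PER_INCH = 25.4
A4_MM = (210.0, 297.0)  # ISO A4 page size in millimetres


def rendered_size(page_mm, dpi):
    """Pixel size of a page rendered at the given DPI, rounded to whole pixels."""
    width_mm, height_mm = page_mm
    return (round(width_mm / MM_PER_INCH * dpi),
            round(height_mm / MM_PER_INCH * dpi))


def optimal_dpi(page_mm, target_px=1600):
    """DPI at which the page's longest side renders at target_px pixels."""
    longest_inches = max(page_mm) / MM_PER_INCH
    return target_px / longest_inches


fixed = rendered_size(A4_MM, 300)           # current behaviour: fixed 300 DPI
dpi = optimal_dpi(A4_MM)                    # roughly 137 DPI for A4
adaptive = rendered_size(A4_MM, dpi)        # longest side lands on the target
```

Rendering directly at about 137 DPI would skip the intermediate 2480×3508 image, which is the double conversion this proposal wants to avoid.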
### Raw OCR Impact
- No change: Raw OCR continues to use original/converted images
- Upscaling only affects PP-Structure layout detection input

## Risks

1. **Upscaling quality**: Enlarging small images may introduce interpolation artifacts
   - Mitigation: Use INTER_CUBIC or INTER_LANCZOS4 for upscaling
   - Note: Layout detection cares about structure, not fine text detail

2. **Memory for large upscaled images**: A small image uses more memory once it is scaled up
   - Mitigation: 800px → 1600px quadruples the pixel count, but a 1600px image is still a reasonable size

3. **Breaking existing behavior**: Users may rely on current behavior
   - Mitigation: Document the change, add config toggle if needed
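
The memory estimate in risk 2 can be checked with a quick calculation. This sketch assumes a square, uncompressed 8-bit RGB image (3 bytes per pixel); `image_bytes` is a hypothetical helper used only for illustration.

```python
def image_bytes(width, height, channels=3, bytes_per_channel=1):
    """Size of an uncompressed image buffer in bytes."""
    return width * height * channels * bytes_per_channel


small = image_bytes(800, 800)       # before upscaling
upscaled = image_bytes(1600, 1600)  # after upscaling to the target size
ratio = upscaled / small            # doubling each side quadruples the pixels
```

Under these assumptions the buffer grows from about 1.9 MB to about 7.7 MB, which is smaller than the 2480×3508 pages that 300 DPI PDF conversion already produces.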