Tasks: Unify Image Scaling Strategy
1. Configuration
- 1.1 Add min_dimension setting to backend/app/core/config.py
  - layout_image_scaling_min_dimension: int = 1200
  - Description: "Min dimension (pixels) before upscaling. Images smaller than this will be scaled up."
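A minimal sketch of how the new setting could sit next to the other limits. It uses a plain dataclass rather than whatever settings class config.py actually uses. The `LayoutScalingConfig` name, the `max_dimension` and `target_dimension` fields, and their defaults of 2000 and 1600 are assumptions inferred from the numbers in the test results, not copied from the code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayoutScalingConfig:
    """Hypothetical stand-in for the layout scaling settings in config.py."""

    min_dimension: int = 1200     # upscale when the longest side is below this
    max_dimension: int = 2000     # downscale when the longest side is above this (assumed)
    target_dimension: int = 1600  # longest side after scaling either way (assumed)

    def __post_init__(self) -> None:
        # The target must lie inside the no-scaling band; otherwise a freshly
        # scaled image would immediately qualify for scaling again.
        if not (self.min_dimension <= self.target_dimension <= self.max_dimension):
            raise ValueError(
                "expected min_dimension <= target_dimension <= max_dimension"
            )


config = LayoutScalingConfig()
```

Validating the ordering at construction time is a design choice of this sketch; it keeps a misconfigured threshold from producing an image that falls outside the band it was scaled into.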
2. Bidirectional Scaling Logic
- 2.1 Update scale_for_layout_detection() in layout_preprocessing_service.py
  - Add upscaling condition: max_dim < min_dimension
  - Use cv2.INTER_CUBIC for upscaling (better quality than INTER_LINEAR)
  - Update docstring to reflect bidirectional behavior
- 2.2 Update scaling decision logic
  ```python
  # Current: only downscale
  should_scale = max_dim > max_dimension

  # New: bidirectional
  should_downscale = max_dim > max_dimension
  should_upscale = max_dim < min_dimension
  should_scale = should_downscale or should_upscale
  ```
- 2.3 Update logging to indicate scale direction
  - "Scaled DOWN for layout detection: 2480x3508 -> 1131x1600"
  - "Scaled UP for layout detection: 800x600 -> 1600x1200"
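The decision logic and the log messages above can be tied together in a small pure-Python sketch. It does not call OpenCV; the interpolation is returned as a name only. `ScalePlan`, `plan_scaling`, and `describe` are hypothetical helpers, the defaults 2000 and 1600 are inferred from the test results, and the "No scaling" message is invented for illustration.

```python
from typing import NamedTuple, Optional, Tuple


class ScalePlan(NamedTuple):
    direction: Optional[str]      # "UP", "DOWN", or None when no scaling is needed
    new_size: Tuple[int, int]     # (width, height) after scaling
    scale_factor: float           # multiply scaled bbox coordinates by this to restore
    interpolation: Optional[str]  # name of the cv2 flag to use, or None


def plan_scaling(width, height, min_dimension=1200, max_dimension=2000, target=1600):
    """Decide whether and how to scale, based on the longest side."""
    max_dim = max(width, height)
    should_downscale = max_dim > max_dimension
    should_upscale = max_dim < min_dimension
    if not (should_downscale or should_upscale):
        return ScalePlan(None, (width, height), 1.0, None)

    ratio = target / max_dim  # resize ratio applied to the image
    new_size = (round(width * ratio), round(height * ratio))
    direction = "UP" if should_upscale else "DOWN"
    interpolation = "INTER_CUBIC" if should_upscale else "INTER_AREA"
    # scale_factor is the inverse of the resize ratio (used for bbox restore).
    return ScalePlan(direction, new_size, 1.0 / ratio, interpolation)


def describe(width, height, plan):
    """Format a log line that names the scale direction."""
    if plan.direction is None:
        # Hypothetical wording; the task list only specifies the UP/DOWN messages.
        return f"No scaling for layout detection: {width}x{height}"
    new_w, new_h = plan.new_size
    return (
        f"Scaled {plan.direction} for layout detection: "
        f"{width}x{height} -> {new_w}x{new_h}"
    )


print(describe(2480, 3508, plan_scaling(2480, 3508)))
print(describe(800, 600, plan_scaling(800, 600)))
```

Keeping the decision separate from the actual resize call makes the thresholds easy to test without image data.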
3. PDF DPI Handling (Optional Optimization)
- 3.1 Evaluate current PDF conversion impact
  - Decision: Keep 300 DPI, let bidirectional scaling handle it
  - Reason: Raw OCR benefits from high resolution, scaling handles PP-Structure needs
- 3.2 Option A: Keep 300 DPI, let scaling handle it ✓
  - Simplest approach, no change needed
  - Raw OCR benefits from high resolution
- 3.3 Option B: Add configurable PDF DPI (Not needed)
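To see why pages rendered at 300 DPI end up on the downscaling path, it helps to work out their pixel size. The sketch below assumes an A4 page (210 × 297 mm); the task list does not state the page size, but the result matches the 2480x3508 figure used in the tests. `page_pixels` is a hypothetical helper.

```python
MM_PER_INCH = 25.4


def page_pixels(width_mm, height_mm, dpi):
    """Pixel dimensions of a page of the given physical size rendered at dpi."""
    return (
        round(width_mm / MM_PER_INCH * dpi),
        round(height_mm / MM_PER_INCH * dpi),
    )


# A4 is assumed here; other page sizes give different numbers.
a4_at_300 = page_pixels(210, 297, 300)
a4_at_150 = page_pixels(210, 297, 150)
print(a4_at_300, a4_at_150)
```

Under that assumption the 300 DPI page has a longest side of 3508 px, above the 2000 px threshold, so it is scaled down for layout detection while raw OCR still receives the full-resolution image.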
4. Testing
- 4.1 Test upscaling with small images
  - Small image (800x600): Scaled UP → 1600x1200, scale_factor=0.500
  - Very small (400x300): Scaled UP → 1600x1200, scale_factor=0.250
- 4.2 Test no scaling for optimal range
  - Optimal image (1500x1000): was_scaled=False, scale_factor=1.000
- 4.3 Test downscaling (existing behavior)
  - Large image (2480x3508): Scaled DOWN → 1131x1600, scale_factor=2.192
- 4.4 Test PDF workflow (manual test recommended)
  - PDF page should be detected correctly
  - Scaling should apply after PDF conversion
5. Documentation
- 5.1 Update config.py Field descriptions
  - Explained bidirectional scaling in enabled field description
  - Updated max/min/target descriptions
- 5.2 Add logging for scaling decisions
  - Logs direction (UP/DOWN), original size, target size, scale_factor
Implementation Summary
Files Modified:
- backend/app/core/config.py: Added layout_image_scaling_min_dimension setting
- backend/app/services/layout_preprocessing_service.py: Updated bidirectional scaling logic
Test Results (2025-11-27):
| Test Case | Condition (max dimension) | Result | scale_factor (bbox restore) |
|---|---|---|---|
| Small (800×600) | max=800 < 1200 | UP → 1600×1200 | 0.500 |
| Optimal (1500×1000) | 1200 ≤ 1500 ≤ 2000 | No scaling | 1.000 |
| Large (2480×3508) | max=3508 > 2000 | DOWN → 1131×1600 | 2.192 |
| Very small (400×300) | max=400 < 1200 | UP → 1600×1200 | 0.250 |
Implementation Notes
Scaling Decision Matrix
| Max Dimension | Action | Resize Ratio (inverse of scale_factor) | Interpolation |
|---|---|---|---|
| < 1200px | Scale UP | target/max_dim | INTER_CUBIC |
| 1200-2000px | No scaling | 1.0 | N/A |
| > 2000px | Scale DOWN | target/max_dim | INTER_AREA |
Example Scenarios
- Small scan (800×600)
  - max_dim = 800 < 1200 → Scale UP
  - target = 1600, scale = 1600/800 = 2.0
  - Result: 1600×1200
  - scale_factor (for bbox restore) = 0.5
- Optimal image (1400×1000)
  - max_dim = 1400, 1200 <= 1400 <= 2000 → No scaling
  - Result: unchanged
  - scale_factor = 1.0
- High-res scan (2480×3508)
  - max_dim = 3508 > 2000 → Scale DOWN
  - target = 1600, scale = 1600/3508 = 0.456
  - Result: 1131×1600
  - scale_factor (for bbox restore) = 2.19
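The scale_factor in these scenarios exists so that boxes detected on the scaled image can be mapped back to the original image. A minimal sketch of that restore step follows, assuming boxes are `(x1, y1, x2, y2)` tuples in pixel coordinates; the box format and the `restore_bbox` helper are illustrative assumptions, not taken from the service code.

```python
def restore_bbox(bbox, scale_factor):
    """Map an (x1, y1, x2, y2) box from scaled-image to original-image pixels."""
    return tuple(round(v * scale_factor) for v in bbox)


# High-res scan: 2480x3508 was scaled to 1131x1600, so scale_factor = 3508 / 1600.
full_page = restore_bbox((0, 0, 1131, 1600), 3508 / 1600)

# Small scan: 800x600 was scaled to 1600x1200, so scale_factor = 0.5.
small_box = restore_bbox((100, 50, 300, 250), 0.5)

print(full_page, small_box)
```

A box covering the whole scaled page maps back to the full original page, which is a quick sanity check that the factor points in the right direction.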