OCR/openspec/changes/archive/2025-11-28-unify-image-scaling/tasks.md
egg 801ee9c4b6 feat: create extract-table-cell-boxes proposal and archive old proposal
- Archive unify-image-scaling proposal to archive/2025-11-28
- Create new extract-table-cell-boxes proposal for supplementing PPStructureV3
  with direct SLANeXt model calls to extract table cell bounding boxes
- Add debug logging to pp_structure_enhanced.py for table cell boxes investigation
- Discovered that PPStructureV3 high-level API filters out cell bbox data,
  but paddlex.create_model() can directly invoke underlying models

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 12:15:06 +08:00

# Tasks: Unify Image Scaling Strategy
## 1. Configuration
- [x] 1.1 Add `min_dimension` setting to `backend/app/core/config.py`
  - `layout_image_scaling_min_dimension: int = 1200`
  - Description: "Min dimension (pixels) before upscaling. Images smaller than this will be scaled up."
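
The thresholds only work together if the target sits between the minimum and the maximum; otherwise a freshly scaled image would itself fall outside the no-scaling band. Below is a minimal sketch of that invariant, using a plain dataclass as a stand-in for the real settings class in `config.py`. Only `layout_image_scaling_min_dimension` and its default of 1200 come from the task above. The class name and the other two field names are assumptions, with the values 2000 and 1600 taken from the Implementation Notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayoutScalingSettings:
    """Hypothetical stand-in for the scaling fields in config.py."""

    layout_image_scaling_min_dimension: int = 1200
    # Assumed field names; only the min_dimension name appears in the task list.
    layout_image_scaling_max_dimension: int = 2000
    layout_image_scaling_target_dimension: int = 1600

    def __post_init__(self) -> None:
        lo = self.layout_image_scaling_min_dimension
        hi = self.layout_image_scaling_max_dimension
        target = self.layout_image_scaling_target_dimension
        # The target must land inside the no-scaling band [min, max].
        if not (0 < lo <= target <= hi):
            raise ValueError(
                f"expected 0 < min <= target <= max, got {lo}, {target}, {hi}"
            )


settings = LayoutScalingSettings()
```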
## 2. Bidirectional Scaling Logic
- [x] 2.1 Update `scale_for_layout_detection()` in `layout_preprocessing_service.py`
  - Add upscaling condition: `max_dim < min_dimension`
  - Use `cv2.INTER_CUBIC` for upscaling (better quality than `INTER_LINEAR`)
  - Update docstring to reflect bidirectional behavior
- [x] 2.2 Update scaling decision logic
  ```python
  # Current: only downscale
  should_scale = max_dim > max_dimension
  # New: bidirectional
  should_downscale = max_dim > max_dimension
  should_upscale = max_dim < min_dimension
  should_scale = should_downscale or should_upscale
  ```
- [x] 2.3 Update logging to indicate scale direction
  - "Scaled DOWN for layout detection: 2480x3508 -> 1131x1600"
  - "Scaled UP for layout detection: 800x600 -> 1600x1200"
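
The decision logic in 2.2 can be exercised without touching any pixels. The sketch below is not the body of `scale_for_layout_detection()`; it is a pure-Python illustration that only plans the resize. The names `ScalePlan` and `plan_scaling` are hypothetical, the defaults 1200/2000/1600 are taken from the Implementation Notes, and the interpolation is returned as a flag name rather than an actual `cv2` constant so the example has no dependencies.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ScalePlan:
    direction: Optional[str]      # "UP", "DOWN", or None when no scaling applies
    new_size: Tuple[int, int]     # (width, height) after scaling
    scale_factor: float           # multiply detected bbox coords by this to restore
    interpolation: Optional[str]  # name of the cv2 flag a real resize would use


def plan_scaling(
    width: int,
    height: int,
    min_dimension: int = 1200,
    max_dimension: int = 2000,
    target_dimension: int = 1600,
) -> ScalePlan:
    """Decide whether and how to scale, based on the longer image side."""
    max_dim = max(width, height)
    should_downscale = max_dim > max_dimension
    should_upscale = max_dim < min_dimension
    if not (should_downscale or should_upscale):
        return ScalePlan(None, (width, height), 1.0, None)

    scale = target_dimension / max_dim  # resize ratio applied to the image
    new_size = (round(width * scale), round(height * scale))
    direction = "UP" if should_upscale else "DOWN"
    interpolation = "INTER_CUBIC" if should_upscale else "INTER_AREA"
    # scale_factor is the inverse ratio, used to map boxes back to the original.
    return ScalePlan(direction, new_size, max_dim / target_dimension, interpolation)


small = plan_scaling(800, 600)
optimal = plan_scaling(1500, 1000)
large = plan_scaling(2480, 3508)
```

Keeping the plan separate from the resize call makes the thresholds easy to unit-test; a real implementation would pass `new_size` and the matching interpolation flag to `cv2.resize`.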
## 3. PDF DPI Handling (Optional Optimization)
- [x] 3.1 Evaluate current PDF conversion impact
  - Decision: Keep 300 DPI and let bidirectional scaling handle it
  - Reason: Raw OCR benefits from high resolution, while scaling covers PP-Structure's needs
- [x] 3.2 Option A: Keep 300 DPI, let scaling handle it ✓
  - Simplest approach, no change needed
  - Raw OCR benefits from high resolution
- [ ] ~~3.3 Option B: Add configurable PDF DPI~~ (Not needed)
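
For a sense of scale: an A4 page rendered at 300 DPI comes out at roughly 2480×3508 pixels, which matches the large-image case used in testing and lands in the downscale branch. The arithmetic is sketched below, assuming A4 (210 mm × 297 mm); the page size is not stated in these tasks, and `page_pixels` is a hypothetical helper. Actual PDF renderers may round differently by a pixel.

```python
MM_PER_INCH = 25.4


def page_pixels(width_mm: float, height_mm: float, dpi: int) -> tuple:
    """Approximate pixel size of a page rendered at the given DPI."""
    return (
        round(width_mm / MM_PER_INCH * dpi),
        round(height_mm / MM_PER_INCH * dpi),
    )


a4_at_300 = page_pixels(210, 297, 300)  # longer side exceeds 2000
a4_at_150 = page_pixels(210, 297, 150)  # longer side falls between 1200 and 2000
```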
## 4. Testing
- [x] 4.1 Test upscaling with small images
  - Small image (800x600): Scaled UP → 1600x1200, scale_factor=0.500
  - Very small (400x300): Scaled UP → 1600x1200, scale_factor=0.250
- [x] 4.2 Test no scaling for optimal range
  - Optimal image (1500x1000): was_scaled=False, scale_factor=1.000
- [x] 4.3 Test downscaling (existing behavior)
  - Large image (2480x3508): Scaled DOWN → 1131x1600, scale_factor=2.192
- [ ] 4.4 Test PDF workflow (manual test recommended)
  - PDF page should be detected correctly
  - Scaling should apply after PDF conversion
## 5. Documentation
- [x] 5.1 Update `config.py` Field descriptions
  - Explained bidirectional scaling in the description of the enabled field
  - Updated max/min/target descriptions
- [x] 5.2 Add logging for scaling decisions
  - Logs direction (UP/DOWN), original size, target size, scale_factor
---
## Implementation Summary
**Files Modified:**
- `backend/app/core/config.py` - Added `layout_image_scaling_min_dimension` setting
- `backend/app/services/layout_preprocessing_service.py` - Made the scaling logic bidirectional
**Test Results (2025-11-27):**
| Test Case | Condition | Result | scale_factor |
|-----------|-----------|--------|--------------|
| Small (800×600) | max=800 < 1200 | UP → 1600×1200 | 0.500 |
| Optimal (1500×1000) | 1200 ≤ 1500 ≤ 2000 | No scaling | 1.000 |
| Large (2480×3508) | max=3508 > 2000 | DOWN → 1131×1600 | 2.192 |
| Very small (400×300) | max=400 < 1200 | UP → 1600×1200 | 0.250 |
---
## Implementation Notes
### Scaling Decision Matrix
| Max Dimension | Action | Resize Scale | Interpolation |
|---------------|--------|--------------|---------------|
| < 1200px | Scale UP | target/max_dim | INTER_CUBIC |
| 1200-2000px | No scaling | 1.0 | N/A |
| > 2000px | Scale DOWN | target/max_dim | INTER_AREA |
### Example Scenarios
1. **Small scan (800×600)**
   - max_dim = 800 < 1200 → Scale UP
   - target = 1600, scale = 1600/800 = 2.0
   - Result: 1600×1200
   - scale_factor (for bbox restore) = 0.5
2. **Optimal image (1400×1000)**
   - max_dim = 1400, 1200 <= 1400 <= 2000 → No scaling
   - Result: unchanged
   - scale_factor = 1.0
3. **High-res scan (2480×3508)**
   - max_dim = 3508 > 2000 → Scale DOWN
   - target = 1600, scale = 1600/3508 = 0.456
   - Result: 1131×1600
   - scale_factor (for bbox restore) = 2.19
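
The `scale_factor` values in these scenarios are what map boxes detected on the scaled image back to original-image coordinates. Below is a minimal sketch, assuming boxes are `(x1, y1, x2, y2)` tuples in pixels; the helper name `restore_bbox` and the box format are illustrative and not taken from the service code.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]


def restore_bbox(bbox: BBox, scale_factor: float) -> BBox:
    """Map a box from scaled-image coordinates back to the original image."""
    x1, y1, x2, y2 = bbox
    return (
        x1 * scale_factor,
        y1 * scale_factor,
        x2 * scale_factor,
        y2 * scale_factor,
    )


# Scenario 1: a box on the 1600x1200 upscaled image, scale_factor = 0.5.
small_scan_box = restore_bbox((400, 300, 1600, 1200), 0.5)

# Scenario 3: the full 1131x1600 downscaled frame, scale_factor = 3508/1600.
high_res_box = restore_bbox((0, 0, 1131, 1600), 3508 / 1600)
```

Because the scaled width is rounded to whole pixels, a restored coordinate can be off by a fraction of a pixel from the original edge (about 2479.7 instead of 2480 in scenario 3).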