feat: enhance layout preprocessing and unify image scaling proposal

Backend changes:
- Add image scaling configuration for PP-Structure processing
- Enhance layout preprocessing service with scaling support
- Update OCR service with improved memory management
- Add PP-Structure enhanced processing improvements

Frontend changes:
- Update preprocessing settings UI
- Fix processing page layout and state management
- Update API types for new parameters

Proposals:
- Archive add-layout-preprocessing proposal (completed)
- Add unify-image-scaling proposal for consistent coordinate handling

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

113
openspec/changes/unify-image-scaling/tasks.md
Normal file
@@ -0,0 +1,113 @@
# Tasks: Unify Image Scaling Strategy

## 1. Configuration

- [x] 1.1 Add min_dimension setting to `backend/app/core/config.py`
  - `layout_image_scaling_min_dimension: int = 1200`
  - Description: "Min dimension (pixels) before upscaling. Images smaller than this will be scaled up."

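
The setting above could sit alongside the existing max/target settings roughly as follows. This is a minimal stdlib sketch, not the project's actual `config.py`: the class name `LayoutScalingConfig` and the `max`/`target` field names are hypothetical, and the values 2000 and 1600 are taken from the examples later in this file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayoutScalingConfig:
    """Hypothetical stand-in for the scaling fields in config.py."""

    # From the task list: images whose longest side is below this are upscaled.
    layout_image_scaling_min_dimension: int = 1200
    # Assumed field names; 2000 and 1600 come from the examples in this file.
    layout_image_scaling_max_dimension: int = 2000
    layout_image_scaling_target_dimension: int = 1600

    def __post_init__(self) -> None:
        lo = self.layout_image_scaling_min_dimension
        hi = self.layout_image_scaling_max_dimension
        target = self.layout_image_scaling_target_dimension
        # The target must lie inside the no-scaling band; otherwise a freshly
        # scaled image would immediately qualify for scaling again.
        if not (0 < lo <= target <= hi):
            raise ValueError(
                f"expected 0 < min <= target <= max, got {lo}, {target}, {hi}"
            )


config = LayoutScalingConfig()
```

Validating `min <= target <= max` at construction time is a design choice of this sketch; the task list does not say whether the real settings class performs such a check.
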
## 2. Bidirectional Scaling Logic

- [x] 2.1 Update `scale_for_layout_detection()` in `layout_preprocessing_service.py`
  - Add upscaling condition: `max_dim < min_dimension`
  - Use `cv2.INTER_CUBIC` for upscaling (better quality than INTER_LINEAR)
  - Update docstring to reflect bidirectional behavior

- [x] 2.2 Update scaling decision logic
  ```python
  # Current: only downscale
  should_scale = max_dim > max_dimension

  # New: bidirectional
  should_downscale = max_dim > max_dimension
  should_upscale = max_dim < min_dimension
  should_scale = should_downscale or should_upscale
  ```

- [x] 2.3 Update logging to indicate scale direction
  - "Scaled DOWN for layout detection: 2480x3508 -> 1131x1600"
  - "Scaled UP for layout detection: 800x600 -> 1600x1200"

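
The decision logic and log messages in 2.2 and 2.3 can be exercised without OpenCV by computing only the target size. The sketch below is an illustration under assumptions: `ScalePlan`, `plan_layout_scaling`, and `describe_scaling` are hypothetical names, the target of 1600 is taken from the examples in this file, and rounding with `round()` is a guess at how the real `scale_for_layout_detection()` derives integer sizes.

```python
from typing import NamedTuple, Optional


class ScalePlan(NamedTuple):
    width: int
    height: int
    scale_factor: float  # multiply scaled coordinates by this to restore them
    direction: str  # "UP", "DOWN", or "NONE"


def plan_layout_scaling(
    width: int,
    height: int,
    min_dimension: int = 1200,
    max_dimension: int = 2000,
    target_dimension: int = 1600,
) -> ScalePlan:
    """Decide whether to scale and compute the resulting size."""
    max_dim = max(width, height)
    should_downscale = max_dim > max_dimension
    should_upscale = max_dim < min_dimension
    if not (should_downscale or should_upscale):
        return ScalePlan(width, height, 1.0, "NONE")

    ratio = target_dimension / max_dim  # resize ratio applied to the image
    direction = "DOWN" if should_downscale else "UP"
    return ScalePlan(
        round(width * ratio), round(height * ratio), 1.0 / ratio, direction
    )


def describe_scaling(plan: ScalePlan, width: int, height: int) -> Optional[str]:
    """Build the log message from 2.3, or None when nothing was scaled."""
    if plan.direction == "NONE":
        return None
    return (
        f"Scaled {plan.direction} for layout detection: "
        f"{width}x{height} -> {plan.width}x{plan.height}"
    )


plan = plan_layout_scaling(2480, 3508)
message = describe_scaling(plan, 2480, 3508)
```

Keeping the size computation separate from the actual resize makes the thresholds easy to unit test; the real service presumably passes the computed size to `cv2.resize`.
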
## 3. PDF DPI Handling (Optional Optimization)

- [x] 3.1 Evaluate current PDF conversion impact
  - Decision: Keep 300 DPI, let bidirectional scaling handle it
  - Reason: Raw OCR benefits from high resolution, and scaling handles PP-Structure needs

- [x] 3.2 Option A: Keep 300 DPI, let scaling handle it ✓
  - Simplest approach, no change needed
  - Raw OCR benefits from high resolution

- [ ] ~~3.3 Option B: Add configurable PDF DPI~~ (Not needed)

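
To see why a 300 DPI render falls into the downscaling branch, the pixel size of a page can be worked out from its physical size. The sketch below assumes an A4 page (210 mm × 297 mm), which the task list does not state explicitly; `page_pixels` is a hypothetical helper.

```python
MM_PER_INCH = 25.4


def page_pixels(width_mm: float, height_mm: float, dpi: int) -> tuple:
    """Pixel dimensions of a page rendered at the given DPI."""
    return (
        round(width_mm / MM_PER_INCH * dpi),
        round(height_mm / MM_PER_INCH * dpi),
    )


# A4 at 300 DPI: the longest side exceeds the 2000 px maximum,
# so layout detection would downscale while raw OCR keeps full resolution.
a4_at_300 = page_pixels(210, 297, 300)
needs_downscale = max(a4_at_300) > 2000
```

The result matches the 2480×3508 "large image" case used in the tests in this file.
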
## 4. Testing

- [x] 4.1 Test upscaling with small images
  - Small image (800x600): Scaled UP → 1600x1200, scale_factor=0.500
  - Very small (400x300): Scaled UP → 1600x1200, scale_factor=0.250

- [x] 4.2 Test no scaling for optimal range
  - Optimal image (1500x1000): was_scaled=False, scale_factor=1.000

- [x] 4.3 Test downscaling (existing behavior)
  - Large image (2480x3508): Scaled DOWN → 1131x1600, scale_factor=2.192

- [ ] 4.4 Test PDF workflow (manual test recommended)
  - PDF page should be detected correctly
  - Scaling should apply after PDF conversion

## 5. Documentation

- [x] 5.1 Update config.py Field descriptions
  - Explained bidirectional scaling in enabled field description
  - Updated max/min/target descriptions

- [x] 5.2 Add logging for scaling decisions
  - Logs direction (UP/DOWN), original size, target size, scale_factor

---

## Implementation Summary

**Files Modified:**
- `backend/app/core/config.py` - Added `layout_image_scaling_min_dimension` setting
- `backend/app/services/layout_preprocessing_service.py` - Updated bidirectional scaling logic

**Test Results (2025-11-27):**

| Test Case | Original | Result | scale_factor |
|-----------|----------|--------|--------------|
| Small (800×600) | max=800 < 1200 | UP → 1600×1200 | 0.500 |
| Optimal (1500×1000) | 1200 ≤ 1500 ≤ 2000 | No scaling | 1.000 |
| Large (2480×3508) | max=3508 > 2000 | DOWN → 1131×1600 | 2.192 |
| Very small (400×300) | max=400 < 1200 | UP → 1600×1200 | 0.250 |

---

## Implementation Notes

### Scaling Decision Matrix

| Max Dimension | Action | Resize Ratio | Interpolation |
|---------------|--------|--------------|---------------|
| < 1200px | Scale UP | target/max_dim | INTER_CUBIC |
| 1200-2000px | No scaling | 1.0 | N/A |
| > 2000px | Scale DOWN | target/max_dim | INTER_AREA |

The resize ratio is applied to the image. The reported `scale_factor` is its inverse (`max_dim/target`) and is used to restore bounding boxes to original coordinates.

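
The matrix above maps directly onto a small selection function. This sketch returns the OpenCV flag by name so it runs without OpenCV installed; `choose_interpolation` is a hypothetical helper, and the inclusive handling of exactly 1200 and 2000 follows the "1200 ≤ max_dim ≤ 2000" wording used in this file.

```python
from typing import Optional


def choose_interpolation(
    max_dim: int, min_dimension: int = 1200, max_dimension: int = 2000
) -> Optional[str]:
    """Return the name of the cv2 interpolation flag, or None if no resize."""
    if max_dim < min_dimension:
        return "INTER_CUBIC"  # upscaling: smoother than bilinear
    if max_dim > max_dimension:
        return "INTER_AREA"  # downscaling: area averaging limits aliasing
    return None  # inside the band, boundaries included


flags = [choose_interpolation(d) for d in (800, 1200, 2000, 3508)]
```

With OpenCV available, the name can be resolved via `getattr(cv2, name)` and passed as the `interpolation` argument of `cv2.resize`.
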
### Example Scenarios

1. **Small scan (800×600)**
   - max_dim = 800 < 1200 → Scale UP
   - target = 1600, scale = 1600/800 = 2.0
   - Result: 1600×1200
   - scale_factor (for bbox restore) = 0.5

2. **Optimal image (1400×1000)**
   - max_dim = 1400, 1200 <= 1400 <= 2000 → No scaling
   - Result: unchanged
   - scale_factor = 1.0

3. **High-res scan (2480×3508)**
   - max_dim = 3508 > 2000 → Scale DOWN
   - target = 1600, scale = 1600/3508 = 0.456
   - Result: 1131×1600
   - scale_factor (for bbox restore) = 2.19

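
The `scale_factor` in the scenarios above exists so that boxes detected on the scaled image can be mapped back to original pixel coordinates. A minimal sketch of that restore step follows, assuming boxes are `(x0, y0, x1, y1)` tuples; the actual bbox format used by the service is not shown in this file, and `restore_bbox` is a hypothetical name.

```python
def restore_bbox(bbox, scale_factor):
    """Map a box from scaled-image coordinates back to the original image."""
    x0, y0, x1, y1 = bbox
    return (
        x0 * scale_factor,
        y0 * scale_factor,
        x1 * scale_factor,
        y1 * scale_factor,
    )


# Scenario 1: detected on the 1600x1200 upscaled image, original is 800x600.
restored_small = restore_bbox((200, 100, 1000, 600), 0.5)

# Scenario 3: full-page box on the 1131x1600 image, original is 2480x3508.
restored_large = restore_bbox((0, 0, 1131, 1600), 3508 / 1600)
```

Because the scaled width is rounded (1131 rather than 1131.13), the restored right edge in scenario 3 lands slightly short of 2480; callers that need exact page bounds may want to clamp or round.
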