feat: enhance layout preprocessing and unify image scaling proposal

Backend changes:
- Add image scaling configuration for PP-Structure processing
- Enhance layout preprocessing service with scaling support
- Update OCR service with improved memory management
- Add PP-Structure enhanced processing improvements

Frontend changes:
- Update preprocessing settings UI
- Fix processing page layout and state management
- Update API types for new parameters

Proposals:
- Archive add-layout-preprocessing proposal (completed)
- Add unify-image-scaling proposal for consistent coordinate handling

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 09:23:19 +08:00
parent 86bbea6fbf
commit dda9621e17
17 changed files with 826 additions and 104 deletions
--- a/openspec/changes/unify-image-scaling/proposal.md
+++ b/openspec/changes/unify-image-scaling/proposal.md
@@ -0,0 +1,72 @@
+# Change: Unify Image Scaling Strategy for Optimal Layout Detection
+
+## Why
+
+Currently, the system has inconsistent image resolution handling:
+
+1. **PDF conversion**: Always uses 300 DPI, producing ~2480×3508 images for A4
+2. **Image downscaling**: Only applied when image > 2000px (no upscaling)
+3. **Small images**: Never scaled up, even if they're below optimal detection size
+
+This inconsistency causes:
+- Wasted processing: PDF→300DPI→scale down to 1600px (double conversion)
+- Suboptimal detection: Small images stay small, missing table structures
+- Inconsistent behavior: Different source formats get different treatment
+
+PP-Structure's layout detection model (RT-DETR based) works best with images around 1600px on the longest side. Both too-large and too-small images reduce detection accuracy.
+
+## What Changes
+
+- **Bidirectional scaling for PP-Structure**
+  - Scale DOWN images larger than max threshold (2000px) → target (1600px)
+  - Scale UP images smaller than min threshold (1200px) → target (1600px)
+  - No change for images in optimal range (1200-2000px)
+
+- **PDF conversion DPI optimization**
+  - Calculate optimal DPI based on target resolution
+  - Avoid double scaling (rendering at high DPI, then scaling the result down)
+  - Option to use adaptive DPI or fixed DPI with post-scaling
+
+- **Unified scaling logic**
+  - Same rules apply to all image sources (IMG, PDF pages)
+  - Scaling happens once at preprocessing stage
+  - Bbox coordinates scaled back to original for accurate cropping
+
+- **Configuration**
+  - `layout_image_scaling_min_dimension`: Minimum size before upscaling (default: 1200)
+  - Keep existing `layout_image_scaling_max_dimension` (default: 2000) and `layout_image_scaling_target_dimension` (default: 1600)
+
+## Impact
+
+### Affected Specs
+- `ocr-processing` - Modified scaling requirements
+
+### Affected Code
+- `backend/app/core/config.py` - Add min_dimension setting
+- `backend/app/services/layout_preprocessing_service.py` - Add upscaling logic
+- `backend/app/services/ocr_service.py` - Optional: Adjust PDF DPI handling
+
+### Quality Impact
+
+| Scenario | Before | After |
+|----------|--------|-------|
+| Large image (3000px) | Scaled to 1600px | Same |
+| Optimal image (1500px) | No scaling | Same |
+| Small image (800px) | No scaling | Scaled to 1600px |
+| PDF at 300 DPI (A4) | 3508px longest side → 1600px | Same (or optimized DPI) |
+
+### Raw OCR Impact
+- No change: Raw OCR continues to use original/converted images
+- Upscaling only affects PP-Structure layout detection input
+
+## Risks
+
+1. **Upscaling quality**: Enlarging small images may introduce interpolation artifacts
+   - Mitigation: Use INTER_CUBIC or INTER_LANCZOS4 for upscaling
+   - Note: Layout detection cares about structure, not fine text detail
+
+2. **Memory for large upscaled images**: Small image scaled up uses more memory
+   - Mitigation: Doubling the longest side from 800px to 1600px quadruples the pixel count, but a 1600px image is still a reasonable size
+
+3. **Breaking existing behavior**: Users may rely on current behavior
+   - Mitigation: Document the change, add config toggle if needed
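
The adaptive-DPI option listed under "PDF conversion DPI optimization" in proposal.md can be illustrated with a small calculation. This is a minimal sketch, not code from this commit: the function name `adaptive_dpi`, the `max_dpi` cap, and the choice to round up are assumptions made for illustration. It relies only on the fact that PDF page sizes are expressed in points (72 per inch). Note that tasks.md in this commit keeps the fixed 300 DPI conversion, so this shows the alternative that was considered rather than the behavior that shipped.

```python
import math

POINTS_PER_INCH = 72.0  # PDF page sizes are given in points, 72 per inch


def adaptive_dpi(page_width_pt: float, page_height_pt: float,
                 target_px: int = 1600, max_dpi: int = 300) -> int:
    """Return a render DPI that puts the longest page side near target_px.

    Hypothetical helper: the name, defaults, and cap are illustrative only.
    """
    longest_in = max(page_width_pt, page_height_pt) / POINTS_PER_INCH
    if longest_in <= 0:
        raise ValueError("page dimensions must be positive")
    # Round up so the rendered page is never smaller than the target.
    dpi = math.ceil(target_px / longest_in)
    # Never exceed the fixed DPI the system already uses.
    return min(dpi, max_dpi)


# An A4 page is roughly 595 x 842 points.
a4_dpi = adaptive_dpi(595, 842)
# A tiny 1 x 1 inch page would need 1600 DPI, so the cap applies.
tiny_dpi = adaptive_dpi(72, 72)
```

Rendering directly near the target size would skip the downscale step for layout detection, at the cost of a lower-resolution image for Raw OCR unless the page is rendered twice.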
--- a/openspec/changes/unify-image-scaling/specs/ocr-processing/spec.md
+++ b/openspec/changes/unify-image-scaling/specs/ocr-processing/spec.md
@@ -0,0 +1,42 @@
+## MODIFIED Requirements
+
+### Requirement: Image Scaling for Layout Detection
+
+The system SHALL apply bidirectional image scaling to optimize PP-Structure layout detection accuracy:
+
+1. Images with longest side > `layout_image_scaling_max_dimension` (default: 2000px) SHALL be scaled DOWN to `layout_image_scaling_target_dimension` (default: 1600px)
+
+2. Images with longest side < `layout_image_scaling_min_dimension` (default: 1200px) SHALL be scaled UP to `layout_image_scaling_target_dimension` (default: 1600px)
+
+3. Images within the optimal range (min_dimension to max_dimension) SHALL NOT be scaled
+
+4. For downscaling, the system SHALL use `cv2.INTER_AREA` interpolation (best for shrinking)
+
+5. For upscaling, the system SHALL use `cv2.INTER_CUBIC` interpolation (smooth enlargement)
+
+6. The system SHALL track the scale factor and restore bounding box coordinates to original image space after layout detection
+
+7. Raw OCR and element extraction SHALL continue to use original/unscaled images
+
+#### Scenario: Large image is scaled down
+- **WHEN** an image has max dimension 3508px (e.g. 2480×3508; > 2000px threshold)
+- **THEN** the image is scaled down to ~1600px on longest side
+- **AND** scale_factor is recorded as ~2.19 for bbox restoration
+- **AND** INTER_AREA interpolation is used
+
+#### Scenario: Small image is scaled up
+- **WHEN** an image has max dimension 800px (< 1200px threshold)
+- **THEN** the image is scaled up to ~1600px on longest side
+- **AND** scale_factor is recorded as ~0.5 for bbox restoration
+- **AND** INTER_CUBIC interpolation is used
+
+#### Scenario: Optimal size image is not scaled
+- **WHEN** an image has max dimension 1500px (within 1200-2000px range)
+- **THEN** the image is NOT scaled
+- **AND** scale_factor is 1.0
+- **AND** was_scaled is False
+
+#### Scenario: Bbox coordinates are restored after scaling
+- **WHEN** layout detection returns bbox [100, 200, 500, 600] on scaled image
+- **AND** scale_factor is 2.0 (image was scaled down by 0.5)
+- **THEN** final bbox is [200, 400, 1000, 1200] in original image coordinates
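
The requirement and scenarios in spec.md can be sketched in plain Python without touching any image data. This is a hedged illustration, not the code in `layout_preprocessing_service.py`: the names `ScalePlan`, `plan_scaling`, and `restore_bbox` are hypothetical, and rounding to the nearest pixel is an assumption. As in the spec, `scale_factor` is the value that maps detected coordinates back to original image space (original longest side divided by target).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalePlan:
    new_size: tuple[int, int]  # (width, height) passed to layout detection
    scale_factor: float        # multiply detected bbox values by this to restore
    was_scaled: bool


def plan_scaling(width: int, height: int, min_dim: int = 1200,
                 max_dim: int = 2000, target: int = 1600) -> ScalePlan:
    """Decide whether to scale, based on the longest side (hypothetical helper)."""
    longest = max(width, height)
    if min_dim <= longest <= max_dim:
        # Optimal range: leave the image untouched.
        return ScalePlan((width, height), 1.0, False)
    ratio = target / longest  # > 1 when upscaling, < 1 when downscaling
    new_size = (round(width * ratio), round(height * ratio))
    return ScalePlan(new_size, longest / target, True)


def restore_bbox(bbox: list[int], scale_factor: float) -> list[int]:
    """Map [x1, y1, x2, y2] from the scaled image back to original coordinates."""
    return [round(v * scale_factor) for v in bbox]


large = plan_scaling(2480, 3508)    # A4 page at 300 DPI, scaled down
small = plan_scaling(800, 600)      # small scan, scaled up
optimal = plan_scaling(1500, 1000)  # within range, unchanged
restored = restore_bbox([100, 200, 500, 600], 2.0)
```

Keeping the decision separate from the resize call makes the thresholds easy to test on their own; the actual resize would then pick `cv2.INTER_AREA` or `cv2.INTER_CUBIC` depending on whether `scale_factor` is above or below 1.0.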
--- a/openspec/changes/unify-image-scaling/tasks.md
+++ b/openspec/changes/unify-image-scaling/tasks.md
@@ -0,0 +1,113 @@
+# Tasks: Unify Image Scaling Strategy
+
+## 1. Configuration
+
+- [x] 1.1 Add min_dimension setting to `backend/app/core/config.py`
+  - `layout_image_scaling_min_dimension: int = 1200`
+  - Description: "Min dimension (pixels) before upscaling. Images smaller than this will be scaled up."
+
+## 2. Bidirectional Scaling Logic
+
+- [x] 2.1 Update `scale_for_layout_detection()` in `layout_preprocessing_service.py`
+  - Add upscaling condition: `max_dim < min_dimension`
+  - Use `cv2.INTER_CUBIC` for upscaling (better quality than INTER_LINEAR)
+  - Update docstring to reflect bidirectional behavior
+
+- [x] 2.2 Update scaling decision logic
+  ```python
+  # Current: only downscale
+  should_scale = max_dim > max_dimension
+
+  # New: bidirectional
+  should_downscale = max_dim > max_dimension
+  should_upscale = max_dim < min_dimension
+  should_scale = should_downscale or should_upscale
+  ```
+
+- [x] 2.3 Update logging to indicate scale direction
+  - "Scaled DOWN for layout detection: 2480x3508 -> 1131x1600"
+  - "Scaled UP for layout detection: 800x600 -> 1600x1200"
+
+## 3. PDF DPI Handling (Optional Optimization)
+
+- [x] 3.1 Evaluate current PDF conversion impact
+  - Decision: Keep 300 DPI, let bidirectional scaling handle it
+  - Reason: Raw OCR benefits from high resolution, scaling handles PP-Structure needs
+
+- [x] 3.2 Option A: Keep 300 DPI, let scaling handle it ✓
+  - Simplest approach, no change needed
+  - Raw OCR benefits from high resolution
+
+- [ ] ~~3.3 Option B: Add configurable PDF DPI~~ (Not needed)
+
+## 4. Testing
+
+- [x] 4.1 Test upscaling with small images
+  - Small image (800x600): Scaled UP → 1600x1200, scale_factor=0.500
+  - Very small (400x300): Scaled UP → 1600x1200, scale_factor=0.250
+
+- [x] 4.2 Test no scaling for optimal range
+  - Optimal image (1500x1000): was_scaled=False, scale_factor=1.000
+
+- [x] 4.3 Test downscaling (existing behavior)
+  - Large image (2480x3508): Scaled DOWN → 1131x1600, scale_factor=2.192
+
+- [ ] 4.4 Test PDF workflow (manual test recommended)
+  - PDF page should be detected correctly
+  - Scaling should apply after PDF conversion
+
+## 5. Documentation
+
+- [x] 5.1 Update config.py Field descriptions
+  - Explained bidirectional scaling in enabled field description
+  - Updated max/min/target descriptions
+
+- [x] 5.2 Add logging for scaling decisions
+  - Logs direction (UP/DOWN), original size, target size, scale_factor
+
+---
+
+## Implementation Summary
+
+**Files Modified:**
+- `backend/app/core/config.py` - Added `layout_image_scaling_min_dimension` setting
+- `backend/app/services/layout_preprocessing_service.py` - Updated bidirectional scaling logic
+
+**Test Results (2025-11-27):**
+| Test Case | Original | Result | scale_factor |
+|-----------|----------|--------|--------------|
+| Small (800×600) | max=800 < 1200 | UP → 1600×1200 | 0.500 |
+| Optimal (1500×1000) | 1200 ≤ 1500 ≤ 2000 | No scaling | 1.000 |
+| Large (2480×3508) | max=3508 > 2000 | DOWN → 1131×1600 | 2.192 |
+| Very small (400×300) | max=400 < 1200 | UP → 1600×1200 | 0.250 |
+
+---
+
+## Implementation Notes
+
+### Scaling Decision Matrix
+
+| Longest Side | Action | Resize Ratio (scale) | Interpolation |
+|------------|--------|--------------|---------------|
+| < 1200px | Scale UP | target/max_dim | INTER_CUBIC |
+| 1200-2000px | No scaling | 1.0 | N/A |
+| > 2000px | Scale DOWN | target/max_dim | INTER_AREA |
+
+### Example Scenarios
+
+1. **Small scan (800×600)**
+   - max_dim = 800 < 1200 → Scale UP
+   - target = 1600, scale = 1600/800 = 2.0
+   - Result: 1600×1200
+   - scale_factor (for bbox restore) = 0.5
+
+2. **Optimal image (1400×1000)**
+   - max_dim = 1400, 1200 <= 1400 <= 2000 → No scaling
+   - Result: unchanged
+   - scale_factor = 1.0
+
+3. **High-res scan (2480×3508)**
+   - max_dim = 3508 > 2000 → Scale DOWN
+   - target = 1600, scale = 1600/3508 = 0.456
+   - Result: 1131×1600
+   - scale_factor (for bbox restore) = 2.19
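
Tasks 2.3 and 5.2 in tasks.md describe log lines that report the scale direction, the original and scaled sizes, and the scale factor. The sketch below shows one way such a message could be built. The function name `format_scaling_log` and the trailing `(scale_factor=...)` suffix are assumptions; only the `Scaled UP/DOWN for layout detection: WxH -> WxH` prefix is taken from the task list.

```python
import logging

logger = logging.getLogger(__name__)


def format_scaling_log(original: tuple[int, int], scaled: tuple[int, int],
                       scale_factor: float) -> str:
    """Build a log message for a scaling decision (hypothetical helper)."""
    ow, oh = original
    sw, sh = scaled
    if (sw, sh) == (ow, oh):
        # Assumed wording for the unscaled case; the source gives no example.
        return f"No scaling for layout detection: {ow}x{oh}"
    # Direction is derived from how the longest side changed.
    direction = "UP" if max(sw, sh) > max(ow, oh) else "DOWN"
    return (f"Scaled {direction} for layout detection: "
            f"{ow}x{oh} -> {sw}x{sh} (scale_factor={scale_factor:.3f})")


up_msg = format_scaling_log((800, 600), (1600, 1200), 0.5)
down_msg = format_scaling_log((2480, 3508), (1131, 1600), 3508 / 1600)
logger.info(up_msg)
logger.info(down_msg)
```

Deriving the direction from the sizes, rather than passing it in, keeps the message consistent with the dimensions it reports.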