feat: create extract-table-cell-boxes proposal and archive old proposal

- Archive unify-image-scaling proposal to archive/2025-11-28
- Create new extract-table-cell-boxes proposal for supplementing PPStructureV3
  with direct SLANeXt model calls to extract table cell bounding boxes
- Add debug logging to pp_structure_enhanced.py for table cell boxes investigation
- Discovered that PPStructureV3 high-level API filters out cell bbox data,
  but paddlex.create_model() can directly invoke underlying models

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
egg
2025-11-28 12:15:06 +08:00
parent dda9621e17
commit 801ee9c4b6
7 changed files with 393 additions and 4 deletions


@@ -0,0 +1,72 @@
# Change: Unify Image Scaling Strategy for Optimal Layout Detection
## Why
Currently, the system has inconsistent image resolution handling:
1. **PDF conversion**: Always uses 300 DPI, producing ~2480×3508 images for A4
2. **Image downscaling**: Only applied when image > 2000px (no upscaling)
3. **Small images**: Never scaled up, even if they're below optimal detection size
This inconsistency causes:
- Wasted processing: PDF→300DPI→scale down to 1600px (double conversion)
- Suboptimal detection: Small images stay small, missing table structures
- Inconsistent behavior: Different source formats get different treatment
PP-Structure's layout detection model (RT-DETR based) works best with images around 1600px on the longest side. Both too-large and too-small images reduce detection accuracy.
## What Changes
- **Bidirectional scaling for PP-Structure**
  - Scale DOWN images larger than max threshold (2000px) → target (1600px)
  - Scale UP images smaller than min threshold (1200px) → target (1600px)
  - No change for images in optimal range (1200-2000px)
- **PDF conversion DPI optimization**
  - Calculate optimal DPI based on target resolution
  - Avoid double-scaling (convert at high DPI then scale down)
  - Option to use adaptive DPI or fixed DPI with post-scaling
- **Unified scaling logic**
  - Same rules apply to all image sources (IMG, PDF pages)
  - Scaling happens once at preprocessing stage
  - Bbox coordinates scaled back to original for accurate cropping
- **Configuration**
  - `layout_image_scaling_min_dimension`: Minimum size before upscaling (default: 1200)
  - Keep existing `layout_image_scaling_max_dimension` (2000) and `target_dimension` (1600)
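
The "calculate optimal DPI based on target resolution" option comes down to one division. The following is a minimal sketch, assuming an A4 page of 8.27 × 11.69 inches; the helper name `optimal_dpi` and the rounding rule are hypothetical and not part of the proposal.

```python
def optimal_dpi(page_width_in: float, page_height_in: float,
                target_px: int = 1600) -> int:
    """Return the DPI at which the page's longest side renders at about target_px.

    Hypothetical helper for illustration; the project may choose a different rule.
    """
    longest_in = max(page_width_in, page_height_in)
    if longest_in <= 0:
        raise ValueError("page dimensions must be positive")
    return round(target_px / longest_in)


# A4 in inches (assumed page size for this example)
A4_INCHES = (8.27, 11.69)
a4_dpi = optimal_dpi(*A4_INCHES)  # 1600 / 11.69 is roughly 136.9
```

Rendering an A4 page at roughly 137 DPI would land near the 1600px target directly, which is what avoiding the 300 DPI then downscale round trip would look like if Raw OCR did not also need the high-resolution image.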
## Impact
### Affected Specs
- `ocr-processing` - Modified scaling requirements
### Affected Code
- `backend/app/core/config.py` - Add min_dimension setting
- `backend/app/services/layout_preprocessing_service.py` - Add upscaling logic
- `backend/app/services/ocr_service.py` - Optional: Adjust PDF DPI handling
### Quality Impact
| Scenario | Before | After |
|----------|--------|-------|
| Large image (3000px) | Scaled to 1600px | Same |
| Optimal image (1500px) | No scaling | Same |
| Small image (800px) | No scaling | Scaled to 1600px |
| PDF at 300 DPI (2480×3508px) | Longest side 3508px → 1600px | Same (or optimized DPI) |
### Raw OCR Impact
- No change: Raw OCR continues to use original/converted images
- Upscaling only affects PP-Structure layout detection input
## Risks
1. **Upscaling quality**: Enlarging small images may introduce interpolation artifacts
   - Mitigation: Use INTER_CUBIC or INTER_LANCZOS4 for upscaling
   - Note: Layout detection cares about structure, not fine text detail
2. **Memory for large upscaled images**: An upscaled image uses more memory than its source
   - Mitigation: 800px → 1600px quadruples the pixel count, but a 1600px image is still a reasonable size
3. **Breaking existing behavior**: Users may rely on current behavior
   - Mitigation: Document the change, add config toggle if needed
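
The memory risk can be put in concrete numbers. This is a rough sketch, assuming uncompressed 8-bit, 3-channel image buffers (an assumption; the proposal does not state the pixel format), with a hypothetical helper `image_bytes`.

```python
def image_bytes(width: int, height: int, channels: int = 3) -> int:
    """Size in bytes of an uncompressed 8-bit image buffer (assumed format)."""
    return width * height * channels


before = image_bytes(800, 600)     # small scan before upscaling
after = image_bytes(1600, 1200)    # same scan after upscaling to the target
ratio = after / before             # doubling both sides quadruples the buffer
```

Under these assumptions the buffer grows from about 1.4 MB to about 5.8 MB, which is still smaller than a 2480×3508 page rendered at 300 DPI.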

View File

@@ -0,0 +1,42 @@
## MODIFIED Requirements
### Requirement: Image Scaling for Layout Detection
The system SHALL apply bidirectional image scaling to optimize PP-Structure layout detection accuracy:
1. Images with longest side > `layout_image_scaling_max_dimension` (default: 2000px) SHALL be scaled DOWN to `layout_image_scaling_target_dimension` (default: 1600px)
2. Images with longest side < `layout_image_scaling_min_dimension` (default: 1200px) SHALL be scaled UP to `layout_image_scaling_target_dimension` (default: 1600px)
3. Images within the optimal range (min_dimension to max_dimension) SHALL NOT be scaled
4. For downscaling, the system SHALL use `cv2.INTER_AREA` interpolation (best for shrinking)
5. For upscaling, the system SHALL use `cv2.INTER_CUBIC` interpolation (smooth enlargement)
6. The system SHALL track the scale factor and restore bounding box coordinates to original image space after layout detection
7. Raw OCR and element extraction SHALL continue to use original/unscaled images
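
The rules above can be sketched as a pure decision function. This is an illustration under stated assumptions, not the code in `layout_preprocessing_service.py`: the names `ScalingPlan` and `plan_scaling` are hypothetical, interpolation is returned as a name rather than a `cv2` constant so the sketch has no dependencies, and rounding to the nearest pixel is assumed.

```python
from typing import NamedTuple, Optional, Tuple


class ScalingPlan(NamedTuple):
    direction: Optional[str]       # "UP", "DOWN", or None when no scaling is needed
    new_size: Tuple[int, int]      # (width, height) after scaling
    scale_factor: float            # multiply bbox coordinates by this to restore them
    interpolation: Optional[str]   # name of the cv2 flag the requirement calls for


def plan_scaling(width: int, height: int,
                 min_dimension: int = 1200,
                 max_dimension: int = 2000,
                 target_dimension: int = 1600) -> ScalingPlan:
    """Decide whether and how to scale an image for layout detection."""
    max_dim = max(width, height)
    if max_dim > max_dimension:
        direction, interpolation = "DOWN", "INTER_AREA"
    elif max_dim < min_dimension:
        direction, interpolation = "UP", "INTER_CUBIC"
    else:
        # Inside the optimal range: leave the image untouched.
        return ScalingPlan(None, (width, height), 1.0, None)

    resize_ratio = target_dimension / max_dim
    new_size = (round(width * resize_ratio), round(height * resize_ratio))
    # scale_factor is the inverse of the resize ratio (original / scaled).
    return ScalingPlan(direction, new_size, max_dim / target_dimension, interpolation)
```

For example, `plan_scaling(800, 600)` yields an UP plan with size `(1600, 1200)` and `scale_factor` 0.5, while `plan_scaling(1500, 1000)` returns the input size with `scale_factor` 1.0.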
#### Scenario: Large image is scaled down
- **WHEN** an image is 2480×3508px (max dimension 3508px > 2000px threshold)
- **THEN** the image is scaled down to ~1600px on longest side
- **AND** scale_factor is recorded as ~2.19 for bbox restoration
- **AND** INTER_AREA interpolation is used
#### Scenario: Small image is scaled up
- **WHEN** an image has max dimension 800px (< 1200px threshold)
- **THEN** the image is scaled up to ~1600px on longest side
- **AND** scale_factor is recorded as ~0.5 for bbox restoration
- **AND** INTER_CUBIC interpolation is used
#### Scenario: Optimal size image is not scaled
- **WHEN** an image has max dimension 1500px (within 1200-2000px range)
- **THEN** the image is NOT scaled
- **AND** scale_factor is 1.0
- **AND** was_scaled is False
#### Scenario: Bbox coordinates are restored after scaling
- **WHEN** layout detection returns bbox [100, 200, 500, 600] on scaled image
- **AND** scale_factor is 2.0 (image was scaled down to half its original size)
- **THEN** final bbox is [200, 400, 1000, 1200] in original image coordinates
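
The restoration in this scenario is a single multiplication per coordinate. A minimal sketch, assuming bboxes are `[x1, y1, x2, y2]` sequences; the function name `restore_bbox` is hypothetical.

```python
from typing import List, Sequence


def restore_bbox(bbox: Sequence[float], scale_factor: float) -> List[float]:
    """Map a bbox from scaled-image coordinates back to original-image coordinates."""
    return [coord * scale_factor for coord in bbox]


restored = restore_bbox([100, 200, 500, 600], 2.0)
```

The same function covers the upscaling case: with `scale_factor` 0.5, coordinates detected on the enlarged image shrink back to the original 800px image.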


@@ -0,0 +1,113 @@
# Tasks: Unify Image Scaling Strategy
## 1. Configuration
- [x] 1.1 Add min_dimension setting to `backend/app/core/config.py`
  - `layout_image_scaling_min_dimension: int = 1200`
  - Description: "Min dimension (pixels) before upscaling. Images smaller than this will be scaled up."
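
The three thresholds only make sense together when the target sits between the minimum and the maximum. The sketch below uses a standard-library dataclass as a stand-in for the project's settings class in `config.py`; the class name `LayoutScalingSettings` and the ordering check are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayoutScalingSettings:
    """Hypothetical stand-in for the scaling fields in the project's config."""
    layout_image_scaling_min_dimension: int = 1200
    layout_image_scaling_target_dimension: int = 1600
    layout_image_scaling_max_dimension: int = 2000

    def __post_init__(self) -> None:
        # Assumed sanity check: a scaled image should land inside the optimal range.
        if not (0 < self.layout_image_scaling_min_dimension
                <= self.layout_image_scaling_target_dimension
                <= self.layout_image_scaling_max_dimension):
            raise ValueError(
                "expected 0 < min_dimension <= target_dimension <= max_dimension"
            )


defaults = LayoutScalingSettings()
```

With such a check, a misconfiguration like a minimum of 1700 against a target of 1600 fails at startup instead of producing images that are upscaled to a size still below the minimum.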
## 2. Bidirectional Scaling Logic
- [x] 2.1 Update `scale_for_layout_detection()` in `layout_preprocessing_service.py`
  - Add upscaling condition: `max_dim < min_dimension`
  - Use `cv2.INTER_CUBIC` for upscaling (better quality than INTER_LINEAR)
  - Update docstring to reflect bidirectional behavior
- [x] 2.2 Update scaling decision logic
```python
# Current: only downscale
should_scale = max_dim > max_dimension
# New: bidirectional
should_downscale = max_dim > max_dimension
should_upscale = max_dim < min_dimension
should_scale = should_downscale or should_upscale
```
- [x] 2.3 Update logging to indicate scale direction
  - "Scaled DOWN for layout detection: 2480x3508 -> 1131x1600"
  - "Scaled UP for layout detection: 800x600 -> 1600x1200"
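
The two log lines in 2.3 share one template. A minimal sketch of a formatter that reproduces them; the function name `format_scaling_log` is hypothetical, and the actual service may build its message differently.

```python
from typing import Tuple


def format_scaling_log(direction: str,
                       original: Tuple[int, int],
                       scaled: Tuple[int, int]) -> str:
    """Build the scale-direction log message from (width, height) pairs."""
    (orig_w, orig_h), (new_w, new_h) = original, scaled
    return (f"Scaled {direction} for layout detection: "
            f"{orig_w}x{orig_h} -> {new_w}x{new_h}")


down_message = format_scaling_log("DOWN", (2480, 3508), (1131, 1600))
up_message = format_scaling_log("UP", (800, 600), (1600, 1200))
```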
## 3. PDF DPI Handling (Optional Optimization)
- [x] 3.1 Evaluate current PDF conversion impact
  - Decision: Keep 300 DPI, let bidirectional scaling handle it
  - Reason: Raw OCR benefits from high resolution, scaling handles PP-Structure needs
- [x] 3.2 Option A: Keep 300 DPI, let scaling handle it ✓
  - Simplest approach, no change needed
  - Raw OCR benefits from high resolution
- [ ] ~~3.3 Option B: Add configurable PDF DPI~~ (Not needed)
## 4. Testing
- [x] 4.1 Test upscaling with small images
  - Small image (800x600): Scaled UP → 1600x1200, scale_factor=0.500
  - Very small (400x300): Scaled UP → 1600x1200, scale_factor=0.250
- [x] 4.2 Test no scaling for optimal range
  - Optimal image (1500x1000): was_scaled=False, scale_factor=1.000
- [x] 4.3 Test downscaling (existing behavior)
  - Large image (2480x3508): Scaled DOWN → 1131x1600, scale_factor=2.192
- [ ] 4.4 Test PDF workflow (manual test recommended)
  - PDF page should be detected correctly
  - Scaling should apply after PDF conversion
## 5. Documentation
- [x] 5.1 Update config.py Field descriptions
  - Explained bidirectional scaling in enabled field description
  - Updated max/min/target descriptions
- [x] 5.2 Add logging for scaling decisions
  - Logs direction (UP/DOWN), original size, target size, scale_factor
---
## Implementation Summary
**Files Modified:**
- `backend/app/core/config.py` - Added `layout_image_scaling_min_dimension` setting
- `backend/app/services/layout_preprocessing_service.py` - Updated bidirectional scaling logic
**Test Results (2025-11-27):**
| Test Case | Original | Result | scale_factor |
|-----------|----------|--------|--------------|
| Small (800×600) | max=800 < 1200 | UP → 1600×1200 | 0.500 |
| Optimal (1500×1000) | 1200 ≤ 1500 ≤ 2000 | No scaling | 1.000 |
| Large (2480×3508) | max=3508 > 2000 | DOWN → 1131×1600 | 2.192 |
| Very small (400×300) | max=400 < 1200 | UP → 1600×1200 | 0.250 |
---
## Implementation Notes
### Scaling Decision Matrix
| Longest Side | Action | Resize Ratio | Interpolation |
|------------|--------|--------------|---------------|
| < 1200px | Scale UP | target/max_dim | INTER_CUBIC |
| 1200-2000px | No scaling | 1.0 | N/A |
| > 2000px | Scale DOWN | target/max_dim | INTER_AREA |
### Example Scenarios
1. **Small scan (800×600)**
   - max_dim = 800 < 1200 → Scale UP
   - target = 1600, scale = 1600/800 = 2.0
   - Result: 1600×1200
   - scale_factor (for bbox restore) = 0.5
2. **Optimal image (1400×1000)**
   - max_dim = 1400, 1200 <= 1400 <= 2000 → No scaling
   - Result: unchanged
   - scale_factor = 1.0
3. **High-res scan (2480×3508)**
   - max_dim = 3508 > 2000 → Scale DOWN
   - target = 1600, scale = 1600/3508 = 0.456
   - Result: 1131×1600
   - scale_factor (for bbox restore) = 2.19