feat: create extract-table-cell-boxes proposal and archive old proposal

- Archive the unify-image-scaling proposal to archive/2025-11-28
- Create a new extract-table-cell-boxes proposal that supplements PPStructureV3 with direct SLANeXt model calls to extract table cell bounding boxes
- Add debug logging to pp_structure_enhanced.py to investigate table cell boxes
- Finding: the PPStructureV3 high-level API filters out cell bbox data, but paddlex.create_model() can invoke the underlying models directly

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 12:15:06 +08:00
parent dda9621e17
commit 801ee9c4b6
7 changed files with 393 additions and 4 deletions
--- a/openspec/changes/archive/2025-11-28-unify-image-scaling/proposal.md
+++ b/openspec/changes/archive/2025-11-28-unify-image-scaling/proposal.md
@@ -0,0 +1,72 @@
+# Change: Unify Image Scaling Strategy for Optimal Layout Detection
+
+## Why
+
+Currently, the system has inconsistent image resolution handling:
+
+1. **PDF conversion**: Always uses 300 DPI, producing ~2480×3508 images for A4
+2. **Image downscaling**: Only applied when the longest side exceeds 2000px (no upscaling)
+3. **Small images**: Never scaled up, even when they are below the optimal detection size
+
+This inconsistency causes:
+- Wasted processing: PDF→300DPI→scale down to 1600px (double conversion)
+- Suboptimal detection: Small images stay small, so table structures may be missed
+- Inconsistent behavior: Different source formats get different treatment
+
+PP-Structure's layout detection model (RT-DETR-based) works best with images around 1600px on the longest side; both larger and smaller inputs reduce detection accuracy.
+
+## What Changes
+
+- **Bidirectional scaling for PP-Structure**
+  - Scale DOWN images larger than max threshold (2000px) → target (1600px)
+  - Scale UP images smaller than min threshold (1200px) → target (1600px)
+  - No change for images in the optimal range (1200–2000px)
+
+- **PDF conversion DPI optimization**
+  - Calculate optimal DPI based on target resolution
+  - Avoid double-scaling (convert at high DPI then scale down)
+  - Option to use adaptive DPI or fixed DPI with post-scaling
+
+- **Unified scaling logic**
+  - Same rules apply to all image sources (IMG, PDF pages)
+  - Scaling happens once at preprocessing stage
+  - Bbox coordinates are scaled back to the original image's coordinate space for accurate cropping
+
+- **Configuration**
+  - `layout_image_scaling_min_dimension`: Longest-side size below which images are upscaled (default: 1200)
+  - Keep existing `layout_image_scaling_max_dimension` (2000) and `target_dimension` (1600)
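+
+A minimal Python sketch of the bidirectional rule and the bbox scale-back. It assumes the thresholds apply to the longest side; `ScalingConfig`, `compute_scale_factor`, `scaled_size`, and `scale_bbox_to_original` are hypothetical names, not the actual code in `layout_preprocessing_service.py`.
+
+```python
+from dataclasses import dataclass
+from typing import Tuple
+
+BBox = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
+
+
+@dataclass(frozen=True)
+class ScalingConfig:
+    # Hypothetical mirror of the proposed settings; defaults taken from this proposal.
+    min_dimension: int = 1200     # upscale when the longest side is below this
+    max_dimension: int = 2000     # downscale when the longest side is above this
+    target_dimension: int = 1600  # longest side after any scaling
+
+
+def compute_scale_factor(width: int, height: int,
+                         cfg: ScalingConfig = ScalingConfig()) -> float:
+    """Factor applied to both sides; 1.0 means the image is left as-is."""
+    longest = max(width, height)
+    if longest > cfg.max_dimension or longest < cfg.min_dimension:
+        # Outside the optimal range: bring the longest side to the target.
+        return cfg.target_dimension / longest
+    return 1.0  # Within [min_dimension, max_dimension]: no scaling
+
+
+def scaled_size(width: int, height: int, factor: float) -> Tuple[int, int]:
+    # Round each side to a whole pixel, never below 1px.
+    return max(1, round(width * factor)), max(1, round(height * factor))
+
+
+def scale_bbox_to_original(bbox: BBox, factor: float) -> BBox:
+    """Map a box detected on the scaled image back to original-image pixels."""
+    x1, y1, x2, y2 = bbox
+    return (x1 / factor, y1 / factor, x2 / factor, y2 / factor)
+
+
+# Longest sides from the Quality Impact table (short sides are arbitrary).
+for w, h in [(3000, 2000), (1500, 1000), (800, 600)]:
+    print((w, h), "->", scaled_size(w, h, compute_scale_factor(w, h)))
+# (3000, 2000) -> (1600, 1067)
+# (1500, 1000) -> (1500, 1000)
+# (800, 600) -> (1600, 1200)
+
+# A box found on the 2x-upscaled 800x600 image, mapped back for cropping.
+print(scale_bbox_to_original((100, 50, 300, 150), 2.0))  # (50.0, 25.0, 150.0, 75.0)
+```
+
+Only the geometry is shown; the actual resize would also pick an interpolation per direction (e.g. INTER_CUBIC or INTER_LANCZOS4 when upscaling, as noted under Risks).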
+
+## Impact
+
+### Affected Specs
+- `ocr-processing` - Modified scaling requirements
+
+### Affected Code
+- `backend/app/core/config.py` - Add min_dimension setting
+- `backend/app/services/layout_preprocessing_service.py` - Add upscaling logic
+- `backend/app/services/ocr_service.py` - Optional: Adjust PDF DPI handling
+
+### Quality Impact
+
+| Scenario (longest side) | Before | After |
+|----------|--------|-------|
+| Large image (3000px) | Scaled to 1600px | Same |
+| Optimal image (1500px) | No scaling | Same |
+| Small image (800px) | No scaling | Scaled to 1600px |
+| PDF page at 300 DPI (A4) | 2480×3508 → 1600px longest side | Same (or optimized DPI) |
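+
+The optimized-DPI option can be estimated from the physical page size: rendering at `target_px / longest_side_inches` DPI yields the target resolution in one step. A minimal sketch assuming an A4 page (210 × 297 mm); `pixels_at_dpi` and `dpi_for_target` are hypothetical helpers, and a real PDF renderer may round page dimensions slightly differently.
+
+```python
+from typing import Tuple
+
+MM_PER_INCH = 25.4
+A4_MM = (210.0, 297.0)  # width, height in millimetres
+
+
+def pixels_at_dpi(width_mm: float, height_mm: float, dpi: float) -> Tuple[int, int]:
+    """Pixel size of a page rendered at `dpi`, rounded to whole pixels."""
+    return (round(width_mm / MM_PER_INCH * dpi),
+            round(height_mm / MM_PER_INCH * dpi))
+
+
+def dpi_for_target(width_mm: float, height_mm: float, target_px: int = 1600) -> float:
+    """DPI at which the page's longest side comes out as `target_px` pixels."""
+    longest_inches = max(width_mm, height_mm) / MM_PER_INCH
+    return target_px / longest_inches
+
+
+print(pixels_at_dpi(*A4_MM, 300))   # (2480, 3508): current fixed-DPI render
+dpi = dpi_for_target(*A4_MM)
+print(f"{dpi:.1f}")                 # 136.8
+print(pixels_at_dpi(*A4_MM, dpi))   # (1131, 1600): no second resize needed
+```
+
+Because Raw OCR keeps using the converted images, rendering at this lower DPI would likely also reduce OCR input resolution unless layout detection gets its own render; that trade-off is one reason to keep the fixed-DPI-plus-post-scaling option.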
+
+### Raw OCR Impact
+- No change: Raw OCR continues to use original/converted images
+- Upscaling only affects PP-Structure layout detection input
+
+## Risks
+
+1. **Upscaling quality**: Enlarging small images may introduce interpolation artifacts
+   - Mitigation: Use INTER_CUBIC or INTER_LANCZOS4 for upscaling
+   - Note: Layout detection cares about structure, not fine text detail
+
+2. **Memory for upscaled images**: Upscaling a small image increases its memory footprint
+   - Mitigation: 800px → 1600px quadruples the pixel count, but a 1600px image is still smaller than a 300 DPI A4 page (2480×3508)
+
+3. **Breaking existing behavior**: Users may rely on the current no-upscaling behavior
+   - Mitigation: Document the change; add a config toggle if needed
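+
+The memory arithmetic behind risk 2 can be checked directly. A minimal sketch, assuming decoded images are 8-bit, 3-channel arrays (the layout OpenCV's default color decoding produces) and an A4 aspect ratio; `decoded_bytes` is a hypothetical helper.
+
+```python
+def decoded_bytes(width: int, height: int, channels: int = 3) -> int:
+    """Bytes for an uncompressed 8-bit image with the given channel count."""
+    return width * height * channels
+
+
+small = decoded_bytes(566, 800)         # A4-ratio image, 800px longest side
+upscaled = decoded_bytes(1131, 1600)    # same image at the 1600px target
+pdf_300dpi = decoded_bytes(2480, 3508)  # A4 page rendered at 300 DPI
+
+print(f"upscale growth: {upscaled / small:.2f}x")   # upscale growth: 4.00x
+print(f"upscaled: {upscaled / 1e6:.1f} MB")          # upscaled: 5.4 MB
+print(f"300 DPI A4: {pdf_300dpi / 1e6:.1f} MB")      # 300 DPI A4: 26.1 MB
+```
+
+Even after upscaling, a small image needs roughly a fifth of the memory of the 300 DPI A4 page the pipeline already renders today.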