feat: create extract-table-cell-boxes proposal and archive old proposal

- Archive unify-image-scaling proposal to archive/2025-11-28
- Create new extract-table-cell-boxes proposal for supplementing PPStructureV3 with direct SLANeXt model calls to extract table cell bounding boxes
- Add debug logging to pp_structure_enhanced.py for table cell boxes investigation
- Discovered that PPStructureV3 high-level API filters out cell bbox data, but paddlex.create_model() can directly invoke underlying models

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 12:15:06 +08:00
parent dda9621e17
commit 801ee9c4b6
7 changed files with 393 additions and 4 deletions
--- a/openspec/changes/archive/2025-11-28-unify-image-scaling/specs/ocr-processing/spec.md
+++ b/openspec/changes/archive/2025-11-28-unify-image-scaling/specs/ocr-processing/spec.md
@@ -0,0 +1,42 @@
+## MODIFIED Requirements
+
+### Requirement: Image Scaling for Layout Detection
+
+The system SHALL apply bidirectional image scaling to optimize PP-Structure layout detection accuracy:
+
+1. Images with longest side > `layout_image_scaling_max_dimension` (default: 2000px) SHALL be scaled DOWN to `layout_image_scaling_target_dimension` (default: 1600px)
+
+2. Images with longest side < `layout_image_scaling_min_dimension` (default: 1200px) SHALL be scaled UP to `layout_image_scaling_target_dimension` (default: 1600px)
+
+3. Images whose longest side falls within the optimal range (`layout_image_scaling_min_dimension` to `layout_image_scaling_max_dimension`, inclusive) SHALL NOT be scaled
+
+4. For downscaling, the system SHALL use `cv2.INTER_AREA` interpolation (best for shrinking)
+
+5. For upscaling, the system SHALL use `cv2.INTER_CUBIC` interpolation (smooth enlargement)
+
+6. The system SHALL track the scale factor and restore bounding box coordinates to original image space after layout detection
+
+7. Raw OCR and element extraction SHALL continue to use the original (unscaled) images
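
The decision rules in requirements 1 to 5 can be sketched as a pure function. This is a minimal illustration, not the project's implementation: the names `plan_layout_scaling` and `ScalePlan` are hypothetical, and only the three default thresholds, the interpolation choices, and the "original / scaled" meaning of `scale_factor` come from this spec. A caller would then resize with `cv2.resize(image, (plan.new_width, plan.new_height), interpolation=getattr(cv2, plan.interpolation))`, since OpenCV takes `dsize` as `(width, height)`.

```python
from dataclasses import dataclass
from typing import Optional

# Defaults from this spec; the constant names are hypothetical.
MAX_DIMENSION = 2000     # layout_image_scaling_max_dimension
MIN_DIMENSION = 1200     # layout_image_scaling_min_dimension
TARGET_DIMENSION = 1600  # layout_image_scaling_target_dimension


@dataclass(frozen=True)
class ScalePlan:
    new_width: int
    new_height: int
    scale_factor: float           # original / scaled; multiply bboxes by this to restore
    was_scaled: bool
    interpolation: Optional[str]  # name of a cv2 flag, or None when not scaled


def plan_layout_scaling(width: int, height: int,
                        max_dim: int = MAX_DIMENSION,
                        min_dim: int = MIN_DIMENSION,
                        target_dim: int = TARGET_DIMENSION) -> ScalePlan:
    """Decide whether and how to resize an image before layout detection."""
    longest = max(width, height)
    if longest <= 0:
        raise ValueError("image dimensions must be positive")
    if longest > max_dim:
        interpolation = "INTER_AREA"   # requirement 4: downscaling
    elif longest < min_dim:
        interpolation = "INTER_CUBIC"  # requirement 5: upscaling
    else:
        # Requirement 3: min_dim <= longest <= max_dim is left untouched.
        return ScalePlan(width, height, 1.0, False, None)

    # Keep the aspect ratio so that the longest side becomes target_dim.
    new_width = max(1, round(width * target_dim / longest))
    new_height = max(1, round(height * target_dim / longest))
    return ScalePlan(new_width, new_height, longest / target_dim, True, interpolation)
```

For a hypothetical 1754×2480 page this yields a 1132×1600 plan with `scale_factor` 1.55 and `INTER_AREA`; for an 800×600 image it yields 1600×1200 with `scale_factor` 0.5 and `INTER_CUBIC`.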
+
+#### Scenario: Large image is scaled down
+- **WHEN** an image has max dimension 2480px (> 2000px threshold)
+- **THEN** the image is scaled down to ~1600px on longest side
+- **AND** scale_factor is recorded as ~1.55 (2480 / 1600) for bbox restoration
+- **AND** INTER_AREA interpolation is used
+
+#### Scenario: Small image is scaled up
+- **WHEN** an image has max dimension 800px (< 1200px threshold)
+- **THEN** the image is scaled up to ~1600px on longest side
+- **AND** scale_factor is recorded as 0.5 (800 / 1600) for bbox restoration
+- **AND** INTER_CUBIC interpolation is used
+
+#### Scenario: Optimal size image is not scaled
+- **WHEN** an image has max dimension 1500px (within 1200-2000px range)
+- **THEN** the image is NOT scaled
+- **AND** scale_factor is 1.0
+- **AND** was_scaled is False
+
+#### Scenario: Bbox coordinates are restored after scaling
+- **WHEN** layout detection returns bbox [100, 200, 500, 600] on scaled image
+- **AND** scale_factor is 2.0 (the image was scaled down by a factor of 0.5)
+- **THEN** final bbox is [200, 400, 1000, 1200] in original image coordinates
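
Requirement 6 and the scenario above amount to multiplying every coordinate by `scale_factor`. A minimal sketch, assuming axis-aligned `[x1, y1, x2, y2]` boxes and this spec's convention that `scale_factor` is the original size divided by the scaled size; the function names are hypothetical.

```python
from typing import List, Sequence


def restore_bbox(bbox: Sequence[float], scale_factor: float) -> List[float]:
    """Map an [x1, y1, x2, y2] box from the scaled image back to original pixels.

    scale_factor is original_longest_side / scaled_longest_side, so boxes from a
    downscaled image grow (factor > 1) and boxes from an upscaled image shrink.
    """
    if scale_factor <= 0:
        raise ValueError("scale_factor must be positive")
    return [coord * scale_factor for coord in bbox]


def restore_bboxes(bboxes: Sequence[Sequence[float]],
                   scale_factor: float) -> List[List[float]]:
    """Apply restore_bbox to every box returned by layout detection."""
    return [restore_bbox(box, scale_factor) for box in bboxes]
```

A single uniform factor is enough because the resize preserves the aspect ratio; when resized dimensions are rounded to whole pixels, the true per-axis ratio can differ from `scale_factor` by a fraction of a pixel, which this approach ignores.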