- Archive unify-image-scaling proposal to archive/2025-11-28
- Create new extract-table-cell-boxes proposal for supplementing PPStructureV3 with direct SLANeXt model calls to extract table cell bounding boxes
- Add debug logging to pp_structure_enhanced.py for table cell boxes investigation
- Discovered that PPStructureV3 high-level API filters out cell bbox data, but paddlex.create_model() can directly invoke underlying models

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
MODIFIED Requirements
Requirement: Image Scaling for Layout Detection
The system SHALL apply bidirectional image scaling to optimize PP-Structure layout detection accuracy:
- Images with longest side > `layout_image_scaling_max_dimension` (default: 2000px) SHALL be scaled DOWN to `layout_image_scaling_target_dimension` (default: 1600px)
- Images with longest side < `layout_image_scaling_min_dimension` (default: 1200px) SHALL be scaled UP to `layout_image_scaling_target_dimension` (default: 1600px)
- Images within the optimal range (min_dimension to max_dimension) SHALL NOT be scaled
- For downscaling, the system SHALL use `cv2.INTER_AREA` interpolation (best for shrinking)
- For upscaling, the system SHALL use `cv2.INTER_CUBIC` interpolation (smooth enlargement)
- The system SHALL track the scale factor and restore bounding box coordinates to original image space after layout detection
- Raw OCR and element extraction SHALL continue to use original/unscaled images
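The scaling rules above can be sketched as a small planning function. This is an illustrative sketch only, not the project's implementation: `ScalePlan`, `plan_layout_scaling`, and `apply_plan` are hypothetical names, and the defaults simply mirror the values stated in the requirement. The scale factor is assumed to be original size divided by scaled size, which is the direction implied by the bbox restoration scenario.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScalePlan:
    """Hypothetical result type describing how an image should be resized."""
    new_width: int
    new_height: int
    scale_factor: float            # original / scaled; multiply bboxes by this to restore
    was_scaled: bool
    interpolation: Optional[str]   # name of the cv2 constant, or None if unscaled


def plan_layout_scaling(width: int, height: int,
                        max_dimension: int = 2000,
                        min_dimension: int = 1200,
                        target_dimension: int = 1600) -> ScalePlan:
    """Decide whether and how to scale an image before layout detection."""
    longest = max(width, height)
    if longest > max_dimension:
        interpolation = "INTER_AREA"    # downscaling
    elif longest < min_dimension:
        interpolation = "INTER_CUBIC"   # upscaling
    else:
        # Within the optimal range: leave the image untouched.
        return ScalePlan(width, height, 1.0, False, None)

    ratio = target_dimension / longest
    return ScalePlan(
        new_width=round(width * ratio),
        new_height=round(height * ratio),
        scale_factor=longest / target_dimension,
        was_scaled=True,
        interpolation=interpolation,
    )


def apply_plan(image, plan: ScalePlan):
    """Resize with OpenCV according to the plan (requires opencv-python)."""
    import cv2  # imported lazily so the planning logic runs without OpenCV
    if not plan.was_scaled:
        return image
    # cv2.resize expects dsize as (width, height).
    return cv2.resize(image, (plan.new_width, plan.new_height),
                      interpolation=getattr(cv2, plan.interpolation))
```

Keeping the decision separate from the resize call makes the thresholds testable without image data, and the unscaled original stays available for raw OCR and element extraction.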
Scenario: Large image is scaled down
- WHEN an image has max dimension 2480px (> 2000px threshold)
- THEN the image is scaled down to ~1600px on longest side
- AND scale_factor is recorded as ~1.55 (2480 / 1600) for bbox restoration
- AND INTER_AREA interpolation is used
Scenario: Small image is scaled up
- WHEN an image has max dimension 800px (< 1200px threshold)
- THEN the image is scaled up to ~1600px on longest side
- AND scale_factor is recorded as ~0.5 for bbox restoration
- AND INTER_CUBIC interpolation is used
Scenario: Optimal size image is not scaled
- WHEN an image has max dimension 1500px (within 1200-2000px range)
- THEN the image is NOT scaled
- AND scale_factor is 1.0
- AND was_scaled is False
Scenario: Bbox coordinates are restored after scaling
- WHEN layout detection returns bbox [100, 200, 500, 600] on scaled image
- AND scale_factor is 2.0 (image was scaled down to half its original size)
- THEN final bbox is [200, 400, 1000, 1200] in original image coordinates
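The restoration step in this scenario amounts to multiplying each coordinate by the recorded scale factor. A minimal sketch, assuming a flat `[x1, y1, x2, y2]` bbox and a hypothetical helper name `restore_bbox` (neither is confirmed by the spec):

```python
from typing import List, Sequence


def restore_bbox(bbox: Sequence[float], scale_factor: float) -> List[float]:
    """Map a bbox from scaled-image coordinates back to original-image coordinates.

    Assumes scale_factor = original size / scaled size, so a factor of 2.0
    means the image was halved before layout detection.
    """
    return [coord * scale_factor for coord in bbox]


print(restore_bbox([100, 200, 500, 600], 2.0))  # -> [200.0, 400.0, 1000.0, 1200.0]
```

With `scale_factor` equal to 1.0 (image not scaled), the bbox passes through unchanged, so the same call can be applied unconditionally.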