feat: create extract-table-cell-boxes proposal and archive old proposal

- Archive unify-image-scaling proposal to archive/2025-11-28
- Create new extract-table-cell-boxes proposal for supplementing PPStructureV3
  with direct SLANeXt model calls to extract table cell bounding boxes
- Add debug logging to pp_structure_enhanced.py for table cell boxes investigation
- Discovered that PPStructureV3 high-level API filters out cell bbox data,
  but paddlex.create_model() can directly invoke underlying models

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
egg
2025-11-28 12:15:06 +08:00
parent dda9621e17
commit 801ee9c4b6
7 changed files with 393 additions and 4 deletions

@@ -0,0 +1,42 @@
## MODIFIED Requirements
### Requirement: Image Scaling for Layout Detection
The system SHALL apply bidirectional image scaling to optimize PP-Structure layout detection accuracy:
1. Images with longest side > `layout_image_scaling_max_dimension` (default: 2000px) SHALL be scaled DOWN to `layout_image_scaling_target_dimension` (default: 1600px)
2. Images with longest side < `layout_image_scaling_min_dimension` (default: 1200px) SHALL be scaled UP to `layout_image_scaling_target_dimension` (default: 1600px)
3. Images within the optimal range (min_dimension to max_dimension) SHALL NOT be scaled
4. For downscaling, the system SHALL use `cv2.INTER_AREA` interpolation (best for shrinking)
5. For upscaling, the system SHALL use `cv2.INTER_CUBIC` interpolation (smooth enlargement)
6. The system SHALL track the scale factor and restore bounding box coordinates to original image space after layout detection
7. Raw OCR and element extraction SHALL continue to use original/unscaled images
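The scaling rules above can be sketched as a small pure-Python planning function. This is only an illustration under stated assumptions: the names `plan_layout_scaling` and `ScalingDecision` are hypothetical and not taken from the codebase, the interpolation is returned as a plain string rather than an OpenCV constant so the sketch has no dependencies, and `scale_factor` is assumed to be original size divided by scaled size, as the scenarios in this spec imply.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ScalingDecision:
    """Hypothetical result type; not part of the actual implementation."""
    target_size: Tuple[int, int]      # (width, height) to feed to layout detection
    scale_factor: float               # multiply scaled-space bboxes by this to restore them
    was_scaled: bool
    interpolation: Optional[str]      # name of the cv2 flag to use, or None


def plan_layout_scaling(
    width: int,
    height: int,
    max_dimension: int = 2000,
    min_dimension: int = 1200,
    target_dimension: int = 1600,
) -> ScalingDecision:
    """Decide whether and how to scale an image before layout detection."""
    longest = max(width, height)

    if longest > max_dimension:
        interpolation = "INTER_AREA"      # rule 4: downscaling
    elif longest < min_dimension:
        interpolation = "INTER_CUBIC"     # rule 5: upscaling
    else:
        # Rule 3: already within the optimal range, leave untouched.
        return ScalingDecision((width, height), 1.0, False, None)

    ratio = target_dimension / longest
    new_size = (round(width * ratio), round(height * ratio))
    # Assumed convention: scale_factor = original / scaled (rule 6).
    return ScalingDecision(new_size, longest / target_dimension, True, interpolation)
```

For example, `plan_layout_scaling(800, 600)` plans an upscale to 1600x1200 with `INTER_CUBIC` and a `scale_factor` of 0.5, while `plan_layout_scaling(1500, 1000)` leaves the image unscaled.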
#### Scenario: Large image is scaled down
- **WHEN** an image has max dimension 2480px (> 2000px threshold)
- **THEN** the image is scaled down to ~1600px on longest side
- **AND** scale_factor is recorded as ~1.55 (2480 / 1600) for bbox restoration
- **AND** INTER_AREA interpolation is used
#### Scenario: Small image is scaled up
- **WHEN** an image has max dimension 800px (< 1200px threshold)
- **THEN** the image is scaled up to ~1600px on longest side
- **AND** scale_factor is recorded as ~0.5 for bbox restoration
- **AND** INTER_CUBIC interpolation is used
#### Scenario: Optimal size image is not scaled
- **WHEN** an image has max dimension 1500px (within 1200-2000px range)
- **THEN** the image is NOT scaled
- **AND** scale_factor is 1.0
- **AND** was_scaled is False
#### Scenario: Bbox coordinates are restored after scaling
- **WHEN** layout detection returns bbox [100, 200, 500, 600] on scaled image
- **AND** scale_factor is 2.0 (the image was scaled down to half its original size)
- **THEN** final bbox is [200, 400, 1000, 1200] in original image coordinates
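The restoration step in this scenario amounts to multiplying each coordinate by the recorded scale factor. A minimal sketch, assuming a flat `[x1, y1, x2, y2]` bbox and a hypothetical helper name `restore_bbox` (not taken from the codebase):

```python
from typing import List, Sequence


def restore_bbox(bbox: Sequence[float], scale_factor: float) -> List[float]:
    """Map a bbox from scaled-image space back to original-image space.

    Assumes scale_factor = original size / scaled size, so a factor of 2.0
    means the image was halved before layout detection.
    """
    return [coord * scale_factor for coord in bbox]


# Values from the scenario above.
restored = restore_bbox([100, 200, 500, 600], 2.0)
print(restored)
```

The sketch keeps coordinates as floats; whether the real pipeline rounds them to integers is not stated in this spec.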