OCR/openspec/changes/archive/2025-11-28-unify-image-scaling/specs/ocr-processing/spec.md
egg 801ee9c4b6 feat: create extract-table-cell-boxes proposal and archive old proposal
- Archive unify-image-scaling proposal to archive/2025-11-28
- Create new extract-table-cell-boxes proposal for supplementing PPStructureV3
  with direct SLANeXt model calls to extract table cell bounding boxes
- Add debug logging to pp_structure_enhanced.py for table cell boxes investigation
- Discovered that PPStructureV3 high-level API filters out cell bbox data,
  but paddlex.create_model() can directly invoke underlying models

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 12:15:06 +08:00


MODIFIED Requirements

Requirement: Image Scaling for Layout Detection

The system SHALL apply bidirectional image scaling to optimize PP-Structure layout detection accuracy:

  1. Images with longest side > layout_image_scaling_max_dimension (default: 2000px) SHALL be scaled DOWN to layout_image_scaling_target_dimension (default: 1600px)

  2. Images with longest side < layout_image_scaling_min_dimension (default: 1200px) SHALL be scaled UP to layout_image_scaling_target_dimension (default: 1600px)

  3. Images whose longest side lies within the optimal range (min_dimension to max_dimension, inclusive) SHALL NOT be scaled

  4. For downscaling, the system SHALL use cv2.INTER_AREA interpolation (best for shrinking)

  5. For upscaling, the system SHALL use cv2.INTER_CUBIC interpolation (smooth enlargement)

  6. The system SHALL record the scale factor (original longest side divided by scaled longest side) and use it to restore bounding box coordinates to original image space after layout detection

  7. Raw OCR and element extraction SHALL continue to use original/unscaled images
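The decision logic in items 1 to 6 can be sketched in Python as follows. This is a minimal sketch, not the project's implementation. `ScalePlan`, `plan_layout_scaling`, and `apply_scaling` are hypothetical names. The defaults mirror the documented settings. `scale_factor` is assumed to be the original longest side divided by the scaled longest side, which is the convention the small-image and bbox-restoration scenarios use.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical defaults mirroring layout_image_scaling_max_dimension,
# layout_image_scaling_min_dimension and layout_image_scaling_target_dimension.
DEFAULT_MAX_DIM = 2000
DEFAULT_MIN_DIM = 1200
DEFAULT_TARGET_DIM = 1600


@dataclass(frozen=True)
class ScalePlan:
    """Hypothetical record of one scaling decision."""
    was_scaled: bool
    scale_factor: float        # original / scaled; multiply bboxes by this to restore
    new_size: Tuple[int, int]  # (width, height) passed to the layout model
    interpolation: str         # cv2 flag name, or "" when the image is left as-is


def plan_layout_scaling(width: int, height: int,
                        max_dim: int = DEFAULT_MAX_DIM,
                        min_dim: int = DEFAULT_MIN_DIM,
                        target_dim: int = DEFAULT_TARGET_DIM) -> ScalePlan:
    longest = max(width, height)
    if min_dim <= longest <= max_dim:
        # Item 3: inside the inclusive optimal range, no scaling.
        return ScalePlan(False, 1.0, (width, height), "")

    # Items 1 and 2: bring the longest side to target_dim, keeping the aspect ratio.
    ratio = target_dim / longest
    new_size = (max(1, round(width * ratio)), max(1, round(height * ratio)))
    # Items 4 and 5: INTER_AREA when shrinking, INTER_CUBIC when enlarging.
    interpolation = "INTER_AREA" if longest > max_dim else "INTER_CUBIC"
    # Item 6: remember how to map coordinates back to the original image.
    return ScalePlan(True, longest / target_dim, new_size, interpolation)


def apply_scaling(image, plan: ScalePlan):
    """Resize a NumPy image array according to plan (requires opencv-python)."""
    if not plan.was_scaled:
        return image
    import cv2  # imported lazily so the planner above has no OpenCV dependency
    # cv2.resize takes dsize as (width, height).
    return cv2.resize(image, plan.new_size,
                      interpolation=getattr(cv2, plan.interpolation))
```

Only the resized copy would go to layout detection. Per item 7, the original array stays untouched for raw OCR and element extraction.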

Scenario: Large image is scaled down

  • WHEN an image has max dimension 2480px (> 2000px threshold)
  • THEN the image is scaled down to ~1600px on longest side
  • AND scale_factor is recorded as ~1.55 (2480 / 1600) for bbox restoration
  • AND INTER_AREA interpolation is used

Scenario: Small image is scaled up

  • WHEN an image has max dimension 800px (< 1200px threshold)
  • THEN the image is scaled up to ~1600px on longest side
  • AND scale_factor is recorded as ~0.5 for bbox restoration
  • AND INTER_CUBIC interpolation is used

Scenario: Optimal size image is not scaled

  • WHEN an image has max dimension 1500px (within 1200-2000px range)
  • THEN the image is NOT scaled
  • AND scale_factor is 1.0
  • AND was_scaled is False

Scenario: Bbox coordinates are restored after scaling

  • WHEN layout detection returns bbox [100, 200, 500, 600] on scaled image
  • AND scale_factor is 2.0 (the image was scaled down to half its original size)
  • THEN final bbox is [200, 400, 1000, 1200] in original image coordinates
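The scaling here uses the same ratio on both axes, so restoration multiplies each coordinate by scale_factor. The helper below is a hypothetical sketch. `restore_bbox` is an illustrative name. The optional clamp to the original image size is an added assumption, not part of the requirement. It keeps rounding from pushing a box edge slightly past the image border.

```python
from typing import List, Optional, Sequence, Tuple


def restore_bbox(bbox: Sequence[float], scale_factor: float,
                 original_size: Optional[Tuple[int, int]] = None) -> List[float]:
    """Map an [x1, y1, x2, y2] box from the scaled image back to original pixels.

    Assumes scale_factor = original size / scaled size (uniform on both axes).
    If original_size=(width, height) is given, coordinates are clamped to it.
    """
    # Unpacking raises ValueError if bbox does not have exactly four values.
    x1, y1, x2, y2 = (coord * scale_factor for coord in bbox)
    if original_size is not None:
        width, height = original_size
        x1, x2 = (min(max(x, 0.0), width) for x in (x1, x2))
        y1, y2 = (min(max(y, 0.0), height) for y in (y1, y2))
    return [x1, y1, x2, y2]


# Scenario above: a box found on an image scaled down to half size.
print(restore_bbox([100, 200, 500, 600], 2.0))  # [200.0, 400.0, 1000.0, 1200.0]
```

The same call covers upscaled images. With scale_factor 0.5, a box detected on the enlarged image shrinks back to half its coordinates.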