OCR/openspec/changes/unify-image-scaling/specs/ocr-processing/spec.md
egg dda9621e17 feat: enhance layout preprocessing and unify image scaling proposal
Backend changes:
- Add image scaling configuration for PP-Structure processing
- Enhance layout preprocessing service with scaling support
- Update OCR service with improved memory management
- Add PP-Structure enhanced processing improvements

Frontend changes:
- Update preprocessing settings UI
- Fix processing page layout and state management
- Update API types for new parameters

Proposals:
- Archive add-layout-preprocessing proposal (completed)
- Add unify-image-scaling proposal for consistent coordinate handling

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 09:23:19 +08:00


MODIFIED Requirements

Requirement: Image Scaling for Layout Detection

The system SHALL apply bidirectional image scaling to optimize PP-Structure layout detection accuracy:

  1. Images with longest side > layout_image_scaling_max_dimension (default: 2000px) SHALL be scaled DOWN to layout_image_scaling_target_dimension (default: 1600px)

  2. Images with longest side < layout_image_scaling_min_dimension (default: 1200px) SHALL be scaled UP to layout_image_scaling_target_dimension (default: 1600px)

  3. Images whose longest side falls within the optimal range (layout_image_scaling_min_dimension to layout_image_scaling_max_dimension, inclusive) SHALL NOT be scaled

  4. For downscaling, the system SHALL use cv2.INTER_AREA interpolation (best for shrinking)

  5. For upscaling, the system SHALL use cv2.INTER_CUBIC interpolation (smooth enlargement)

  6. The system SHALL track the scale factor (original size divided by scaled size) and multiply bounding box coordinates by it to restore them to original image space after layout detection

  7. Raw OCR and element extraction SHALL continue to use original/unscaled images
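
The threshold rules above can be sketched as a small decision function. This is a minimal illustration, not the project's implementation: ScalingConfig and choose_scaling are hypothetical names, and the interpolation is returned as a plain string standing in for the cv2.INTER_AREA / cv2.INTER_CUBIC constants so the sketch runs without OpenCV installed. The defaults are the ones quoted in the requirement.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ScalingConfig:
    """Hypothetical container for the three documented settings."""
    max_dimension: int = 2000     # layout_image_scaling_max_dimension
    min_dimension: int = 1200     # layout_image_scaling_min_dimension
    target_dimension: int = 1600  # layout_image_scaling_target_dimension


def choose_scaling(
    width: int, height: int, cfg: ScalingConfig = ScalingConfig()
) -> Tuple[str, Optional[str]]:
    """Return (direction, interpolation name) for an image of the given size.

    The comparisons are strict, so images exactly at either threshold
    are treated as being inside the optimal range and left unscaled.
    """
    longest = max(width, height)
    if longest > cfg.max_dimension:
        return "down", "INTER_AREA"   # would map to cv2.INTER_AREA
    if longest < cfg.min_dimension:
        return "up", "INTER_CUBIC"    # would map to cv2.INTER_CUBIC
    return "none", None


print(choose_scaling(2480, 1754))  # ('down', 'INTER_AREA')
print(choose_scaling(800, 600))    # ('up', 'INTER_CUBIC')
print(choose_scaling(1500, 1000))  # ('none', None)
```

Keeping the decision separate from the actual resize call makes the threshold behaviour easy to unit test without loading any image data.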

Scenario: Large image is scaled down

  • WHEN an image has max dimension 2480px (> 2000px threshold)
  • THEN the image is scaled down to ~1600px on longest side
  • AND scale_factor is recorded as ~1.55 (2480 / 1600) for bbox restoration
  • AND INTER_AREA interpolation is used

Scenario: Small image is scaled up

  • WHEN an image has max dimension 800px (< 1200px threshold)
  • THEN the image is scaled up to ~1600px on longest side
  • AND scale_factor is recorded as 0.5 (800 / 1600) for bbox restoration
  • AND INTER_CUBIC interpolation is used
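
As a worked example of this scenario, the sketch below computes the resized dimensions and the recorded scale factor for an 800px image. It assumes the aspect ratio is preserved and that scale_factor is defined as original longest side divided by scaled longest side, which is the convention the scenario values imply. The function name scaled_size and the 800x600 input are hypothetical.

```python
from typing import Tuple


def scaled_size(width: int, height: int, target: int = 1600) -> Tuple[Tuple[int, int], float]:
    """Return ((new_width, new_height), scale_factor) for a resize to `target`.

    `target` defaults to the documented layout_image_scaling_target_dimension.
    """
    longest = max(width, height)
    resize_ratio = target / longest       # factor applied to the image itself
    new_size = (round(width * resize_ratio), round(height * resize_ratio))
    scale_factor = longest / target       # factor applied to bboxes afterwards
    return new_size, scale_factor


new_size, scale_factor = scaled_size(800, 600)
print(new_size)      # (1600, 1200)
print(scale_factor)  # 0.5
```

In a real pipeline the resulting size would be passed to cv2.resize together with cv2.INTER_CUBIC, since this case is an enlargement.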

Scenario: Optimal size image is not scaled

  • WHEN an image has max dimension 1500px (within 1200-2000px range)
  • THEN the image is NOT scaled
  • AND scale_factor is 1.0
  • AND was_scaled is False

Scenario: Bbox coordinates are restored after scaling

  • WHEN layout detection returns bbox [100, 200, 500, 600] on scaled image
  • AND scale_factor is 2.0 (the image was scaled down to half its original size)
  • THEN final bbox is [200, 400, 1000, 1200] in original image coordinates
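
The restoration step in this scenario amounts to multiplying each coordinate by the recorded scale factor. A minimal sketch follows, assuming bboxes are [x1, y1, x2, y2] lists and that results are rounded to whole pixels. The rounding and the name restore_bbox are assumptions made for illustration and are not taken from the spec.

```python
from typing import List, Sequence


def restore_bbox(bbox: Sequence[float], scale_factor: float) -> List[int]:
    """Map a bbox from scaled-image coordinates back to original-image coordinates."""
    # Each coordinate is multiplied by original_size / scaled_size.
    return [round(coord * scale_factor) for coord in bbox]


print(restore_bbox([100, 200, 500, 600], 2.0))  # [200, 400, 1000, 1200]
```

With scale_factor 1.0 (an unscaled image) the function returns the bbox unchanged, so the same code path can serve both scaled and unscaled inputs.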