OCR/openspec/changes/unify-image-scaling/specs/ocr-processing/spec.md
egg dda9621e17 feat: enhance layout preprocessing and unify image scaling proposal
Backend changes:
- Add image scaling configuration for PP-Structure processing
- Enhance layout preprocessing service with scaling support
- Update OCR service with improved memory management
- Add PP-Structure enhanced processing improvements

Frontend changes:
- Update preprocessing settings UI
- Fix processing page layout and state management
- Update API types for new parameters

Proposals:
- Archive add-layout-preprocessing proposal (completed)
- Add unify-image-scaling proposal for consistent coordinate handling

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 09:23:19 +08:00


## MODIFIED Requirements
### Requirement: Image Scaling for Layout Detection
The system SHALL apply bidirectional, aspect-ratio-preserving image scaling to optimize PP-Structure layout detection accuracy:
1. Images whose longest side is greater than `layout_image_scaling_max_dimension` (default: 2000px) SHALL be scaled DOWN so that the longest side equals `layout_image_scaling_target_dimension` (default: 1600px)
2. Images whose longest side is less than `layout_image_scaling_min_dimension` (default: 1200px) SHALL be scaled UP so that the longest side equals `layout_image_scaling_target_dimension` (default: 1600px)
3. Images whose longest side is within the optimal range, from `layout_image_scaling_min_dimension` to `layout_image_scaling_max_dimension` inclusive, SHALL NOT be scaled
4. For downscaling, the system SHALL use `cv2.INTER_AREA` interpolation (best for shrinking)
5. For upscaling, the system SHALL use `cv2.INTER_CUBIC` interpolation (smooth enlargement)
6. The system SHALL record the scale factor (original longest side / scaled longest side) and, after layout detection, multiply the returned bounding box coordinates by it to restore them to the original image space
7. Raw OCR and element extraction SHALL continue to use the original, unscaled images
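The decision rules in items 1 to 5, plus the scale factor recorded under item 6, can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the service's actual implementation: `ScalingPlan`, `plan_layout_scaling`, and `apply_layout_scaling` are hypothetical names, images are assumed to be NumPy arrays resized with OpenCV's `cv2.resize`, and OpenCV is imported only inside `apply_layout_scaling` so the planning logic runs without it.

```python
from dataclasses import dataclass
from typing import Optional

# Defaults taken from the requirement text; the real config keys are
# layout_image_scaling_{max,min,target}_dimension.
MAX_DIMENSION = 2000
MIN_DIMENSION = 1200
TARGET_DIMENSION = 1600


@dataclass(frozen=True)
class ScalingPlan:
    """Hypothetical container describing how an image will be resized."""
    was_scaled: bool
    new_width: int
    new_height: int
    scale_factor: float           # original longest side / scaled longest side
    interpolation: Optional[str]  # cv2 flag name, or None when not scaled


def plan_layout_scaling(width: int, height: int,
                        max_dim: int = MAX_DIMENSION,
                        min_dim: int = MIN_DIMENSION,
                        target_dim: int = TARGET_DIMENSION) -> ScalingPlan:
    """Decide whether and how to resize an image before layout detection."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    longest = max(width, height)

    # Item 3: the optimal range is inclusive on both ends.
    if min_dim <= longest <= max_dim:
        return ScalingPlan(False, width, height, 1.0, None)

    # Items 4 and 5: INTER_AREA for shrinking, INTER_CUBIC for enlarging.
    interpolation = "INTER_AREA" if longest > max_dim else "INTER_CUBIC"

    # One uniform ratio keeps the aspect ratio, so one factor restores both axes.
    ratio = target_dim / longest
    new_width = max(1, round(width * ratio))
    new_height = max(1, round(height * ratio))

    # Item 6: multiplier that maps scaled coordinates back to the original.
    scale_factor = longest / max(new_width, new_height)
    return ScalingPlan(True, new_width, new_height, scale_factor, interpolation)


def apply_layout_scaling(image, plan: ScalingPlan):
    """Resize a NumPy image according to `plan` (needs opencv-python when scaling)."""
    if not plan.was_scaled:
        return image  # the same object is returned; no copy is made
    import cv2  # lazy import keeps the planning logic free of OpenCV
    # cv2.resize expects dsize as (width, height).
    return cv2.resize(image, (plan.new_width, plan.new_height),
                      interpolation=getattr(cv2, plan.interpolation))
```

For example, `plan_layout_scaling(1754, 2480)` produces a 1132 × 1600 plan using `INTER_AREA` with `scale_factor` 1.55, `plan_layout_scaling(800, 600)` produces 1600 × 1200 using `INTER_CUBIC` with `scale_factor` 0.5, and `plan_layout_scaling(1500, 1000)` leaves the image untouched with `scale_factor` 1.0. Splitting planning from resizing keeps the threshold logic testable without image data.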
#### Scenario: Large image is scaled down
- **WHEN** an image has a max dimension of 2480px (> 2000px threshold)
- **THEN** the image is scaled down to ~1600px on its longest side
- **AND** scale_factor is recorded as 1.55 (2480 / 1600) for bbox restoration
- **AND** INTER_AREA interpolation is used
#### Scenario: Small image is scaled up
- **WHEN** an image has a max dimension of 800px (< 1200px threshold)
- **THEN** the image is scaled up to ~1600px on its longest side
- **AND** scale_factor is recorded as 0.5 (800 / 1600) for bbox restoration
- **AND** INTER_CUBIC interpolation is used
#### Scenario: Optimal size image is not scaled
- **WHEN** an image has a max dimension of 1500px (within the 1200-2000px range)
- **THEN** the image is NOT scaled
- **AND** scale_factor is 1.0
- **AND** was_scaled is False
#### Scenario: Bbox coordinates are restored after scaling
- **WHEN** layout detection returns bbox [100, 200, 500, 600] on the scaled image
- **AND** scale_factor is 2.0 (the image was scaled down by a factor of 0.5)
- **THEN** final bbox is [200, 400, 1000, 1200] in original image coordinates
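Bounding box restoration then reduces to multiplying every coordinate by `scale_factor`. The sketch below assumes boxes are `[x1, y1, x2, y2]` lists in pixel units; `restore_bbox` and `restore_bboxes` are hypothetical helper names for illustration, not part of PP-Structure's API.

```python
from typing import List, Sequence


def restore_bbox(bbox: Sequence[float], scale_factor: float) -> List[float]:
    """Map an [x1, y1, x2, y2] box from scaled-image space to original-image space."""
    if len(bbox) != 4:
        raise ValueError("expected [x1, y1, x2, y2]")
    if scale_factor <= 0:
        raise ValueError("scale_factor must be positive")
    # scale_factor = original longest side / scaled longest side, so a plain
    # multiplication undoes the uniform (aspect-preserving) resize on both axes.
    return [coord * scale_factor for coord in bbox]


def restore_bboxes(bboxes: Sequence[Sequence[float]],
                   scale_factor: float) -> List[List[float]]:
    """Restore every box returned by layout detection for one page."""
    return [restore_bbox(b, scale_factor) for b in bboxes]


# The scenario above: a box detected on an image that was halved in size.
print(restore_bbox([100, 200, 500, 600], 2.0))  # [200.0, 400.0, 1000.0, 1200.0]
```

Because the scaled width and height are rounded to whole pixels, a single factor can drift by a fraction of a pixel on the shorter axis (a 1754 × 2480 page resized to 1132 × 1600 has an exact x factor of 1754 / 1132 ≈ 1.5495 rather than 1.55). If that matters, a per-axis variant that multiplies x by original width / scaled width and y by original height / scaled height removes the drift.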