feat: enhance layout preprocessing and unify image scaling proposal

Backend changes:
- Add image scaling configuration for PP-Structure processing
- Enhance layout preprocessing service with scaling support
- Update OCR service with improved memory management
- Add PP-Structure enhanced processing improvements

Frontend changes:
- Update preprocessing settings UI
- Fix processing page layout and state management
- Update API types for new parameters

Proposals:
- Archive add-layout-preprocessing proposal (completed)
- Add unify-image-scaling proposal for consistent coordinate handling

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@@ -0,0 +1,42 @@
## MODIFIED Requirements
### Requirement: Image Scaling for Layout Detection
The system SHALL apply bidirectional image scaling to optimize PP-Structure layout detection accuracy:
1. Images with longest side > `layout_image_scaling_max_dimension` (default: 2000px) SHALL be scaled DOWN to `layout_image_scaling_target_dimension` (default: 1600px)
2. Images with longest side < `layout_image_scaling_min_dimension` (default: 1200px) SHALL be scaled UP to `layout_image_scaling_target_dimension` (default: 1600px)
3. Images whose longest side falls within the optimal range (`layout_image_scaling_min_dimension` to `layout_image_scaling_max_dimension`, inclusive) SHALL NOT be scaled
4. For downscaling, the system SHALL use `cv2.INTER_AREA` interpolation (best for shrinking)
5. For upscaling, the system SHALL use `cv2.INTER_CUBIC` interpolation (smooth enlargement)
6. The system SHALL track the scale factor (original size divided by scaled size) and multiply detected bounding box coordinates by it to restore them to original image space after layout detection
7. Raw OCR and element extraction SHALL continue to use original/unscaled images
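
The decision rules above can be sketched as a small planning function. This is an illustration only, not the project's implementation: `ScalingConfig`, `ScalingPlan`, and `plan_scaling` are hypothetical names, the shortened field names stand in for the `layout_image_scaling_*` settings, and the scale factor is assumed to be the original longest side divided by the target dimension, as the scenarios below suggest.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ScalingConfig:
    # Hypothetical container; defaults mirror the values quoted in the requirement.
    max_dimension: int = 2000
    min_dimension: int = 1200
    target_dimension: int = 1600


@dataclass(frozen=True)
class ScalingPlan:
    new_size: Tuple[int, int]      # (width, height) of the image fed to layout detection
    scale_factor: float            # original / scaled; multiply bboxes by this to restore
    was_scaled: bool
    interpolation: Optional[str]   # name of the cv2 flag to use, None when unscaled


def plan_scaling(width: int, height: int,
                 cfg: ScalingConfig = ScalingConfig()) -> ScalingPlan:
    longest = max(width, height)
    if longest > cfg.max_dimension:
        interpolation = "INTER_AREA"    # rule 4: downscaling
    elif longest < cfg.min_dimension:
        interpolation = "INTER_CUBIC"   # rule 5: upscaling
    else:
        # Rule 3: inside the optimal range, leave the image untouched.
        return ScalingPlan((width, height), 1.0, False, None)

    ratio = cfg.target_dimension / longest
    new_size = (max(1, round(width * ratio)), max(1, round(height * ratio)))
    return ScalingPlan(new_size, longest / cfg.target_dimension, True, interpolation)
```

With OpenCV installed, such a plan could be applied with `cv2.resize(image, plan.new_size, interpolation=getattr(cv2, plan.interpolation))`, while raw OCR keeps reading the original array, as rule 7 requires.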
#### Scenario: Large image is scaled down
- **WHEN** an image has max dimension 2480px (> 2000px threshold)
- **THEN** the image is scaled down to ~1600px on longest side
- **AND** scale_factor is recorded as ~1.55 (2480 / 1600) for bbox restoration
- **AND** INTER_AREA interpolation is used
#### Scenario: Small image is scaled up
- **WHEN** an image has max dimension 800px (< 1200px threshold)
- **THEN** the image is scaled up to ~1600px on longest side
- **AND** scale_factor is recorded as ~0.5 for bbox restoration
- **AND** INTER_CUBIC interpolation is used
#### Scenario: Optimal size image is not scaled
- **WHEN** an image has max dimension 1500px (within 1200-2000px range)
- **THEN** the image is NOT scaled
- **AND** scale_factor is 1.0
- **AND** was_scaled is False
#### Scenario: Bbox coordinates are restored after scaling
- **WHEN** layout detection returns bbox [100, 200, 500, 600] on scaled image
- **AND** scale_factor is 2.0 (the image was scaled down to half its original size)
- **THEN** final bbox is [200, 400, 1000, 1200] in original image coordinates
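
The restoration step in this scenario amounts to multiplying each coordinate by the recorded scale factor. A minimal sketch, assuming bboxes are `[x1, y1, x2, y2]` lists and using a hypothetical helper name `restore_bbox` (the actual service may round or clamp the results differently):

```python
from typing import List, Sequence


def restore_bbox(bbox: Sequence[float], scale_factor: float) -> List[float]:
    """Map [x1, y1, x2, y2] from scaled-image space back to original-image space."""
    if scale_factor <= 0:
        raise ValueError("scale_factor must be positive")
    # scale_factor is original / scaled, so multiplication undoes the resize.
    return [coord * scale_factor for coord in bbox]


print(restore_bbox([100, 200, 500, 600], 2.0))  # [200.0, 400.0, 1000.0, 1200.0]
```

When the image was not scaled, the factor is 1.0 and the bbox passes through unchanged, so callers would not need a separate code path.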