
egg dda9621e17 feat: enhance layout preprocessing and unify image scaling proposal

Backend changes:
- Add image scaling configuration for PP-Structure processing
- Enhance layout preprocessing service with scaling support
- Update OCR service with improved memory management
- Add PP-Structure enhanced processing improvements

Frontend changes:
- Update preprocessing settings UI
- Fix processing page layout and state management
- Update API types for new parameters

Proposals:
- Archive add-layout-preprocessing proposal (completed)
- Add unify-image-scaling proposal for consistent coordinate handling

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-28 09:23:19 +08:00


Change: Unify Image Scaling Strategy for Optimal Layout Detection

Why

Currently, the system has inconsistent image resolution handling:

PDF conversion: Always uses 300 DPI, producing ~2480×3508 images for A4
Image downscaling: Only applied when image > 2000px (no upscaling)
Small images: Never scaled up, even if they're below optimal detection size

This inconsistency causes:

Wasted processing: PDF pages are rendered at 300 DPI and then scaled down to 1600px (render, then resize)
Suboptimal detection: Small images stay small, missing table structures
Inconsistent behavior: Different source formats get different treatment

PP-Structure's layout detection model (RT-DETR based) works best with images around 1600px on the longest side. Both too-large and too-small images reduce detection accuracy.

What Changes

Bidirectional scaling for PP-Structure
- Scale DOWN images larger than max threshold (2000px) → target (1600px)
- Scale UP images smaller than min threshold (1200px) → target (1600px)
- No change for images in optimal range (1200-2000px)
PDF conversion DPI optimization
- Calculate optimal DPI based on target resolution
- Avoid double scaling (rendering at high DPI and then scaling down)
- Option to use adaptive DPI or fixed DPI with post-scaling
Unified scaling logic
- Same rules apply to all image sources (IMG, PDF pages)
- Scaling happens once at preprocessing stage
- Bbox coordinates are scaled back to the original image space for accurate cropping
Configuration
- layout_image_scaling_min_dimension: Minimum size before upscaling (default: 1200)
- Keep existing layout_image_scaling_max_dimension (2000) and target_dimension (1600)
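
The scaling rules and the bbox mapping above can be sketched as follows. This is a minimal illustration, not the service's actual code: the `ScalingConfig` class and both function names are hypothetical, the defaults are taken from the values quoted in this proposal, and the thresholds are assumed to be compared against the longest side of the image.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingConfig:
    # Hypothetical container; defaults mirror the values quoted in the proposal.
    min_dimension: int = 1200
    max_dimension: int = 2000
    target_dimension: int = 1600


def compute_scale_factor(width: int, height: int,
                         cfg: ScalingConfig = ScalingConfig()) -> float:
    """Return the factor applied to both sides; 1.0 means no scaling."""
    longest = max(width, height)  # assumption: thresholds apply to the longest side
    if longest > cfg.max_dimension or longest < cfg.min_dimension:
        return cfg.target_dimension / longest
    return 1.0


def bbox_to_original(bbox, scale: float):
    """Map an (x1, y1, x2, y2) box from the scaled image back to original pixels."""
    return tuple(coord / scale for coord in bbox)


# Usage: an 800x600 image is upscaled by 2.0, so a box detected on the
# scaled image is divided by 2.0 before cropping the original.
scale = compute_scale_factor(800, 600)
original_box = bbox_to_original((100, 200, 300, 400), scale)
```

Because the factor is computed once and reused for the reverse mapping, the same code path covers both downscaling and upscaling.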

Impact

Affected Specs

ocr-processing - Modified scaling requirements

Affected Code

backend/app/core/config.py - Add min_dimension setting
backend/app/services/layout_preprocessing_service.py - Add upscaling logic
backend/app/services/ocr_service.py - Optional: Adjust PDF DPI handling
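
For the optional PDF DPI adjustment, one possible way to derive a render DPI from the target resolution is sketched below. It relies on the fact that PDF page sizes are expressed in points (1/72 inch); the function name and the `min_dpi`/`max_dpi` clamps are assumptions made for illustration, not existing settings.

```python
def adaptive_dpi(page_width_pt: float, page_height_pt: float,
                 target_dimension: int = 1600,
                 min_dpi: float = 72.0, max_dpi: float = 300.0) -> float:
    """Pick a DPI so the page's longest side renders at about target_dimension px."""
    longest_inches = max(page_width_pt, page_height_pt) / 72.0  # points -> inches
    dpi = target_dimension / longest_inches
    # Hypothetical clamp so tiny or huge pages do not produce extreme DPI values.
    return max(min_dpi, min(max_dpi, dpi))


# Usage: an A4 page is 595 x 842 points, so about 137 DPI yields a longest
# side of roughly 1600px instead of the 3508px produced at 300 DPI.
a4_dpi = adaptive_dpi(595, 842)
```

Rendering directly at this DPI would remove the separate downscaling pass for typical pages, at the cost of a lower-resolution page image if the same render is shared with other consumers.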

Quality Impact

Scenario	Before	After
Large image (3000px)	Scaled to 1600px	Same
Optimal image (1500px)	No scaling	Same
Small image (800px)	No scaling	Scaled to 1600px
PDF at 300 DPI (A4)	2480×3508px → 1600px on the longest side	Same (or optimized DPI)

Raw OCR Impact

No change: Raw OCR continues to use original/converted images
Upscaling only affects PP-Structure layout detection input

Risks

Upscaling quality: Enlarging small images may introduce interpolation artifacts
- Mitigation: Use INTER_CUBIC or INTER_LANCZOS4 for upscaling
- Note: Layout detection cares about structure, not fine text detail
Memory for large upscaled images: Small image scaled up uses more memory
- Mitigation: Upscaling 800px → 1600px quadruples the pixel count, but a 1600px image is still a reasonable size
Breaking existing behavior: Users may rely on current behavior
- Mitigation: Document the change, add config toggle if needed
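
The memory risk can be checked with simple arithmetic. The sketch below assumes an uncompressed 8-bit, 3-channel image buffer and a 4:3 aspect ratio; the helper name is hypothetical.

```python
def image_bytes(width: int, height: int, channels: int = 3) -> int:
    """Size of an uncompressed 8-bit image buffer in bytes."""
    return width * height * channels


small = image_bytes(800, 600)       # source image
upscaled = image_bytes(1600, 1200)  # after 2x upscaling on each side
ratio = upscaled / small            # doubling both sides quadruples the pixels
```

Under these assumptions the upscaled buffer is about 5.8 MB, which is smaller than the roughly 26 MB buffer of an A4 page rendered at 300 DPI (2480×3508).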
