egg/OCR

egg 63ffa8f0e3 docs: archive enable-doc-orientation-detection proposal

Feature implementation completed and tested successfully.
- PP-StructureV3 orientation detection enabled
- Page dimensions correctly swapped for 90°/270° rotations
- Output PDF now displays landscape content correctly

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2025-12-11 17:15:05 +08:00

Tasks

Phase 1: Enable Orientation Detection

Task 1.1: Enable use_doc_orientation_classify in config
- File: backend/app/core/config.py
- Change: Set use_doc_orientation_classify: bool = Field(default=True)
- Update comment to reflect new behavior
Task 1.2: Capture rotation info from PP-StructureV3 results
- File: backend/app/services/pp_structure_enhanced.py
- Extract doc_preprocessor_res from PP-StructureV3 output
- Parse label_names to get detected rotation angle
- Pass rotation angle to caller
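
Task 1.2 could be sketched roughly as below. This is a minimal illustration, not the code in pp_structure_enhanced.py: the helper name `extract_detected_rotation` is hypothetical, and the result shape (a mapping with a `doc_preprocessor_res` entry that holds a `label_names` list) is an assumption based on the task description, so the real PP-StructureV3 output may need different handling.

```python
from typing import Any, Mapping

# Rotation labels the orientation classifier is expected to emit.
VALID_ANGLES = {"0", "90", "180", "270"}


def extract_detected_rotation(result: Mapping[str, Any]) -> str:
    """Return the detected rotation as "0", "90", "180" or "270".

    Assumed (hypothetical) input shape:
        {"doc_preprocessor_res": {"label_names": ["90"], ...}, ...}
    Anything missing or unrecognized falls back to "0" (upright).
    """
    preprocessor_res = result.get("doc_preprocessor_res") or {}
    label_names = preprocessor_res.get("label_names") or []
    if not label_names:
        return "0"  # no orientation info available: treat as upright

    label = str(label_names[0]).strip()
    return label if label in VALID_ANGLES else "0"
```

Falling back to "0" keeps the existing behavior whenever orientation data is absent, so documents processed without the classifier are unaffected.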

Phase 2: Dimension Adjustment

Task 2.1: Add rotation angle to OCR result
- File: backend/app/services/ocr_service.py
- Receive rotation angle from analyze_layout()
- Include detected_rotation in result dict
Task 2.2: Adjust page dimensions based on rotation
- File: backend/app/services/ocr_service.py
- In process_image(), after getting ocr_width, ocr_height from PIL
- If detected_rotation is "90" or "270", swap dimensions
- Log dimension adjustment for debugging
Task 2.3: Pass adjusted dimensions to UnifiedDocument
- File: backend/app/services/ocr_to_unified_converter.py
- Verified: Page.dimensions uses the adjusted width/height from enhanced_results
- No coordinate transformation needed (already based on rotated image)
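
The dimension adjustment in Task 2.2 might look something like the following. The function name `adjust_page_dimensions` and the logger setup are illustrative assumptions; only the names `ocr_width`, `ocr_height`, and `detected_rotation` and the swap rule for "90" and "270" come from the tasks above.

```python
import logging
from typing import Tuple

logger = logging.getLogger(__name__)


def adjust_page_dimensions(
    ocr_width: int, ocr_height: int, detected_rotation: str
) -> Tuple[int, int]:
    """Swap width and height when the page was rotated by 90 or 270 degrees.

    A 0 or 180 degree rotation keeps the same aspect ratio, so the
    dimensions are returned unchanged in those cases.
    """
    if detected_rotation in ("90", "270"):
        logger.info(
            "Rotation %s detected: swapping dimensions %dx%d -> %dx%d",
            detected_rotation, ocr_width, ocr_height, ocr_height, ocr_width,
        )
        return ocr_height, ocr_width
    return ocr_width, ocr_height
```

Only the page size changes here. Since the coordinates are already expressed in the rotated image space, no bounding boxes need to be transformed.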

Phase 3: Testing & Validation

Task 3.1: Test with portrait PDF containing landscape scan
- Verify output PDF is landscape
- Verify text is correctly oriented
- Verify text positioning is accurate
Task 3.2: Test with landscape PDF containing portrait scan
- Verify output PDF is portrait
- Verify text is correctly oriented
Task 3.3: Test with correctly oriented documents
- Verify no regression for normal documents
- Both portrait and landscape normal scans
Task 3.4: Test edge cases
- 180° rotated documents (upside down)
- Documents with mixed text orientations
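
The page-orientation part of these checks could be captured in a small table-driven sketch such as the one below. All names, the pixel sizes, and the pairing of each scenario with a specific rotation label are illustrative assumptions. It covers only the expected output page orientation; text orientation, text positioning, and mixed-orientation documents still need inspection of real output PDFs.

```python
from typing import List, NamedTuple, Tuple


class OrientationCase(NamedTuple):
    name: str
    input_size: Tuple[int, int]  # (width, height) of the page image
    detected_rotation: str       # "0", "90", "180" or "270"
    expected_orientation: str    # "portrait" or "landscape"


def output_orientation(input_size: Tuple[int, int], detected_rotation: str) -> str:
    """Orientation of the output page after the 90/270 dimension swap."""
    width, height = input_size
    if detected_rotation in ("90", "270"):
        width, height = height, width
    return "landscape" if width > height else "portrait"


# Hypothetical sizes; the rotation chosen for each scenario is arbitrary.
CASES: List[OrientationCase] = [
    OrientationCase("portrait PDF, landscape scan", (2480, 3508), "90", "landscape"),
    OrientationCase("landscape PDF, portrait scan", (3508, 2480), "270", "portrait"),
    OrientationCase("upright portrait", (2480, 3508), "0", "portrait"),
    OrientationCase("upright landscape", (3508, 2480), "0", "landscape"),
    OrientationCase("upside-down portrait (180)", (2480, 3508), "180", "portrait"),
]


def failing_cases(cases: List[OrientationCase]) -> List[str]:
    """Return the names of cases whose output orientation is unexpected."""
    return [
        case.name
        for case in cases
        if output_orientation(case.input_size, case.detected_rotation)
        != case.expected_orientation
    ]
```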

Dependencies

- Tasks 1.1 and 1.2 can be done in parallel
- Task 2.1 depends on Task 1.2
- Task 2.2 depends on Task 2.1
- Task 2.3 depends on Task 2.2
- All Phase 3 tasks depend on Phase 2 completion

Implementation Summary

Files Modified:

- backend/app/core/config.py: enabled use_doc_orientation_classify=True
- backend/app/services/pp_structure_enhanced.py: extract and return detected_rotation
- backend/app/services/ocr_service.py: adjust dimensions and add detected_rotation to the result

Key Changes:

- PP-StructureV3 now detects document orientation (0°/90°/180°/270°)
- When a 90° or 270° rotation is detected, page dimensions are swapped (width ↔ height)
- detected_rotation is included in the OCR result for debugging and logging
- Coordinates from PP-StructureV3 are already in the rotated coordinate space, so no coordinate transformation is applied
