Files
OCR/openspec/changes/archive/2025-12-11-enable-doc-orientation-detection/tasks.md
egg 63ffa8f0e3 docs: archive enable-doc-orientation-detection proposal
Feature implementation completed and tested successfully.
- PP-StructureV3 orientation detection enabled
- Page dimensions correctly swapped for 90°/270° rotations
- Output PDF now displays landscape content correctly

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 17:15:05 +08:00

2.8 KiB

Tasks

Phase 1: Enable Orientation Detection

  • Task 1.1: Enable use_doc_orientation_classify in config

    • File: backend/app/core/config.py
    • Change: Set use_doc_orientation_classify: bool = Field(default=True)
    • Update comment to reflect new behavior
  • Task 1.2: Capture rotation info from PP-StructureV3 results

    • File: backend/app/services/pp_structure_enhanced.py
    • Extract doc_preprocessor_res from PP-StructureV3 output
    • Parse label_names to get detected rotation angle
    • Pass rotation angle to caller

Phase 2: Dimension Adjustment

  • Task 2.1: Add rotation angle to OCR result

    • File: backend/app/services/ocr_service.py
    • Receive rotation angle from analyze_layout()
    • Include detected_rotation in result dict
  • Task 2.2: Adjust page dimensions based on rotation

    • File: backend/app/services/ocr_service.py
    • In process_image(), after getting ocr_width, ocr_height from PIL
    • If detected_rotation is "90" or "270", swap dimensions
    • Log dimension adjustment for debugging
  • Task 2.3: Pass adjusted dimensions to UnifiedDocument

    • File: backend/app/services/ocr_to_unified_converter.py
    • Verified: Page.dimensions uses the adjusted width/height from enhanced_results
    • No coordinate transformation needed (already based on rotated image)

Phase 3: Testing & Validation

  • Task 3.1: Test with portrait PDF containing landscape scan

    • Verify output PDF is landscape
    • Verify text is correctly oriented
    • Verify text positioning is accurate
  • Task 3.2: Test with landscape PDF containing portrait scan

    • Verify output PDF is portrait
    • Verify text is correctly oriented
  • Task 3.3: Test with correctly oriented documents

    • Verify no regression for normal documents
    • Both portrait and landscape normal scans
  • Task 3.4: Test edge cases

    • 180° rotated documents (upside down)
    • Documents with mixed text orientations

Dependencies

  • Task 1.1 and 1.2 can be done in parallel
  • Task 2.1 depends on Task 1.2
  • Task 2.2 depends on Task 2.1
  • Task 2.3 depends on Task 2.2
  • All Phase 3 tasks depend on Phase 2 completion

Implementation Summary

Files Modified:

  1. backend/app/core/config.py - Enabled use_doc_orientation_classify=True
  2. backend/app/services/pp_structure_enhanced.py - Extract and return detected_rotation
  3. backend/app/services/ocr_service.py - Adjust dimensions and add rotation to result

Key Changes:

  • PP-StructureV3 now detects document orientation (0°/90°/180°/270°)
  • When 90° or 270° rotation detected, page dimensions are swapped (width ↔ height)
  • detected_rotation is included in OCR result for debugging/logging
  • Coordinates from PP-StructureV3 are already in the rotated coordinate space