OCR/openspec/changes/enable-doc-orientation-detection/proposal.md
egg cfe65158a3 feat: enable document orientation detection for scanned PDFs
- Enable PP-StructureV3's use_doc_orientation_classify feature
- Detect rotation angle from doc_preprocessor_res.angle
- Swap page dimensions (width <-> height) for 90°/270° rotations
- Output PDF now correctly displays landscape-scanned content

Also includes:
- Archive completed openspec proposals
- Add simplify-frontend-ocr-config proposal (pending)
- Code cleanup and frontend simplification

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 17:13:46 +08:00

Enable Document Orientation Detection

Summary

Enable PP-StructureV3's document orientation classification feature to correctly handle PDF scans where the content orientation differs from the PDF page metadata.

Problem Statement

Currently, when a portrait-oriented PDF contains landscape-scanned content (or vice versa), the OCR system produces incorrect results because:

  1. pdf2image extracts images based on PDF metadata (e.g., Page size: 1242 x 1755, Page rot: 0)
  2. PP-StructureV3 has use_doc_orientation_classify=False (disabled)
  3. The OCR engine then tries to read sideways text, which degrades recognition
  4. The output PDF keeps the wrong page dimensions

Example Scenario

  • Input: Portrait PDF (1242 x 1755) containing a landscape-scanned delivery form
  • Current output: Portrait PDF with unreadable/incorrect text
  • Expected output: Landscape PDF (1755 x 1242) with correctly oriented text

Proposed Solution

Enable document orientation detection in PP-StructureV3 and adjust page dimensions based on the detected rotation:

  1. Enable orientation detection: Set use_doc_orientation_classify=True in config
  2. Capture rotation info: Extract the detected rotation angle (0°/90°/180°/270°) from PP-StructureV3 results
  3. Adjust dimensions: When 90° or 270° rotation is detected, swap width and height for the output PDF
  4. Use OCR coordinates directly: PP-StructureV3 returns coordinates based on the rotated image, so no coordinate transformation is needed
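
Step 3 can be sketched as a small pure function. This is a minimal illustration, not the project's actual code: the name `output_page_size` is hypothetical, and it assumes the detected angle has already been read from the PP-StructureV3 result as one of 0, 90, 180, or 270.

```python
from typing import Tuple

# Rotation angles the orientation classifier can report, in degrees.
VALID_ANGLES = (0, 90, 180, 270)


def output_page_size(width: float, height: float, angle: int) -> Tuple[float, float]:
    """Return (width, height) of the output page for a detected rotation.

    Hypothetical helper: 90 and 270 degree rotations swap the two
    dimensions, while 0 and 180 leave them unchanged.
    """
    if angle not in VALID_ANGLES:
        raise ValueError(f"unsupported rotation angle: {angle!r}")
    if angle in (90, 270):
        return height, width
    return width, height


# The example scenario: a portrait page whose content was scanned sideways.
print(output_page_size(1242, 1755, 90))  # (1755, 1242)
```

Rejecting unexpected angles instead of silently passing them through is a design choice made for this sketch; a real implementation might prefer to fall back to the original dimensions.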

PP-StructureV3 Orientation Detection Details

According to PaddleOCR documentation:

  • Stage 1 preprocessing: use_doc_orientation_classify detects and rotates the entire page
  • Output format: doc_preprocessor_res contains:
    • class_ids: an index from 0 to 3, corresponding to 0°, 90°, 180°, 270° respectively
    • label_names: ["0", "90", "180", "270"]
    • scores: confidence scores
  • Model accuracy: PP-LCNet_x1_0_doc_ori achieves 99.06% top-1 accuracy
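
The mapping from class index to rotation angle described above could look like the following sketch. It assumes the three fields arrive as lists with the top-1 prediction first, which is an assumption about the result layout rather than something this proposal confirms; `detected_angle` and `ANGLE_BY_CLASS_ID` are hypothetical names.

```python
from typing import Any, Dict, Tuple

# Class index -> rotation in degrees, as listed in the bullets above.
ANGLE_BY_CLASS_ID = {0: 0, 1: 90, 2: 180, 3: 270}


def detected_angle(result: Dict[str, Any]) -> Tuple[int, float]:
    """Return (angle_in_degrees, confidence) for the top-1 prediction.

    Assumes `result` holds "class_ids", "label_names" and "scores" as
    lists with the best prediction at index 0 (hypothetical layout).
    """
    class_id = int(result["class_ids"][0])
    label = str(result["label_names"][0])
    score = float(result["scores"][0])

    if class_id not in ANGLE_BY_CLASS_ID:
        raise ValueError(f"unexpected class id: {class_id}")

    angle = ANGLE_BY_CLASS_ID[class_id]
    # Cross-check the numeric id against its textual label.
    if label != str(angle):
        raise ValueError(f"label {label!r} does not match class id {class_id}")
    return angle, score


sample = {"class_ids": [1], "label_names": ["90"], "scores": [0.98]}
print(detected_angle(sample))  # (90, 0.98)
```

Returning the score alongside the angle leaves room for a caller to ignore low-confidence detections, although no such threshold is part of this proposal.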

Scope

  • Backend only (no frontend changes required)
  • Affects OCR track processing
  • Does not affect the Direct or Hybrid tracks

Risks and Mitigations

| Risk | Mitigation |
| --- | --- |
| Model might incorrectly classify mixed-orientation pages | 99.06% accuracy is acceptable; use_textline_orientation (already enabled) handles per-line correction |
| Coordinate mismatch in edge cases | Thorough testing with portrait, landscape, and mixed documents |
| Performance overhead | Orientation classification adds ~100ms per page (negligible vs total OCR time) |

Success Criteria

  1. Portrait PDF with landscape content produces landscape output PDF
  2. Landscape PDF with portrait content produces portrait output PDF
  3. Normal orientation documents continue to work correctly
  4. Text recognition accuracy improves for rotated documents
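
Criteria 1 to 3 lend themselves to a table-driven check. The sketch below is illustrative only: the helper names are hypothetical, the page sizes reuse the example scenario, and the specific angles (90 and 270) are assumed values chosen to exercise the dimension swap. Criterion 4 depends on recognition quality and is not covered here.

```python
from typing import Tuple


def page_orientation(width: float, height: float) -> str:
    """Classify a page as "landscape", "portrait" or "square"."""
    if width > height:
        return "landscape"
    if height > width:
        return "portrait"
    return "square"


def output_orientation(input_size: Tuple[float, float], angle: int) -> str:
    """Orientation of the output page after applying the detected rotation."""
    width, height = input_size
    if angle in (90, 270):
        # Quarter-turn rotations swap the page dimensions.
        width, height = height, width
    return page_orientation(width, height)


# (input page size, detected angle, expected output orientation)
CASES = [
    ((1242, 1755), 90, "landscape"),   # criterion 1: portrait PDF, landscape content
    ((1755, 1242), 270, "portrait"),   # criterion 2: landscape PDF, portrait content
    ((1242, 1755), 0, "portrait"),     # criterion 3: normal orientation unchanged
]

for size, angle, expected in CASES:
    assert output_orientation(size, angle) == expected
```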