OCR/openspec/changes/archive/2025-12-11-enable-doc-orientation-detection/proposal.md
egg 63ffa8f0e3 docs: archive enable-doc-orientation-detection proposal
Feature implementation completed and tested successfully.
- PP-StructureV3 orientation detection enabled
- Page dimensions correctly swapped for 90°/270° rotations
- Output PDF now displays landscape content correctly

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 17:15:05 +08:00

# Enable Document Orientation Detection
## Summary
Enable PP-StructureV3's document orientation classification feature to correctly handle PDF scans where the content orientation differs from the PDF page metadata.
## Problem Statement
Currently, when a portrait-oriented PDF contains landscape-scanned content (or vice versa), the OCR system produces incorrect results because:
1. **pdf2image** extracts images based on PDF metadata (e.g., `Page size: 1242 x 1755`, `Page rot: 0`)
2. **PP-StructureV3** has `use_doc_orientation_classify=False` (disabled)
3. The OCR attempts to read sideways text, resulting in poor recognition
4. The output PDF has wrong page dimensions
### Example Scenario
- Input: Portrait PDF (1242 x 1755) containing a landscape-scanned delivery form
- Current output: Portrait PDF with unreadable/incorrect text
- Expected output: Landscape PDF (1755 x 1242) with correctly oriented text
## Proposed Solution
Enable document orientation detection in PP-StructureV3 and adjust page dimensions based on the detected rotation:
1. **Enable orientation detection**: Set `use_doc_orientation_classify=True` in config
2. **Capture rotation info**: Extract the detected rotation angle (0°/90°/180°/270°) from PP-StructureV3 results
3. **Adjust dimensions**: When 90° or 270° rotation is detected, swap width and height for the output PDF
4. **Use OCR coordinates directly**: PP-StructureV3 returns coordinates based on the rotated image, so no coordinate transformation is needed
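
The dimension adjustment in step 3 can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `output_page_size` and its signature are hypothetical, and it assumes the rotation angle has already been read from the PP-StructureV3 result as one of 0, 90, 180, or 270.

```python
def output_page_size(width: float, height: float, rotation_deg: int) -> tuple[float, float]:
    """Return (width, height) for the output PDF page.

    Hypothetical helper: `rotation_deg` is the angle reported by the
    orientation classifier for this page.
    """
    if rotation_deg not in (0, 90, 180, 270):
        raise ValueError(f"unsupported rotation: {rotation_deg}")
    # A quarter turn (90 or 270) exchanges the page's width and height.
    if rotation_deg in (90, 270):
        return height, width
    # 0 and 180 keep the original page shape.
    return width, height
```

With the example scenario above, `output_page_size(1242, 1755, 90)` yields `(1755, 1242)`, while a 180° detection leaves the page at `(1242, 1755)`.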
## PP-StructureV3 Orientation Detection Details
According to PaddleOCR documentation:
- **Stage 1 preprocessing**: `use_doc_orientation_classify` detects and rotates the entire page
- **Output format**: `doc_preprocessor_res` contains:
  - `class_ids`: [0-3] corresponding to [0°, 90°, 180°, 270°]
  - `label_names`: ["0", "90", "180", "270"]
  - `scores`: confidence scores
- **Model accuracy**: PP-LCNet_x1_0_doc_ori achieves 99.06% top-1 accuracy
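
As a rough sketch of how the rotation angle could be read from that output, the helper below maps the first `class_ids` entry to degrees. It assumes `doc_preprocessor_res` is available as a plain dict-like object with the fields listed above; the function name `detected_rotation` and the fallback to 0° when no classification is present are assumptions for illustration, and the exact result structure should be checked against the installed PaddleOCR version.

```python
from typing import Any, Mapping

# Class index to rotation angle, per the mapping listed above.
ANGLES = (0, 90, 180, 270)


def detected_rotation(doc_preprocessor_res: Mapping[str, Any]) -> int:
    """Return the detected page rotation in degrees (hypothetical helper)."""
    class_ids = doc_preprocessor_res.get("class_ids")
    # Assumption: a missing or empty classification is treated as upright.
    if class_ids is None or len(class_ids) == 0:
        return 0
    class_id = int(class_ids[0])
    if not 0 <= class_id < len(ANGLES):
        raise ValueError(f"unexpected orientation class id: {class_id}")
    return ANGLES[class_id]
```

For example, a result of `{"class_ids": [1], "label_names": ["90"], "scores": [0.98]}` maps to a rotation of 90.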
## Scope
- Backend only (no frontend changes required)
- Affects OCR track processing
- Does not affect the Direct or Hybrid tracks
## Risks and Mitigations
| Risk | Mitigation |
|------|------------|
| Model might incorrectly classify mixed-orientation pages | 99.06% accuracy is acceptable; `use_textline_orientation` (already enabled) handles per-line correction |
| Coordinate mismatch in edge cases | Thorough testing with portrait, landscape, and mixed documents |
| Performance overhead | Orientation classification adds ~100ms per page (negligible vs total OCR time) |
## Success Criteria
1. Portrait PDF with landscape content produces landscape output PDF
2. Landscape PDF with portrait content produces portrait output PDF
3. Normal orientation documents continue to work correctly
4. Text recognition accuracy improves for rotated documents