egg/OCR


egg cfe65158a3 feat: enable document orientation detection for scanned PDFs

- Enable PP-StructureV3's use_doc_orientation_classify feature
- Detect rotation angle from doc_preprocessor_res.angle
- Swap page dimensions (width <-> height) for 90°/270° rotations
- Output PDF now correctly displays landscape-scanned content

Also includes:
- Archive completed openspec proposals
- Add simplify-frontend-ocr-config proposal (pending)
- Code cleanup and frontend simplification

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2025-12-11 17:13:46 +08:00

2.7 KiB


Enable Document Orientation Detection

Summary

Enable PP-StructureV3's document orientation classification feature to correctly handle PDF scans where the content orientation differs from the PDF page metadata.

Problem Statement

Currently, when a portrait-oriented PDF contains landscape-scanned content (or vice versa), the OCR system produces incorrect results because:

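- pdf2image rasterizes each page according to the PDF page metadata (e.g., Page size: 1242 x 1755, Page rot: 0), which says nothing about how the scanned content is actually oriented
- PP-StructureV3 runs with use_doc_orientation_classify=False (disabled), so a sideways page is never detected or corrected
- OCR therefore tries to read sideways text, resulting in poor recognition
- The output PDF keeps the metadata page dimensions, which are wrong for the rotated content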

Example Scenario

Input: Portrait PDF (1242 x 1755) containing landscape-scanned delivery form
Current output: Portrait PDF with unreadable/incorrect text
Expected output: Landscape PDF (1755 x 1242) with correctly oriented text

Proposed Solution

Enable document orientation detection in PP-StructureV3 and adjust page dimensions based on the detected rotation:

- Enable orientation detection: Set use_doc_orientation_classify=True in the config
- Capture rotation info: Extract the detected rotation angle (0°/90°/180°/270°) from the PP-StructureV3 results
- Adjust dimensions: When a 90° or 270° rotation is detected, swap width and height for the output PDF page
- Use OCR coordinates directly: PP-StructureV3 returns coordinates in the frame of the rotated (orientation-corrected) image, so they already match the swapped page dimensions and need no transformation
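A minimal sketch of the dimension adjustment follows, assuming the detected angle arrives as an integer number of degrees (0, 90, 180 or 270) and that page sizes are plain width/height numbers. The helper name `output_page_size` is hypothetical, not an existing function in this codebase.

```python
from typing import Tuple

# Quarter-turn rotations flip the page between portrait and landscape.
_QUARTER_TURNS = {90, 270}
_VALID_ANGLES = {0, 90, 180, 270}


def output_page_size(width: float, height: float, angle: int) -> Tuple[float, float]:
    """Return (width, height) for the output PDF page.

    Hypothetical helper: `angle` is assumed to be the page rotation detected
    by PP-StructureV3's orientation classifier, in degrees.
    """
    if angle not in _VALID_ANGLES:
        raise ValueError(f"unexpected rotation angle: {angle}")
    if angle in _QUARTER_TURNS:
        # 90° or 270°: the content is sideways, so swap width and height.
        return height, width
    # 0° or 180°: orientation changes but the page shape does not.
    return width, height
```

For the example scenario, `output_page_size(1242, 1755, 90)` returns `(1755, 1242)`. Raising on any other angle surfaces unexpected classifier output instead of silently producing a mis-sized page.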

PP-StructureV3 Orientation Detection Details

According to the PaddleOCR documentation:

- Stage 1 preprocessing: use_doc_orientation_classify detects the orientation of the entire page and rotates it before the later stages run
- Output format: in PP-StructureV3 results, doc_preprocessor_res reports the detected rotation as angle; the underlying orientation classifier outputs:
  - class_ids: [0-3] corresponding to [0°, 90°, 180°, 270°]
  - label_names: ["0", "90", "180", "270"]
  - scores: confidence scores
- Model accuracy: PP-LCNet_x1_0_doc_ori achieves 99.06% top-1 accuracy
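A sketch of turning the preprocessing output into a rotation angle. It assumes the pipeline result is available as a plain dict with a `doc_preprocessor_res` entry that carries an `angle` field (the field named in the commit message), and falls back to mapping `class_ids` through the `label_names` table listed above. The helper `detected_angle` and the exact dict layout are assumptions for illustration, not PaddleOCR API.

```python
from typing import Any, Mapping, Sequence

# Class index -> rotation label, following label_names ["0", "90", "180", "270"].
LABEL_NAMES: Sequence[str] = ("0", "90", "180", "270")
_VALID_ANGLES = (0, 90, 180, 270)


def detected_angle(result: Mapping[str, Any]) -> int:
    """Return the page rotation in degrees from a PP-StructureV3-style result.

    Hypothetical helper; the dict layout is assumed, not taken from PaddleOCR.
    Returns 0 when no valid angle or class id is present.
    """
    pre = result.get("doc_preprocessor_res") or {}

    # Preferred source: an explicit angle in degrees.
    angle = pre.get("angle")
    if angle in _VALID_ANGLES:
        return int(angle)

    # Fallback: raw classifier output, where the first class id indexes LABEL_NAMES.
    class_ids = pre.get("class_ids")
    if class_ids is not None and len(class_ids) > 0:
        return int(LABEL_NAMES[int(class_ids[0])])

    # No usable orientation information: treat the page as unrotated.
    return 0
```

Falling back to 0 leaves pages without orientation information at their original size, which is the current behavior.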

Scope

- Backend only (no frontend changes required)
- Affects only OCR track processing
- Does not affect the Direct or Hybrid tracks

Risks and Mitigations

| Risk | Mitigation |
|------|------------|
| The model may misclassify pages with mixed orientation | 99.06% accuracy is acceptable; `use_textline_orientation` (already enabled) handles per-line correction |
| Coordinate mismatch in edge cases | Test thoroughly with portrait, landscape, and mixed-orientation documents |
| Performance overhead | Orientation classification adds ~100 ms per page (negligible compared with total OCR time) |

Success Criteria

- A portrait PDF with landscape content produces a landscape output PDF
- A landscape PDF with portrait content produces a portrait output PDF
- Correctly oriented documents continue to be processed as before
- Text recognition accuracy improves for rotated documents
