egg/OCR


egg 63ffa8f0e3 docs: archive enable-doc-orientation-detection proposal

Feature implementation completed and tested successfully.
- PP-StructureV3 orientation detection enabled
- Page dimensions correctly swapped for 90°/270° rotations
- Output PDF now displays landscape content correctly

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2025-12-11 17:15:05 +08:00

4.0 KiB


ocr-processing Specification Delta

ADDED Requirements

Requirement: Document Orientation Detection

The system SHALL detect and correct document orientation for scanned PDFs where the content orientation differs from PDF page metadata.

Scenario: Portrait PDF with landscape content is corrected

GIVEN a PDF with portrait page dimensions (width < height)
AND the scanned content is rotated 90° or 270° (landscape scan in a portrait page)
WHEN PP-StructureV3 processes the image with use_doc_orientation_classify=True
THEN the system SHALL detect rotation angle as "90" or "270"
AND the output PDF page dimensions SHALL be swapped (width ↔ height)
AND all text elements SHALL be correctly positioned in the rotated coordinate space

Scenario: Landscape PDF with portrait content is corrected

GIVEN a PDF with landscape page dimensions (width > height)
AND the scanned content is rotated 90° or 270° (portrait scan in a landscape page)
WHEN PP-StructureV3 processes the image
THEN the system SHALL detect rotation angle as "90" or "270"
AND the output PDF page dimensions SHALL be swapped
AND all text elements SHALL be correctly positioned

Scenario: Upside-down content is corrected

GIVEN a scanned document that is upside down (180° rotation)
WHEN PP-StructureV3 processes the image
THEN the system SHALL detect rotation angle as "180"
AND page dimensions SHALL NOT be swapped (the page orientation is unchanged; the content is only flipped)
AND text elements SHALL be correctly positioned after internal rotation

Scenario: Correctly oriented documents remain unchanged

GIVEN a PDF where page metadata matches actual content orientation
WHEN PP-StructureV3 processes the image
THEN the system SHALL detect rotation angle as "0"
AND page dimensions SHALL remain unchanged
AND processing SHALL proceed normally without dimension adjustment
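
The four orientation scenarios above reduce to one rule: swap width and height for "90" and "270", keep them for "0" and "180". The following is a minimal sketch of that rule, not the project's actual code. The function name `adjust_page_dimensions` is hypothetical, and rejecting any other label with an error is an assumption rather than something this spec requires.

```python
from typing import Tuple


def adjust_page_dimensions(width: float, height: float, rotation: str) -> Tuple[float, float]:
    """Return the output page (width, height) for a detected rotation label.

    Hypothetical helper: the name and the error handling are assumptions.
    """
    if rotation in ("90", "270"):
        # Quarter turns change the page orientation, so the sides are swapped.
        return height, width
    if rotation in ("0", "180"):
        # No rotation or a half turn keeps the page orientation as it is.
        return width, height
    raise ValueError(f"unexpected rotation label: {rotation!r}")
```

For example, a portrait page of 595 by 842 points with a detected rotation of "90" yields an output page of 842 by 595, while the same page with "180" stays 595 by 842.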

Scenario: Rotation angle is captured from PP-StructureV3 results

GIVEN PP-StructureV3 is configured with use_doc_orientation_classify=True
WHEN processing completes
THEN the system SHALL extract rotation angle from doc_preprocessor_res.label_names
AND include detected_rotation in the OCR result metadata
AND log the detected rotation for debugging
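
The extraction step in this scenario could look roughly like the sketch below. It treats the pipeline result as a plain mapping holding a `doc_preprocessor_res` entry with a `label_names` list, as named above. The exact shape of the real PP-StructureV3 result object is an assumption here, as are the helper name `extract_detected_rotation` and the fallback to "0" when no usable label is present.

```python
import logging
from typing import Any, Mapping

logger = logging.getLogger(__name__)

VALID_ROTATIONS = {"0", "90", "180", "270"}


def extract_detected_rotation(result: Mapping[str, Any]) -> str:
    """Read the rotation label from a result mapping (assumed shape)."""
    preprocessor_res = result.get("doc_preprocessor_res") or {}
    label_names = preprocessor_res.get("label_names") or []
    # Assumption: the first label is the classifier's top prediction.
    rotation = str(label_names[0]) if label_names else "0"
    if rotation not in VALID_ROTATIONS:
        # Assumption: unknown labels are treated as "no rotation".
        logger.warning("Unexpected rotation label %r, falling back to 0", rotation)
        rotation = "0"
    logger.info("Detected document rotation: %s", rotation)
    return rotation


# The label is then stored in the OCR result metadata.
sample_result = {"doc_preprocessor_res": {"label_names": ["90"]}}
metadata = {"detected_rotation": extract_detected_rotation(sample_result)}
```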

Scenario: Dimension adjustment happens before PDF generation

GIVEN OCR processing detects rotation angle of "90" or "270"
WHEN creating the UnifiedDocument for PDF generation
THEN the Page dimensions SHALL use adjusted (swapped) width and height
AND OCR coordinates SHALL be used directly (already in rotated space)
AND no additional coordinate transformation SHALL be applied
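
To illustrate why only the page size needs adjusting, here is a small sketch with a stand-in `Page` type. The `Page` dataclass, `build_page`, and `boxes_fit` are hypothetical names used for illustration and are not the project's UnifiedDocument model. OCR boxes are passed through untouched, on the assumption stated above that they are already expressed in the rotated coordinate space.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


@dataclass
class Page:
    """Stand-in for a page model; not the project's real class."""
    width: float
    height: float
    elements: List[BBox] = field(default_factory=list)


def build_page(image_width: float, image_height: float,
               rotation: str, ocr_boxes: Iterable[BBox]) -> Page:
    """Swap the page size for quarter turns and keep OCR boxes as they are."""
    if rotation in ("90", "270"):
        width, height = image_height, image_width
    else:
        width, height = image_width, image_height
    return Page(width=width, height=height, elements=list(ocr_boxes))


def boxes_fit(page: Page) -> bool:
    """Check that every box lies inside the page bounds."""
    return all(
        0 <= x0 <= x1 <= page.width and 0 <= y0 <= y1 <= page.height
        for x0, y0, x1, y1 in page.elements
    )


# A box at x = 700..800 would overflow a 595-wide portrait page,
# but fits once the page is swapped to 842 x 595.
page = build_page(595, 842, "90", [(700.0, 100.0, 800.0, 150.0)])
```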

Requirement: Orientation Detection Configuration

The system SHALL provide configuration for enabling/disabling document orientation detection.

Scenario: Orientation detection is enabled by default

GIVEN default configuration settings
WHEN OCR track processing runs
THEN use_doc_orientation_classify SHALL be True
AND PP-StructureV3 SHALL perform document orientation classification

Scenario: Orientation detection can be disabled

GIVEN use_doc_orientation_classify is set to False in configuration
WHEN OCR track processing runs
THEN the system SHALL NOT perform orientation detection
AND page dimensions SHALL be based on original image dimensions
AND this SHALL maintain backward compatibility for environments where input orientation is controlled
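
One way to express this configuration requirement is a settings object whose flag defaults to True and is forwarded unchanged to the pipeline. This is a sketch under assumptions: `OcrSettings` and `pipeline_kwargs` are hypothetical names, and only the `use_doc_orientation_classify` keyword comes from this spec.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class OcrSettings:
    """Hypothetical settings holder; orientation detection is on by default."""
    use_doc_orientation_classify: bool = True


def pipeline_kwargs(settings: OcrSettings) -> Dict[str, Any]:
    """Build the keyword arguments forwarded to the OCR pipeline."""
    return {"use_doc_orientation_classify": settings.use_doc_orientation_classify}


default_kwargs = pipeline_kwargs(OcrSettings())
disabled_kwargs = pipeline_kwargs(OcrSettings(use_doc_orientation_classify=False))
```

Keeping the default in one place means that disabling detection in a controlled environment is a single configuration change.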

MODIFIED Requirements

Requirement: Layout Model Selection (Modified)

The system SHALL apply document orientation detection before layout detection regardless of the selected layout model.

Scenario: Orientation detection works with all layout models

GIVEN a user selects any layout model (chinese, default, cdla)
WHEN OCR processing runs with use_doc_orientation_classify=True
THEN orientation detection SHALL be applied regardless of layout model choice
AND orientation detection SHALL run in Stage 1 (preprocessing), before layout detection (Stage 3)
