# ocr-processing Specification Delta

## ADDED Requirements

### Requirement: Document Orientation Detection

The system SHALL detect and correct document orientation for scanned PDFs where the content orientation differs from PDF page metadata.

#### Scenario: Portrait PDF with landscape content is corrected

- **GIVEN** a PDF with portrait page dimensions (width < height)
- **AND** the scanned content is rotated 90° (landscape scan in portrait page)
- **WHEN** PP-StructureV3 processes the image with `use_doc_orientation_classify=True`
- **THEN** the system SHALL detect rotation angle as "90" or "270"
- **AND** the output PDF page dimensions SHALL be swapped (width ↔ height)
- **AND** all text elements SHALL be correctly positioned in the rotated coordinate space

#### Scenario: Landscape PDF with portrait content is corrected

- **GIVEN** a PDF with landscape page dimensions (width > height)
- **AND** the scanned content is rotated 90° (portrait scan in landscape page)
- **WHEN** PP-StructureV3 processes the image
- **THEN** the system SHALL detect rotation angle as "90" or "270"
- **AND** the output PDF page dimensions SHALL be swapped
- **AND** all text elements SHALL be correctly positioned

#### Scenario: Upside-down content is corrected

- **GIVEN** a scanned document that is upside down (180° rotation)
- **WHEN** PP-StructureV3 processes the image
- **THEN** the system SHALL detect rotation angle as "180"
- **AND** page dimensions SHALL NOT be swapped (orientation is same, just flipped)
- **AND** text elements SHALL be correctly positioned after internal rotation

#### Scenario: Correctly oriented documents remain unchanged

- **GIVEN** a PDF where page metadata matches actual content orientation
- **WHEN** PP-StructureV3 processes the image
- **THEN** the system SHALL detect rotation angle as "0"
- **AND** page dimensions SHALL remain unchanged
- **AND** processing SHALL proceed normally without dimension adjustment

#### Scenario: Rotation angle is captured from PP-StructureV3 results

- **GIVEN** PP-StructureV3 is configured with `use_doc_orientation_classify=True`
- **WHEN** processing completes
- **THEN** the system SHALL extract rotation angle from `doc_preprocessor_res.label_names`
- **AND** include `detected_rotation` in the OCR result metadata
- **AND** log the detected rotation for debugging

#### Scenario: Dimension adjustment happens before PDF generation

- **GIVEN** OCR processing detects rotation angle of "90" or "270"
- **WHEN** creating the UnifiedDocument for PDF generation
- **THEN** the Page dimensions SHALL use adjusted (swapped) width and height
- **AND** OCR coordinates SHALL be used directly (already in rotated space)
- **AND** no additional coordinate transformation is needed

### Requirement: Orientation Detection Configuration

The system SHALL provide configuration for enabling/disabling document orientation detection.

#### Scenario: Orientation detection is enabled by default

- **GIVEN** default configuration settings
- **WHEN** OCR track processing runs
- **THEN** `use_doc_orientation_classify` SHALL be `True`
- **AND** PP-StructureV3 SHALL perform document orientation classification

#### Scenario: Orientation detection can be disabled

- **GIVEN** `use_doc_orientation_classify` is set to `False` in configuration
- **WHEN** OCR track processing runs
- **THEN** the system SHALL NOT perform orientation detection
- **AND** page dimensions SHALL be based on original image dimensions
- **AND** this maintains backward compatibility for controlled environments

## MODIFIED Requirements

### Requirement: Layout Model Selection (Modified)

The system SHALL apply document orientation detection before layout detection regardless of the selected layout model.

#### Scenario: Orientation detection works with all layout models

- **GIVEN** a user selects any layout model (chinese, default, cdla)
- **WHEN** OCR processing runs with `use_doc_orientation_classify=True`
- **THEN** orientation detection SHALL be applied regardless of layout model choice
- **AND** orientation detection happens in Stage 1 (preprocessing) before layout detection (Stage 3)
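
#### Illustrative sketch: page dimension adjustment

The dimension rules from the Document Orientation Detection scenarios ("90"/"270" swap width and height; "0"/"180" leave them unchanged) can be sketched as below. This is a minimal illustration, not the actual implementation: the function name `adjust_page_dimensions` and the constants are hypothetical, and the rotation value is assumed to arrive as one of the string labels quoted in the scenarios.

```python
from typing import Tuple

# Rotation labels quoted in the scenarios; treating them as strings is an assumption.
VALID_ROTATIONS = {"0", "90", "180", "270"}
# Rotations for which the scenarios require swapped page dimensions.
SWAP_ROTATIONS = {"90", "270"}


def adjust_page_dimensions(
    width: float, height: float, detected_rotation: str
) -> Tuple[float, float]:
    """Return the (width, height) to use for the output page.

    Hypothetical helper: swaps dimensions for "90"/"270" and returns them
    unchanged for "0"/"180", as the scenarios specify.
    """
    if detected_rotation not in VALID_ROTATIONS:
        raise ValueError(f"unexpected rotation label: {detected_rotation!r}")
    if detected_rotation in SWAP_ROTATIONS:
        return height, width
    return width, height
```

For example, a portrait page of 595 × 842 with a detected rotation of "90" would yield 842 × 595, while the same page with "180" would stay 595 × 842. OCR coordinates are not touched here, consistent with the scenario stating they are already in the rotated space.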