- Enable PP-StructureV3's use_doc_orientation_classify feature
- Detect rotation angle from doc_preprocessor_res.angle
- Swap page dimensions (width <-> height) for 90°/270° rotations
- Output PDF now correctly displays landscape-scanned content

Also includes:
- Archive completed openspec proposals
- Add simplify-frontend-ocr-config proposal (pending)
- Code cleanup and frontend simplification

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
ocr-processing Specification Delta
ADDED Requirements
Requirement: Document Orientation Detection
The system SHALL detect and correct document orientation for scanned PDFs where the content orientation differs from PDF page metadata.
Scenario: Portrait PDF with landscape content is corrected
- GIVEN a PDF with portrait page dimensions (width < height)
- AND the scanned content is rotated 90° (landscape scan in portrait page)
- WHEN PP-StructureV3 processes the image with use_doc_orientation_classify=True
- THEN the system SHALL detect rotation angle as "90" or "270"
- AND the output PDF page dimensions SHALL be swapped (width ↔ height)
- AND all text elements SHALL be correctly positioned in the rotated coordinate space
Scenario: Landscape PDF with portrait content is corrected
- GIVEN a PDF with landscape page dimensions (width > height)
- AND the scanned content is rotated 90° (portrait scan in landscape page)
- WHEN PP-StructureV3 processes the image
- THEN the system SHALL detect rotation angle as "90" or "270"
- AND the output PDF page dimensions SHALL be swapped
- AND all text elements SHALL be correctly positioned
Scenario: Upside-down content is corrected
- GIVEN a scanned document that is upside down (180° rotation)
- WHEN PP-StructureV3 processes the image
- THEN the system SHALL detect rotation angle as "180"
- AND page dimensions SHALL NOT be swapped (a 180° flip leaves the page orientation unchanged)
- AND text elements SHALL be correctly positioned after internal rotation
Scenario: Correctly oriented documents remain unchanged
- GIVEN a PDF where page metadata matches actual content orientation
- WHEN PP-StructureV3 processes the image
- THEN the system SHALL detect rotation angle as "0"
- AND page dimensions SHALL remain unchanged
- AND processing SHALL proceed normally without dimension adjustment
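The four orientation scenarios above reduce to one rule: swap width and height for "90" or "270", and keep them for "0" or "180". A minimal sketch of that rule follows. The function name adjust_page_dimensions is hypothetical, and raising on an unexpected angle is an assumption rather than something this specification requires.

```python
from typing import Tuple


def adjust_page_dimensions(width: float, height: float, angle: str) -> Tuple[float, float]:
    """Return the (width, height) to use for the output page.

    `angle` is the detected rotation as a string, as in the scenarios:
    "0", "90", "180" or "270".
    """
    if angle in ("90", "270"):
        # Quarter turns change the page orientation, so the sides swap.
        return height, width
    if angle in ("0", "180"):
        # No rotation, or an upside-down flip: orientation is unchanged.
        return width, height
    # Assumption: anything else is treated as a caller error.
    raise ValueError(f"unexpected rotation angle: {angle!r}")


# A portrait A4 page (in points) holding a landscape scan.
landscape = adjust_page_dimensions(595.0, 842.0, "90")
upside_down = adjust_page_dimensions(595.0, 842.0, "180")
```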
Scenario: Rotation angle is captured from PP-StructureV3 results
- GIVEN PP-StructureV3 is configured with use_doc_orientation_classify=True
- WHEN processing completes
- THEN the system SHALL extract rotation angle from doc_preprocessor_res.label_names
- AND include detected_rotation in the OCR result metadata
- AND log the detected rotation for debugging
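The capture step might look like the sketch below. It works on a plain mapping rather than a real PP-StructureV3 result object, so the result shape is an assumption: the scenario names doc_preprocessor_res.label_names, the commit message names doc_preprocessor_res.angle, and this sketch accepts either. The helper names and the fallback to "0" for missing or unrecognized values are also hypothetical choices.

```python
import logging
from typing import Any, Dict, Mapping

logger = logging.getLogger(__name__)

VALID_ANGLES = {"0", "90", "180", "270"}


def extract_rotation(result: Mapping[str, Any]) -> str:
    """Read the detected rotation from an (assumed) dict-shaped result."""
    pre = result.get("doc_preprocessor_res") or {}
    labels = pre.get("label_names")
    # Prefer label_names (named in the scenario), else fall back to angle.
    raw = labels[0] if labels else pre.get("angle", 0)
    angle = str(raw)
    if angle not in VALID_ANGLES:
        # Assumption: unknown values are treated as "no rotation".
        logger.warning("Unrecognized rotation %r, assuming 0", raw)
        angle = "0"
    logger.debug("Detected rotation: %s", angle)
    return angle


def build_metadata(result: Mapping[str, Any]) -> Dict[str, str]:
    """Attach the rotation to OCR result metadata as detected_rotation."""
    return {"detected_rotation": extract_rotation(result)}
```

Normalizing to a string keeps the value comparable with the "90" and "270" checks used in the other scenarios.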
Scenario: Dimension adjustment happens before PDF generation
- GIVEN OCR processing detects rotation angle of "90" or "270"
- WHEN creating the UnifiedDocument for PDF generation
- THEN the Page dimensions SHALL use adjusted (swapped) width and height
- AND OCR coordinates SHALL be used directly (already in rotated space)
- AND no additional coordinate transformation is needed
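A small worked example shows why only the page dimensions change. The numbers and the boxes_fit_page helper are hypothetical. A box reported in rotated space can extend past the original page width, yet it fits once width and height are swapped, with the coordinates themselves left untouched.

```python
from typing import Iterable, Tuple

# (x0, y0, x1, y1) in the rotated coordinate space reported by OCR.
Box = Tuple[float, float, float, float]


def boxes_fit_page(boxes: Iterable[Box], width: float, height: float) -> bool:
    """True if every box lies fully inside a page of the given size."""
    return all(
        0 <= x0 <= x1 <= width and 0 <= y0 <= y1 <= height
        for x0, y0, x1, y1 in boxes
    )


# Portrait page metadata, content detected as rotated by "90".
original_w, original_h = 1000.0, 1400.0
ocr_boxes = [(50.0, 40.0, 1350.0, 90.0)]  # one wide text line

# Against the original dimensions the line overflows the page width.
fits_original = boxes_fit_page(ocr_boxes, original_w, original_h)

# With swapped dimensions the same coordinates fit as they are.
fits_swapped = boxes_fit_page(ocr_boxes, original_h, original_w)
```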
Requirement: Orientation Detection Configuration
The system SHALL provide configuration for enabling/disabling document orientation detection.
Scenario: Orientation detection is enabled by default
- GIVEN default configuration settings
- WHEN OCR track processing runs
- THEN use_doc_orientation_classify SHALL be True
- AND PP-StructureV3 SHALL perform document orientation classification
Scenario: Orientation detection can be disabled
- GIVEN use_doc_orientation_classify is set to False in configuration
- WHEN OCR track processing runs
- THEN the system SHALL NOT perform orientation detection
- AND page dimensions SHALL be based on original image dimensions
- AND this maintains backward compatibility for controlled environments
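One way to express the default-on, opt-out behavior is a settings object whose flag defaults to True. The sketch below is an assumption about how the configuration could be modeled: OcrSettings and pipeline_kwargs are hypothetical names, and only the use_doc_orientation_classify flag comes from this specification.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class OcrSettings:
    """Hypothetical settings holder for the OCR track."""

    # Enabled by default, per the first configuration scenario.
    use_doc_orientation_classify: bool = True


def pipeline_kwargs(settings: OcrSettings) -> Dict[str, Any]:
    """Build the keyword arguments that would be passed to the pipeline."""
    return {
        "use_doc_orientation_classify": settings.use_doc_orientation_classify,
    }


default_kwargs = pipeline_kwargs(OcrSettings())
disabled_kwargs = pipeline_kwargs(OcrSettings(use_doc_orientation_classify=False))
```

With the flag set to False, no rotation is detected, so page dimensions come straight from the original image.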
MODIFIED Requirements
Requirement: Layout Model Selection (Modified)
The system SHALL apply document orientation detection before layout detection regardless of the selected layout model.
Scenario: Orientation detection works with all layout models
- GIVEN a user selects any layout model (chinese, default, cdla)
- WHEN OCR processing runs with use_doc_orientation_classify=True
- THEN orientation detection SHALL be applied regardless of layout model choice
- AND orientation detection happens in Stage 1 (preprocessing) before layout detection (Stage 3)