egg 63ffa8f0e3 docs: archive enable-doc-orientation-detection proposal
Feature implementation completed and tested successfully.
- PP-StructureV3 orientation detection enabled
- Page dimensions correctly swapped for 90°/270° rotations
- Output PDF now displays landscape content correctly

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 17:15:05 +08:00


ocr-processing Specification Delta

ADDED Requirements

Requirement: Document Orientation Detection

The system SHALL detect and correct document orientation for scanned PDFs where the content orientation differs from PDF page metadata.

Scenario: Portrait PDF with landscape content is corrected

  • GIVEN a PDF with portrait page dimensions (width < height)
  • AND the scanned content is rotated 90° or 270° (landscape scan in portrait page)
  • WHEN PP-StructureV3 processes the image with use_doc_orientation_classify=True
  • THEN the system SHALL detect rotation angle as "90" or "270"
  • AND the output PDF page dimensions SHALL be swapped (width ↔ height)
  • AND all text elements SHALL be correctly positioned in the rotated coordinate space

Scenario: Landscape PDF with portrait content is corrected

  • GIVEN a PDF with landscape page dimensions (width > height)
  • AND the scanned content is rotated 90° or 270° (portrait scan in landscape page)
  • WHEN PP-StructureV3 processes the image
  • THEN the system SHALL detect rotation angle as "90" or "270"
  • AND the output PDF page dimensions SHALL be swapped
  • AND all text elements SHALL be correctly positioned

Scenario: Upside-down content is corrected

  • GIVEN a scanned document that is upside down (180° rotation)
  • WHEN PP-StructureV3 processes the image
  • THEN the system SHALL detect rotation angle as "180"
  • AND page dimensions SHALL NOT be swapped (the orientation is the same; the content is only flipped)
  • AND text elements SHALL be correctly positioned after internal rotation

Scenario: Correctly oriented documents remain unchanged

  • GIVEN a PDF where page metadata matches actual content orientation
  • WHEN PP-StructureV3 processes the image
  • THEN the system SHALL detect rotation angle as "0"
  • AND page dimensions SHALL remain unchanged
  • AND processing SHALL proceed normally without dimension adjustment
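The four scenarios above reduce to one rule: swap width and height for "90" and "270", and leave them alone for "0" and "180". A minimal sketch of that rule follows; the function name `adjust_page_dimensions` is hypothetical and not taken from the codebase, and rejecting unknown labels is an assumption of this sketch rather than a stated requirement.

```python
def adjust_page_dimensions(width: float, height: float, rotation: str) -> tuple[float, float]:
    """Return the (width, height) the output page should use.

    `rotation` is the detected angle label as a string: "0", "90", "180" or "270".
    """
    if rotation in ("90", "270"):
        # Quarter turns change the page orientation, so the dimensions swap.
        return height, width
    if rotation in ("0", "180"):
        # Upright or upside-down content keeps the same page orientation.
        return width, height
    # Assumption: unknown labels are treated as an error instead of being ignored.
    raise ValueError(f"unexpected rotation label: {rotation!r}")
```

For example, a portrait page of 595 × 842 points with a detected rotation of "90" becomes 842 × 595, while the same page with "180" stays 595 × 842.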

Scenario: Rotation angle is captured from PP-StructureV3 results

  • GIVEN PP-StructureV3 is configured with use_doc_orientation_classify=True
  • WHEN processing completes
  • THEN the system SHALL extract rotation angle from doc_preprocessor_res.label_names
  • AND include detected_rotation in the OCR result metadata
  • AND log the detected rotation for debugging
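The capture step could look roughly like the sketch below. It assumes the PP-StructureV3 result has already been converted to a plain mapping in which `doc_preprocessor_res` holds a `label_names` list, as the scenario describes; the exact result shape, the helper names `extract_rotation` and `attach_rotation`, and the fallback to "0" when no label is present are all assumptions of this sketch.

```python
import logging
from typing import Any, Mapping

logger = logging.getLogger(__name__)

VALID_ROTATIONS = {"0", "90", "180", "270"}


def extract_rotation(result: Mapping[str, Any]) -> str:
    """Read the rotation label from a result mapping (assumed shape)."""
    preprocessor = result.get("doc_preprocessor_res") or {}
    labels = preprocessor.get("label_names") or []
    rotation = str(labels[0]) if labels else "0"
    if rotation not in VALID_ROTATIONS:
        # Assumption: an unrecognised label is treated as "no rotation".
        logger.warning("Ignoring unexpected rotation label %r", rotation)
        rotation = "0"
    return rotation


def attach_rotation(metadata: Mapping[str, Any], result: Mapping[str, Any]) -> dict:
    """Return a copy of the OCR metadata with detected_rotation added."""
    rotation = extract_rotation(result)
    logger.info("Detected document rotation: %s degrees", rotation)
    return {**metadata, "detected_rotation": rotation}
```

Keeping the label as a string matches the "90" / "270" notation used in the scenarios, so downstream comparisons do not need a type conversion.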

Scenario: Dimension adjustment happens before PDF generation

  • GIVEN OCR processing detects rotation angle of "90" or "270"
  • WHEN creating the UnifiedDocument for PDF generation
  • THEN the Page dimensions SHALL use adjusted (swapped) width and height
  • AND OCR coordinates SHALL be used directly (already in rotated space)
  • AND no additional coordinate transformation SHALL be applied
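To illustrate why only the page dimensions need adjusting, the sketch below builds a page from OCR output and checks that a bounding box reported in the rotated space fits the swapped page but not the original one. `Element`, `Page`, `build_page` and `fits` are hypothetical stand-ins, not the project's actual UnifiedDocument classes.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    text: str
    bbox: tuple[float, float, float, float]  # x0, y0, x1, y1 in OCR (rotated) space


@dataclass
class Page:
    width: float
    height: float
    elements: list[Element] = field(default_factory=list)


def build_page(orig_width: float, orig_height: float, rotation: str,
               elements: list[Element]) -> Page:
    """Create a page whose dimensions match the rotated OCR coordinate space."""
    if rotation in ("90", "270"):
        width, height = orig_height, orig_width
    else:
        width, height = orig_width, orig_height
    # Coordinates are copied as-is: they are already expressed in rotated space.
    return Page(width=width, height=height, elements=list(elements))


def fits(page: Page, element: Element) -> bool:
    """True when the element's box lies entirely inside the page."""
    x0, y0, x1, y1 = element.bbox
    return 0 <= x0 <= x1 <= page.width and 0 <= y0 <= y1 <= page.height


title = Element("Title", (700.0, 40.0, 820.0, 70.0))
adjusted = build_page(595.0, 842.0, "90", [title])
unadjusted = Page(595.0, 842.0, [title])
```

Here `fits(adjusted, title)` holds because the adjusted page is 842 wide, whereas `fits(unadjusted, title)` does not: the box extends past x = 595, which is the symptom the dimension swap is meant to remove.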

Requirement: Orientation Detection Configuration

The system SHALL provide configuration for enabling/disabling document orientation detection.

Scenario: Orientation detection is enabled by default

  • GIVEN default configuration settings
  • WHEN OCR track processing runs
  • THEN use_doc_orientation_classify SHALL be True
  • AND PP-StructureV3 SHALL perform document orientation classification

Scenario: Orientation detection can be disabled

  • GIVEN use_doc_orientation_classify is set to False in configuration
  • WHEN OCR track processing runs
  • THEN the system SHALL NOT perform orientation detection
  • AND page dimensions SHALL be based on original image dimensions
  • AND this SHALL preserve backward compatibility for controlled environments
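One way to express this configuration is a settings object whose default is True and which is translated into keyword arguments for the pipeline. The class `OcrSettings` and the helper `structure_kwargs` are hypothetical names for this sketch; only the flag name `use_doc_orientation_classify` comes from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrSettings:
    # Enabled by default, as required by the first scenario.
    use_doc_orientation_classify: bool = True


def structure_kwargs(settings: OcrSettings) -> dict:
    """Build the keyword arguments intended for the PP-StructureV3 pipeline."""
    return {"use_doc_orientation_classify": settings.use_doc_orientation_classify}


default_kwargs = structure_kwargs(OcrSettings())
disabled_kwargs = structure_kwargs(OcrSettings(use_doc_orientation_classify=False))
```

With the flag disabled, no rotation label is produced, so page dimensions fall back to the original image dimensions without any extra branch in the dimension logic.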

MODIFIED Requirements

Requirement: Layout Model Selection (Modified)

The system SHALL apply document orientation detection before layout detection regardless of the selected layout model.

Scenario: Orientation detection works with all layout models

  • GIVEN a user selects any layout model (chinese, default, cdla)
  • WHEN OCR processing runs with use_doc_orientation_classify=True
  • THEN orientation detection SHALL be applied regardless of layout model choice
  • AND orientation detection SHALL run in Stage 1 (preprocessing), before layout detection (Stage 3)