feat: enable document orientation detection for scanned PDFs

- Enable PP-StructureV3's use_doc_orientation_classify feature
- Detect rotation angle from doc_preprocessor_res.angle
- Swap page dimensions (width <-> height) for 90°/270° rotations
- Output PDF now correctly displays landscape-scanned content
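The dimension swap described above can be sketched as follows. This is an illustration under stated assumptions, not the commit's code. The only name taken from the commit is `doc_preprocessor_res.angle`. Nesting it under `res` (mirroring `json['res']['overall_ocr_res']` in the spec below) is an assumption, as are the helper names, the default of 0 when the field is missing, and the premise that the angle is one of 0/90/180/270.

```python
from typing import Tuple


def angle_from_result(result_json: dict) -> int:
    """Read the detected rotation angle (hypothetical helper).

    Assumption: the angle lives at json['res']['doc_preprocessor_res']['angle']
    and a missing field means "no rotation".
    """
    pre = result_json.get("res", {}).get("doc_preprocessor_res") or {}
    return int(pre.get("angle", 0))


def oriented_page_size(width: float, height: float, angle: int) -> Tuple[float, float]:
    """Return the (width, height) to use for the output page (hypothetical helper)."""
    angle %= 360  # normalise e.g. -90 to 270
    if angle not in (0, 90, 180, 270):
        raise ValueError(f"unexpected orientation angle: {angle}")
    # A quarter turn (90 or 270 degrees) exchanges the axes; 0 and 180 keep them.
    if angle in (90, 270):
        return height, width
    return width, height


angle = angle_from_result({"res": {"doc_preprocessor_res": {"angle": 90}}})
print(oriented_page_size(595, 842, angle))  # (842, 595)
```

Only 90° and 270° change the page's aspect ratio. A 180° turn flips content but leaves width and height as they are.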

Also includes:
- Archive completed openspec proposals
- Add simplify-frontend-ocr-config proposal (pending)
- Code cleanup and frontend simplification

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Author: egg
Date: 2025-12-11 17:13:46 +08:00
Parent: 57070af307
Commit: cfe65158a3
58 changed files with 1271 additions and 3048 deletions

@@ -14,12 +14,21 @@ The system SHALL detect and fill gaps in PP-StructureV3 output by supplementing
 - **AND** identify Raw OCR regions not covered by any PP-StructureV3 element
 - **AND** supplement these regions as TEXT elements in the output
-#### Scenario: Coverage is determined by center-point and IoU
+#### Scenario: Coverage is determined by IoA (Intersection over Area)
 - **GIVEN** a Raw OCR text region with bounding box
 - **WHEN** checking if the region is covered by PP-StructureV3
-- **THEN** the region SHALL be considered covered if its center point falls inside any PP-StructureV3 element bbox
-- **OR** if IoU with any PP-StructureV3 element exceeds 0.15 threshold
-- **AND** regions not meeting either criterion SHALL be marked as uncovered
+- **THEN** the region SHALL be considered covered if IoA (intersection area / OCR box area) exceeds the type-specific threshold
+- **AND** IoA SHALL be used instead of IoU because it correctly measures the "small box contained in large box" relationship
+- **AND** regions not meeting the IoA criterion SHALL be marked as uncovered
+#### Scenario: Element-type-specific IoA thresholds are applied
+- **GIVEN** a Raw OCR region being evaluated for coverage
+- **WHEN** comparing against PP-StructureV3 elements of different types
+- **THEN** the system SHALL apply different IoA thresholds:
+  - TEXT, TITLE, HEADER, FOOTER: IoA > 0.6 (tolerates boundary errors)
+  - TABLE: IoA > 0.1 (strict filtering to preserve table structure)
+  - FIGURE, IMAGE: IoA > 0.8 (preserves text within figures like axis labels)
+- **AND** a region is considered covered if it meets the threshold for ANY overlapping element
 #### Scenario: Only TEXT elements are supplemented
 - **GIVEN** uncovered Raw OCR regions identified for supplementation
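The IoA coverage rule and the per-type thresholds in this hunk can be sketched as below. Boxes are assumed to be axis-aligned `(x0, y0, x1, y1)` tuples. The function names, and the 0.6 fallback for element types the spec does not list, are illustrative assumptions rather than the project's implementation.

```python
from typing import Iterable, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), axis-aligned

# Per-type thresholds from the scenario; keys are lower-cased element types.
IOA_THRESHOLDS = {
    "text": 0.6, "title": 0.6, "header": 0.6, "footer": 0.6,
    "table": 0.1,
    "figure": 0.8, "image": 0.8,
}
DEFAULT_IOA_THRESHOLD = 0.6  # assumption: unlisted types behave like TEXT


def area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def intersection(a: Box, b: Box) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def ioa(ocr: Box, elem: Box) -> float:
    """Intersection divided by the OCR box's own area."""
    a = area(ocr)
    return intersection(ocr, elem) / a if a > 0 else 0.0


def iou(a: Box, b: Box) -> float:
    """Intersection over union, kept only for comparison."""
    inter = intersection(a, b)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def is_covered(ocr: Box, elements: Iterable[Tuple[str, Box]]) -> bool:
    """Covered if the threshold of ANY overlapping element is exceeded."""
    for elem_type, elem_box in elements:
        threshold = IOA_THRESHOLDS.get(elem_type.lower(), DEFAULT_IOA_THRESHOLD)
        if ioa(ocr, elem_box) > threshold:
            return True
    return False


line = (10, 10, 110, 30)   # one OCR text line
block = (0, 0, 500, 400)   # a large PP-StructureV3 TEXT block
print(ioa(line, block), iou(line, block))                           # 1.0 0.01
print(is_covered((0, 0, 100, 20), [("TABLE", (80, 0, 400, 300))]))  # True
print(is_covered((0, 0, 100, 20), [("TEXT", (80, 0, 400, 300))]))   # False
```

The first print shows why IoU alone is a poor fit. A text line lying entirely inside a large block has IoA 1.0 but IoU of only 0.01, because IoU is dominated by the block's area. The last two prints show the same 20% overlap counting as coverage for a TABLE but not for TEXT.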
@@ -33,9 +42,9 @@ The system SHALL detect and fill gaps in PP-StructureV3 output by supplementing
 - **THEN** the system SHALL skip that region
 - **AND** only supplement regions with confidence >= 0.3
-#### Scenario: Deduplication prevents repeated text
+#### Scenario: Deduplication uses IoA instead of IoU
 - **GIVEN** a Raw OCR region being considered for supplementation
-- **WHEN** the region has IoU > 0.5 with any existing PP-StructureV3 TEXT element
+- **WHEN** the region has IoA > 0.5 with any existing PP-StructureV3 TEXT element
 - **THEN** the system SHALL skip that region to prevent duplicate text
 - **AND** the original PP-StructureV3 element SHALL be preserved
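The confidence floor and the IoA-based deduplication in this hunk can be sketched as one filtering pass over regions that already failed the coverage test. The tuple layout `(bbox, text, confidence)` and the function names are hypothetical. The 0.3 and 0.5 values come from the scenarios.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)
Candidate = Tuple[Box, str, float]       # (bbox, text, confidence)


def ioa(ocr: Box, elem: Box) -> float:
    """Intersection area divided by the OCR box's own area."""
    w = min(ocr[2], elem[2]) - max(ocr[0], elem[0])
    h = min(ocr[3], elem[3]) - max(ocr[1], elem[1])
    own = (ocr[2] - ocr[0]) * (ocr[3] - ocr[1])
    return max(0, w) * max(0, h) / own if own > 0 else 0.0


def select_supplements(
    uncovered: Sequence[Candidate],
    text_element_boxes: Sequence[Box],
    min_confidence: float = 0.3,
    dedup_ioa_threshold: float = 0.5,
) -> List[Candidate]:
    """Keep uncovered regions that are confident and not duplicates."""
    kept = []
    for box, text, score in uncovered:
        if score < min_confidence:
            continue  # below the confidence floor: skip
        if any(ioa(box, t) > dedup_ioa_threshold for t in text_element_boxes):
            continue  # overlaps an existing TEXT element: keep the original instead
        kept.append((box, text, score))
    return kept


candidates = [
    ((0, 0, 100, 20), "alpha", 0.9),    # confident, no overlap
    ((0, 100, 100, 120), "beta", 0.2),  # below 0.3
    ((0, 200, 100, 220), "gamma", 0.8), # IoA 0.8 with an existing TEXT element
]
existing_text = [(0, 190, 80, 230)]
print([t for _, t, _ in select_supplements(candidates, existing_text)])  # ['alpha']
```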
@@ -99,10 +108,12 @@ The system SHALL provide configurable parameters for gap filling behavior.
 - **THEN** the system SHALL activate gap filling
 - **AND** supplement uncovered regions
-#### Scenario: IoU thresholds are configurable
-- **GIVEN** custom IoU thresholds configured:
-  - gap_filling_iou_threshold: 0.2
-  - gap_filling_dedup_iou_threshold: 0.6
+#### Scenario: IoA thresholds are configurable per element type
+- **GIVEN** custom IoA thresholds configured:
+  - gap_filling_ioa_threshold_text: 0.6
+  - gap_filling_ioa_threshold_table: 0.1
+  - gap_filling_ioa_threshold_figure: 0.8
+  - gap_filling_dedup_ioa_threshold: 0.5
 - **WHEN** evaluating coverage and deduplication
 - **THEN** the system SHALL use the configured values
 - **AND** apply them consistently throughout the gap filling process
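The configurable thresholds above can be grouped into a single settings object. The following sketch holds them in a frozen dataclass. The field names are the ones listed in the spec, and `gap_filling_use_overall_ocr` defaults to true as stated in a later scenario. Using the spec's stated thresholds as defaults is an assumption. The range check and the `threshold_for` mapping are hypothetical additions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GapFillingConfig:
    # Field names from the spec; defaults mirror the thresholds it states.
    gap_filling_ioa_threshold_text: float = 0.6
    gap_filling_ioa_threshold_table: float = 0.1
    gap_filling_ioa_threshold_figure: float = 0.8
    gap_filling_dedup_ioa_threshold: float = 0.5
    gap_filling_use_overall_ocr: bool = True

    def __post_init__(self) -> None:
        # Hypothetical sanity check: IoA is a ratio, so thresholds must lie in [0, 1].
        for name in (
            "gap_filling_ioa_threshold_text",
            "gap_filling_ioa_threshold_table",
            "gap_filling_ioa_threshold_figure",
            "gap_filling_dedup_ioa_threshold",
        ):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")

    def threshold_for(self, element_type: str) -> float:
        """Map a PP-StructureV3 element type to its coverage threshold."""
        kind = element_type.lower()
        if kind == "table":
            return self.gap_filling_ioa_threshold_table
        if kind in ("figure", "image"):
            return self.gap_filling_ioa_threshold_figure
        # TEXT, TITLE, HEADER, FOOTER; unlisted types also land here (assumption).
        return self.gap_filling_ioa_threshold_text


cfg = GapFillingConfig(gap_filling_ioa_threshold_text=0.7)
print(cfg.threshold_for("TITLE"), cfg.threshold_for("Table"))  # 0.7 0.1
```

Routing every lookup through one object is a simple way to meet the "apply them consistently" requirement. Coverage and deduplication code never read raw numbers directly.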
@@ -113,6 +124,12 @@ The system SHALL provide configurable parameters for gap filling behavior.
 - **THEN** the system SHALL only include regions with confidence >= 0.5
 - **AND** filter out lower confidence regions
+#### Scenario: Boundary shrinking reduces edge duplicates
+- **GIVEN** gap_filling_shrink_pixels is set to 1
+- **WHEN** evaluating coverage with IoA
+- **THEN** the system SHALL shrink OCR bounding boxes inward by 1 pixel on each side
+- **AND** this reduces false "uncovered" detection at region boundaries
 ### Requirement: Layout Model Selection
 The system SHALL allow users to select a layout detection model optimized for their document type, providing a simple choice between pre-configured models instead of manual parameter tuning.
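The boundary-shrinking scenario can be illustrated with a small sketch. The box coordinates are made up to show the effect, and `shrink_box` is a hypothetical helper. Leaving boxes that are too thin to shrink unchanged is an assumption, not something the spec states.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def shrink_box(box: Box, pixels: float = 1) -> Box:
    """Move every edge inward by `pixels` (the gap_filling_shrink_pixels value)."""
    x0, y0, x1, y1 = box
    # Assumption: leave boxes too thin to shrink untouched rather than invert them.
    if x1 - x0 <= 2 * pixels or y1 - y0 <= 2 * pixels:
        return box
    return (x0 + pixels, y0 + pixels, x1 - pixels, y1 - pixels)


def ioa(ocr: Box, elem: Box) -> float:
    """Intersection area divided by the OCR box's own area."""
    w = min(ocr[2], elem[2]) - max(ocr[0], elem[0])
    h = min(ocr[3], elem[3]) - max(ocr[1], elem[1])
    own = (ocr[2] - ocr[0]) * (ocr[3] - ocr[1])
    return max(0, w) * max(0, h) / own if own > 0 else 0.0


ocr = (0, 0, 10, 10)        # small OCR box straddling an element edge
element = (4, 0, 100, 100)  # TEXT element, covered only if IoA > 0.6
print(ioa(ocr, element))              # 0.6   -> not covered
print(ioa(shrink_box(ocr), element))  # 0.625 -> covered
```

Trimming a pixel from each side removes part of the OCR box's margin outside the element. For small boxes near an element edge, this can raise IoA above the threshold, so a region that would have been flagged as uncovered and supplemented a second time is treated as covered instead.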
@@ -258,3 +275,37 @@ The system SHALL provide configurable thresholds for cell validation.
 - **THEN** the system SHALL use the custom values
 - **AND** apply them consistently to all pages
+### Requirement: Use PP-StructureV3 Internal OCR Results
+The system SHALL preferentially use PP-StructureV3's internal OCR results (`overall_ocr_res`) instead of running a separate Raw OCR inference.
+#### Scenario: Extract overall_ocr_res from PP-StructureV3
+- **GIVEN** PP-StructureV3 processing completes
+- **WHEN** the result contains `json['res']['overall_ocr_res']`
+- **THEN** the system SHALL extract OCR regions from:
+  - `dt_polys`: detection box polygons
+  - `rec_texts`: recognized text strings
+  - `rec_scores`: confidence scores
+- **AND** convert these to the standard TextRegion format for gap filling
+#### Scenario: Skip separate Raw OCR when overall_ocr_res is available
+- **GIVEN** gap_filling_use_overall_ocr is true (default)
+- **WHEN** PP-StructureV3 result contains overall_ocr_res
+- **THEN** the system SHALL NOT execute separate PaddleOCR inference
+- **AND** use the extracted overall_ocr_res as the OCR source
+- **AND** this reduces total inference time by approximately 50%
+#### Scenario: Fallback to separate Raw OCR when needed
+- **GIVEN** gap_filling_use_overall_ocr is false OR overall_ocr_res is missing
+- **WHEN** gap filling is activated
+- **THEN** the system SHALL execute separate PaddleOCR inference as before
+- **AND** use the separate OCR results for gap filling
+- **AND** this maintains backward compatibility
+#### Scenario: Coordinate consistency is guaranteed
+- **GIVEN** overall_ocr_res is extracted from PP-StructureV3
+- **WHEN** comparing with PP-StructureV3 layout elements
+- **THEN** both SHALL use the same coordinate system
+- **AND** no additional coordinate alignment is needed
+- **AND** this prevents scale mismatch issues
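The new requirement can be sketched as an extraction step plus a source-selection step. The key path `json['res']['overall_ocr_res']` and the parallel `dt_polys` / `rec_texts` / `rec_scores` fields come from the spec. The `TextRegion` dataclass here is a hypothetical stand-in for the project's type. `run_separate_ocr` is a placeholder callable for the separate PaddleOCR pass, whose real API is not shown in this diff. Following the coordinate-consistency scenario, no rescaling is applied.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class TextRegion:
    """Hypothetical stand-in for the project's TextRegion format."""
    text: str
    confidence: float
    bbox: Tuple[float, float, float, float]  # axis-aligned hull of the polygon
    polygon: List[Tuple[float, float]] = field(default_factory=list)


def extract_overall_ocr(result_json: dict) -> Optional[List[TextRegion]]:
    """Convert json['res']['overall_ocr_res'] into TextRegions, or None if absent."""
    ocr = result_json.get("res", {}).get("overall_ocr_res")
    if not ocr:
        return None
    regions = []
    # dt_polys, rec_texts and rec_scores are parallel lists, one entry per region.
    for poly, text, score in zip(
        ocr.get("dt_polys", []), ocr.get("rec_texts", []), ocr.get("rec_scores", [])
    ):
        points = [(float(x), float(y)) for x, y in poly]
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        regions.append(TextRegion(
            text=text,
            confidence=float(score),
            bbox=(min(xs), min(ys), max(xs), max(ys)),
            polygon=points,
        ))
    return regions


def ocr_regions_for_gap_filling(
    result_json: dict,
    use_overall_ocr: bool,
    run_separate_ocr: Callable[[], List[TextRegion]],
) -> List[TextRegion]:
    """Prefer PP-StructureV3's own OCR; fall back to a separate OCR pass."""
    if use_overall_ocr:
        regions = extract_overall_ocr(result_json)
        if regions is not None:
            # Same coordinate system as the layout elements, so no rescaling here.
            return regions
    return run_separate_ocr()


sample = {"res": {"overall_ocr_res": {
    "dt_polys": [[[10, 20], [110, 20], [110, 40], [10, 40]]],
    "rec_texts": ["hello"],
    "rec_scores": [0.98],
}}}
regions = ocr_regions_for_gap_filling(sample, True, run_separate_ocr=lambda: [])
print(regions[0].text, regions[0].bbox)  # hello (10.0, 20.0, 110.0, 40.0)
```

Passing the fallback as a callable means the separate inference is only executed when `overall_ocr_res` is disabled or missing. That is where the spec's expected time saving comes from.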