MODIFIED Requirements
Requirement: OCR Track Gap Filling with Raw OCR Regions
The system SHALL detect and fill gaps in PP-StructureV3 output by supplementing with Raw OCR text regions when significant content loss is detected.
Scenario: Gap filling activates when coverage is low
- GIVEN an OCR track processing task
- WHEN PP-StructureV3 outputs elements that cover less than 70% of Raw OCR text regions
- THEN the system SHALL activate gap filling
- AND identify Raw OCR regions not covered by any PP-StructureV3 element
- AND supplement these regions as TEXT elements in the output
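A minimal sketch of the activation rule, assuming coverage is the fraction of Raw OCR regions that pass the per-region coverage test described in the scenarios below. That test is stubbed here as a list of booleans. The function names and the empty-page behavior are assumptions, not the project's API.

```python
from typing import Sequence


def coverage_ratio(covered_flags: Sequence[bool]) -> float:
    """Fraction of Raw OCR regions covered by at least one PP-StructureV3 element."""
    if not covered_flags:
        # Assumption: a page with no Raw OCR text is treated as fully covered.
        return 1.0
    return sum(covered_flags) / len(covered_flags)


def should_activate_gap_filling(covered_flags: Sequence[bool], threshold: float = 0.7) -> bool:
    """Gap filling activates when coverage is strictly below the threshold (70% by default)."""
    return coverage_ratio(covered_flags) < threshold
```

For example, a page where only 2 of 4 Raw OCR regions are covered has a coverage of 0.5 and activates gap filling; a page at exactly 70% does not.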
Scenario: Coverage is determined by IoA (Intersection over Area)
- GIVEN a Raw OCR text region with bounding box
- WHEN checking if the region is covered by PP-StructureV3
- THEN the region SHALL be considered covered if IoA (intersection area / OCR box area) exceeds the type-specific threshold
- AND IoA SHALL be used instead of IoU because it correctly measures the "small box contained in a large box" relationship, whereas IoU stays low whenever a small OCR line sits inside a much larger layout block
- AND regions not meeting the IoA criterion SHALL be marked as uncovered
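The sketch below contrasts IoA with IoU, assuming boxes are axis-aligned (x0, y0, x1, y1) tuples. The helper names are illustrative only.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # assumed layout: (x0, y0, x1, y1)


def area(box: BBox) -> float:
    x0, y0, x1, y1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def intersection_area(a: BBox, b: BBox) -> float:
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    return max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)


def ioa(ocr_box: BBox, element_box: BBox) -> float:
    """Intersection divided by the OCR box's own area (not by the union)."""
    own = area(ocr_box)
    if own == 0.0:
        return 0.0
    return intersection_area(ocr_box, element_box) / own


def iou(a: BBox, b: BBox) -> float:
    """Intersection over union, shown only for comparison."""
    inter = intersection_area(a, b)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0
```

A 10x10 OCR line fully inside a 100x100 paragraph block has IoA 1.0 but IoU of only 0.01, so an IoU test would wrongly report the line as uncovered.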
Scenario: Element-type-specific IoA thresholds are applied
- GIVEN a Raw OCR region being evaluated for coverage
- WHEN comparing against PP-StructureV3 elements of different types
- THEN the system SHALL apply different IoA thresholds:
  - TEXT, TITLE, HEADER, FOOTER: IoA > 0.6 (tolerates boundary errors)
  - TABLE: IoA > 0.1 (strict: even a slight overlap with a table counts as covered, which preserves table structure)
  - FIGURE, IMAGE: IoA > 0.8 (preserves text within figures, such as axis labels)
- AND a region is considered covered if it meets the threshold for ANY overlapping element
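A sketch of the type-specific coverage test, assuming elements arrive as (type, bbox) pairs. The dictionary name is hypothetical, and falling back to the TEXT threshold for unlisted types (for example FLOWCHART) is an assumption the spec does not state.

```python
from typing import Iterable, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)

# Thresholds taken from the scenario above; the names are hypothetical.
IOA_THRESHOLDS = {
    "TEXT": 0.6, "TITLE": 0.6, "HEADER": 0.6, "FOOTER": 0.6,
    "TABLE": 0.1,
    "FIGURE": 0.8, "IMAGE": 0.8,
}
DEFAULT_THRESHOLD = 0.6  # assumption: unlisted types use the TEXT value


def ioa(ocr_box: BBox, element_box: BBox) -> float:
    """Intersection area divided by the OCR box's own area."""
    x0, y0, x1, y1 = ocr_box
    own = max(0.0, x1 - x0) * max(0.0, y1 - y0)
    if own == 0.0:
        return 0.0
    iw = min(x1, element_box[2]) - max(x0, element_box[0])
    ih = min(y1, element_box[3]) - max(y0, element_box[1])
    return max(0.0, iw) * max(0.0, ih) / own


def is_covered(ocr_box: BBox, elements: Iterable[Tuple[str, BBox]]) -> bool:
    """Covered if the OCR box passes the threshold of ANY element it overlaps."""
    return any(
        ioa(ocr_box, box) > IOA_THRESHOLDS.get(etype, DEFAULT_THRESHOLD)
        for etype, box in elements
    )
```

The same 20% overlap (IoA 0.2) marks a region as covered by a TABLE but not by a TEXT element, which is exactly the asymmetry the thresholds are meant to produce.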
Scenario: Only TEXT elements are supplemented
- GIVEN uncovered Raw OCR regions identified for supplementation
- WHEN PP-StructureV3 has detected TABLE, IMAGE, FIGURE, FLOWCHART, HEADER, or FOOTER elements
- THEN the system SHALL NOT supplement regions that overlap with these structural elements
- AND only supplement regions as TEXT type to preserve structural integrity
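One way to express this rule, assuming "overlap" means any intersection of positive area. The `Element` class and function names are hypothetical stand-ins, not the project's types.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)

STRUCTURAL_TYPES = {"TABLE", "IMAGE", "FIGURE", "FLOWCHART", "HEADER", "FOOTER"}


@dataclass
class Element:  # hypothetical minimal element
    type: str
    bbox: BBox


def overlaps(a: BBox, b: BBox) -> bool:
    """True if the two boxes share a region of positive area (touching edges do not count)."""
    return min(a[2], b[2]) > max(a[0], b[0]) and min(a[3], b[3]) > max(a[1], b[1])


def supplementable(uncovered: Sequence[BBox], elements: Sequence[Element]) -> List[Element]:
    """Drop regions overlapping structural elements; emit the rest as TEXT only."""
    structural = [e.bbox for e in elements if e.type in STRUCTURAL_TYPES]
    return [
        Element("TEXT", box)  # supplemented regions are always typed TEXT
        for box in uncovered
        if not any(overlaps(box, s) for s in structural)
    ]
```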
Scenario: Supplemented regions meet confidence threshold
- GIVEN Raw OCR regions to be supplemented
- WHEN a region has confidence score below 0.3
- THEN the system SHALL skip that region
- AND only supplement regions with confidence >= 0.3
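The confidence filter is a simple inclusive comparison. `TextRegion` here is a hypothetical minimal shape; the real Raw OCR region type may carry more fields.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class TextRegion:  # hypothetical minimal Raw OCR region
    text: str
    confidence: float


def filter_by_confidence(regions: Sequence[TextRegion], min_conf: float = 0.3) -> List[TextRegion]:
    """Keep regions with confidence >= min_conf; lower-confidence regions are skipped."""
    return [r for r in regions if r.confidence >= min_conf]
```

A region at exactly 0.3 is kept, since the threshold is inclusive.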
Scenario: Deduplication uses IoA instead of IoU
- GIVEN a Raw OCR region being considered for supplementation
- WHEN the region has IoA > 0.5 with any existing PP-StructureV3 TEXT element
- THEN the system SHALL skip that region to prevent duplicate text
- AND the original PP-StructureV3 element SHALL be preserved
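A sketch of the deduplication check, assuming elements are (type, bbox) pairs and that only TEXT elements participate, as the scenario states. The function name is hypothetical.

```python
from typing import Iterable, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def ioa(ocr_box: BBox, element_box: BBox) -> float:
    """Intersection area divided by the OCR box's own area."""
    x0, y0, x1, y1 = ocr_box
    own = max(0.0, x1 - x0) * max(0.0, y1 - y0)
    if own == 0.0:
        return 0.0
    iw = min(x1, element_box[2]) - max(x0, element_box[0])
    ih = min(y1, element_box[3]) - max(y0, element_box[1])
    return max(0.0, iw) * max(0.0, ih) / own


def is_duplicate(ocr_box: BBox, elements: Iterable[Tuple[str, BBox]],
                 dedup_threshold: float = 0.5) -> bool:
    """True if the OCR box has IoA > threshold with any existing TEXT element."""
    return any(
        etype == "TEXT" and ioa(ocr_box, box) > dedup_threshold
        for etype, box in elements
    )
```

When this returns True the Raw OCR region is dropped and the PP-StructureV3 element is left untouched.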
Scenario: Reading order is recalculated after gap filling
- GIVEN supplemented elements have been added to the page
- WHEN assembling the final element list
- THEN the system SHALL recalculate reading order for the entire page
- AND sort elements by y0 coordinate (top to bottom) then x0 (left to right)
- AND ensure logical document flow is maintained
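The ordering rule maps directly onto a tuple sort key. The `Element` class and the `reading_order` field are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Element:  # hypothetical minimal element
    text: str
    bbox: Tuple[float, float, float, float]  # (x0, y0, x1, y1)
    reading_order: int = -1


def recalculate_reading_order(elements: Sequence[Element]) -> List[Element]:
    """Sort by y0 (top to bottom), then x0 (left to right), and renumber in place."""
    ordered = sorted(elements, key=lambda e: (e.bbox[1], e.bbox[0]))
    for index, element in enumerate(ordered):
        element.reading_order = index
    return ordered
```

A pure (y0, x0) key, as specified, interleaves text from side-by-side columns; that follows from the rule itself rather than from this sketch.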
Scenario: Coordinate alignment with ocr_dimensions
- GIVEN Raw OCR processing may involve image resizing
- WHEN comparing Raw OCR bbox with PP-StructureV3 bbox
- THEN the system SHALL use ocr_dimensions to normalize coordinates
- AND ensure both sources reference the same coordinate space
- AND prevent coverage misdetection due to scale differences
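A sketch of the normalization step, assuming `ocr_dimensions` is the (width, height) of the image Raw OCR actually ran on and that PP-StructureV3 boxes refer to a target image of known (width, height). Both the tuple layout and the function name are assumptions.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)
Size = Tuple[float, float]                # (width, height)


def normalize_bbox(bbox: BBox, ocr_dimensions: Size, target_dimensions: Size) -> BBox:
    """Rescale a Raw OCR bbox into the coordinate space of the target image."""
    sx = target_dimensions[0] / ocr_dimensions[0]
    sy = target_dimensions[1] / ocr_dimensions[1]
    x0, y0, x1, y1 = bbox
    return (x0 * sx, y0 * sy, x1 * sx, y1 * sy)
```

If Raw OCR ran on a 1000x500 downscaled copy of a 2000x1000 page, every coordinate is doubled before IoA is computed.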
Scenario: Supplemented elements have complete metadata
- GIVEN a Raw OCR region being added as supplemented element
- WHEN creating the DocumentElement
- THEN the element SHALL include page_number
- AND include the confidence score from Raw OCR
- AND include the original bbox coordinates
- AND optionally include a source indicator for debugging
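A hypothetical stand-in for the supplemented DocumentElement, showing the required metadata. The class name, field names, and the `"raw_ocr_gap_fill"` source label are all illustrative assumptions; the project's DocumentElement may differ.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SupplementedElement:  # hypothetical stand-in for DocumentElement
    text: str
    bbox: Tuple[float, float, float, float]  # original Raw OCR coordinates
    page_number: int
    confidence: float                         # carried over from Raw OCR
    element_type: str = "TEXT"                # supplemented elements are always TEXT
    source: Optional[str] = None              # optional debugging marker


def make_supplemented(text: str, bbox: Tuple[float, float, float, float],
                      page_number: int, confidence: float,
                      debug: bool = False) -> SupplementedElement:
    """Build a supplemented element; tag its origin only when debugging."""
    return SupplementedElement(
        text=text, bbox=bbox, page_number=page_number, confidence=confidence,
        source="raw_ocr_gap_fill" if debug else None,
    )
```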
Requirement: Gap Filling Configuration
The system SHALL provide configurable parameters for gap filling behavior.
Scenario: Gap filling can be disabled via configuration
- GIVEN gap_filling_enabled is set to false in configuration
- WHEN OCR track processing runs
- THEN the system SHALL skip all gap filling logic
- AND output only PP-StructureV3 results as before
Scenario: Coverage threshold is configurable
- GIVEN gap_filling_coverage_threshold is set to 0.8
- WHEN PP-StructureV3 coverage is 75%
- THEN the system SHALL activate gap filling
- AND supplement uncovered regions
Scenario: IoA thresholds are configurable per element type
- GIVEN custom IoA thresholds configured:
  - gap_filling_ioa_threshold_text: 0.6
  - gap_filling_ioa_threshold_table: 0.1
  - gap_filling_ioa_threshold_figure: 0.8
  - gap_filling_dedup_ioa_threshold: 0.5
- WHEN evaluating coverage and deduplication
- THEN the system SHALL use the configured values
- AND apply them consistently throughout the gap filling process
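The configuration keys named in this requirement could be gathered into one object, as in the sketch below. The key names come from the scenarios; the class and method names are hypothetical, and the defaults are assumptions drawn from the values quoted elsewhere in the spec (for example 0.7 coverage, 0.3 confidence, 1 shrink pixel).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GapFillingConfig:  # hypothetical container; defaults are assumptions
    gap_filling_enabled: bool = True
    gap_filling_coverage_threshold: float = 0.7
    gap_filling_ioa_threshold_text: float = 0.6
    gap_filling_ioa_threshold_table: float = 0.1
    gap_filling_ioa_threshold_figure: float = 0.8
    gap_filling_dedup_ioa_threshold: float = 0.5
    gap_filling_confidence_threshold: float = 0.3
    gap_filling_shrink_pixels: int = 1
    gap_filling_use_overall_ocr: bool = True

    def ioa_threshold_for(self, element_type: str) -> float:
        """Map an element type onto the configured coverage threshold."""
        if element_type == "TABLE":
            return self.gap_filling_ioa_threshold_table
        if element_type in ("FIGURE", "IMAGE"):
            return self.gap_filling_ioa_threshold_figure
        return self.gap_filling_ioa_threshold_text

    def should_activate(self, coverage: float) -> bool:
        """Disabled config never activates; otherwise coverage must be below the threshold."""
        return self.gap_filling_enabled and coverage < self.gap_filling_coverage_threshold
```

With gap_filling_coverage_threshold set to 0.8, a page at 75% coverage activates gap filling, matching the scenario above.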
Scenario: Confidence threshold is configurable
- GIVEN gap_filling_confidence_threshold is set to 0.5
- WHEN supplementing Raw OCR regions
- THEN the system SHALL only include regions with confidence >= 0.5
- AND filter out lower confidence regions
Scenario: Boundary shrinking reduces edge duplicates
- GIVEN gap_filling_shrink_pixels is set to 1
- WHEN evaluating coverage with IoA
- THEN the system SHALL shrink OCR bounding boxes inward by 1 pixel on each side
- AND this reduces false "uncovered" detection at region boundaries
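A sketch of inward shrinking, assuming (x0, y0, x1, y1) boxes. Clamping at the box center, so that a box thinner than twice the shrink amount collapses instead of inverting, is a design choice of this sketch rather than something the spec states.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def shrink_bbox(bbox: BBox, pixels: float = 1) -> BBox:
    """Move every edge inward by `pixels`, never past the box center."""
    x0, y0, x1, y1 = bbox
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    return (min(x0 + pixels, cx), min(y0 + pixels, cy),
            max(x1 - pixels, cx), max(y1 - pixels, cy))
```

For example, an OCR box (0, 0, 10, 10) checked against an element at (1, 1, 10, 10) has IoA 0.81; after shrinking to (1, 1, 9, 9) the IoA becomes 1.0, so a one-pixel boundary disagreement no longer lowers coverage.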
ADDED Requirements
Requirement: Use PP-StructureV3 Internal OCR Results
The system SHALL preferentially use PP-StructureV3's internal OCR results (overall_ocr_res) instead of running a separate Raw OCR inference.
Scenario: Extract overall_ocr_res from PP-StructureV3
- GIVEN PP-StructureV3 processing completes
- WHEN the result contains json['res']['overall_ocr_res']
- THEN the system SHALL extract OCR regions from:
  - dt_polys: detection box polygons
  - rec_texts: recognized text strings
  - rec_scores: confidence scores
- AND convert these to the standard TextRegion format for gap filling
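A sketch of the conversion, assuming `result_json` is the dict reached through the spec's json['res'] path and that each dt_polys entry is a sequence of (x, y) points. Reducing each polygon to its axis-aligned bounding box is an assumption, and this `TextRegion` is a hypothetical stand-in for the project's type.

```python
from dataclasses import dataclass
from typing import Any, List, Mapping, Sequence, Tuple


@dataclass
class TextRegion:  # hypothetical stand-in for the project's TextRegion
    text: str
    bbox: Tuple[float, float, float, float]  # axis-aligned (x0, y0, x1, y1)
    confidence: float


def polygon_to_bbox(poly: Sequence[Sequence[float]]) -> Tuple[float, float, float, float]:
    """Axis-aligned bounding box of a detection polygon."""
    xs = [float(p[0]) for p in poly]
    ys = [float(p[1]) for p in poly]
    return (min(xs), min(ys), max(xs), max(ys))


def extract_overall_ocr_regions(result_json: Mapping[str, Any]) -> List[TextRegion]:
    """Zip dt_polys, rec_texts and rec_scores into TextRegions; [] if the block is missing."""
    ocr = result_json.get("res", {}).get("overall_ocr_res")
    if not ocr:
        return []
    return [
        TextRegion(text=text, bbox=polygon_to_bbox(poly), confidence=float(score))
        for poly, text, score in zip(ocr["dt_polys"], ocr["rec_texts"], ocr["rec_scores"])
    ]
```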
Scenario: Skip separate Raw OCR when overall_ocr_res is available
- GIVEN gap_filling_use_overall_ocr is true (default)
- WHEN PP-StructureV3 result contains overall_ocr_res
- THEN the system SHALL NOT execute separate PaddleOCR inference
- AND use the extracted overall_ocr_res as the OCR source
- AND this reduces total inference time by approximately 50%
Scenario: Fallback to separate Raw OCR when needed
- GIVEN gap_filling_use_overall_ocr is false OR overall_ocr_res is missing
- WHEN gap filling is activated
- THEN the system SHALL execute separate PaddleOCR inference as before
- AND use the separate OCR results for gap filling
- AND this maintains backward compatibility
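The source-selection logic of the two scenarios above can be sketched as follows. Here `None` stands for a missing overall_ocr_res and `run_separate_ocr` is a hypothetical callable wrapping the separate PaddleOCR inference; it is invoked only on the fallback path.

```python
from typing import Callable, List, Optional, TypeVar

Region = TypeVar("Region")


def select_ocr_source(use_overall_ocr: bool,
                      overall_regions: Optional[List[Region]],
                      run_separate_ocr: Callable[[], List[Region]]) -> List[Region]:
    """Prefer PP-StructureV3's internal OCR; fall back to a separate OCR pass."""
    if use_overall_ocr and overall_regions is not None:
        return overall_regions  # no second OCR inference is run
    return run_separate_ocr()
```

Passing the separate OCR as a callable keeps the expensive inference lazy, so it costs nothing when overall_ocr_res is available.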
Scenario: Coordinate consistency is guaranteed
- GIVEN overall_ocr_res is extracted from PP-StructureV3
- WHEN comparing with PP-StructureV3 layout elements
- THEN both SHALL use the same coordinate system
- AND no additional coordinate alignment is needed
- AND this prevents scale mismatch issues