## MODIFIED Requirements

### Requirement: OCR Track Gap Filling with Raw OCR Regions

The system SHALL detect and fill gaps in PP-StructureV3 output by supplementing with Raw OCR text regions when significant content loss is detected.

#### Scenario: Gap filling activates when coverage is low
- **GIVEN** an OCR track processing task
- **WHEN** PP-StructureV3 outputs elements that cover less than 70% of Raw OCR text regions
- **THEN** the system SHALL activate gap filling
- **AND** identify Raw OCR regions not covered by any PP-StructureV3 element
- **AND** supplement these regions as TEXT elements in the output

#### Scenario: Coverage is determined by IoA (Intersection over Area)
- **GIVEN** a Raw OCR text region with a bounding box
- **WHEN** checking if the region is covered by PP-StructureV3
- **THEN** the region SHALL be considered covered if IoA (intersection area / OCR box area) exceeds the type-specific threshold
- **AND** IoA SHALL be used instead of IoU because it correctly measures the "small box contained in large box" relationship
- **AND** regions not meeting the IoA criterion SHALL be marked as uncovered

#### Scenario: Element-type-specific IoA thresholds are applied
- **GIVEN** a Raw OCR region being evaluated for coverage
- **WHEN** comparing against PP-StructureV3 elements of different types
- **THEN** the system SHALL apply different IoA thresholds:
  - TEXT, TITLE, HEADER, FOOTER: IoA > 0.6 (tolerates boundary errors)
  - TABLE: IoA > 0.1 (strict filtering to preserve table structure)
  - FIGURE, IMAGE: IoA > 0.8 (preserves text within figures like axis labels)
- **AND** a region is considered covered if it meets the threshold for ANY overlapping element

#### Scenario: Only TEXT elements are supplemented
- **GIVEN** uncovered Raw OCR regions identified for supplementation
- **WHEN** PP-StructureV3 has detected TABLE, IMAGE, FIGURE, FLOWCHART, HEADER, or FOOTER elements
- **THEN** the system SHALL NOT supplement regions that overlap with these structural elements
- **AND** only supplement regions as TEXT type to preserve structural integrity

#### Scenario: Supplemented regions meet confidence threshold
- **GIVEN** Raw OCR regions to be supplemented
- **WHEN** a region has a confidence score below 0.3
- **THEN** the system SHALL skip that region
- **AND** only supplement regions with confidence >= 0.3

#### Scenario: Deduplication uses IoA instead of IoU
- **GIVEN** a Raw OCR region being considered for supplementation
- **WHEN** the region has IoA > 0.5 with any existing PP-StructureV3 TEXT element
- **THEN** the system SHALL skip that region to prevent duplicate text
- **AND** the original PP-StructureV3 element SHALL be preserved

#### Scenario: Reading order is recalculated after gap filling
- **GIVEN** supplemented elements have been added to the page
- **WHEN** assembling the final element list
- **THEN** the system SHALL recalculate reading order for the entire page
- **AND** sort elements by y0 coordinate (top to bottom) then x0 (left to right)
- **AND** ensure logical document flow is maintained

#### Scenario: Coordinate alignment with ocr_dimensions
- **GIVEN** Raw OCR processing may involve image resizing
- **WHEN** comparing a Raw OCR bbox with a PP-StructureV3 bbox
- **THEN** the system SHALL use ocr_dimensions to normalize coordinates
- **AND** ensure both sources reference the same coordinate space
- **AND** prevent coverage misdetection due to scale differences

#### Scenario: Supplemented elements have complete metadata
- **GIVEN** a Raw OCR region being added as a supplemented element
- **WHEN** creating the DocumentElement
- **THEN** the element SHALL include page_number
- **AND** include the confidence score from Raw OCR
- **AND** include the original bbox coordinates
- **AND** optionally include a source indicator for debugging

### Requirement: Gap Filling Configuration

The system SHALL provide configurable parameters for gap filling behavior.

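
As a minimal sketch of how these parameters might fit together, the Python below groups the documented settings into one object and applies them in an IoA coverage check. The names `GapFillingConfig`, `shrink`, `ioa`, `is_covered`, and `should_fill_gaps` are hypothetical. The defaults for `gap_filling_enabled` and `gap_filling_shrink_pixels` are assumptions, as is the choice to measure coverage by region count and to let element types without a dedicated setting fall back to the text threshold.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1), assumed layout


@dataclass(frozen=True)
class GapFillingConfig:
    """Hypothetical container; field names follow the spec's parameter names."""
    gap_filling_enabled: bool = True              # default is an assumption
    gap_filling_coverage_threshold: float = 0.7
    gap_filling_confidence_threshold: float = 0.3
    gap_filling_ioa_threshold_text: float = 0.6
    gap_filling_ioa_threshold_table: float = 0.1
    gap_filling_ioa_threshold_figure: float = 0.8
    gap_filling_dedup_ioa_threshold: float = 0.5
    gap_filling_shrink_pixels: int = 1            # default is an assumption

    def ioa_threshold_for(self, element_type: str) -> float:
        kind = element_type.upper()
        if kind == "TABLE":
            return self.gap_filling_ioa_threshold_table
        if kind in ("FIGURE", "IMAGE"):
            return self.gap_filling_ioa_threshold_figure
        # Assumption: every other type uses the text threshold.
        return self.gap_filling_ioa_threshold_text


def shrink(box: BBox, pixels: float) -> BBox:
    """Shrink a box inward on each side, never past its centre."""
    x0, y0, x1, y1 = box
    dx = min(pixels, (x1 - x0) / 2)
    dy = min(pixels, (y1 - y0) / 2)
    return (x0 + dx, y0 + dy, x1 - dx, y1 - dy)


def ioa(ocr_box: BBox, element_box: BBox) -> float:
    """Intersection area divided by the OCR box area."""
    iw = min(ocr_box[2], element_box[2]) - max(ocr_box[0], element_box[0])
    ih = min(ocr_box[3], element_box[3]) - max(ocr_box[1], element_box[1])
    inter = max(0.0, iw) * max(0.0, ih)
    area = (ocr_box[2] - ocr_box[0]) * (ocr_box[3] - ocr_box[1])
    return inter / area if area > 0 else 0.0


def is_covered(ocr_box: BBox,
               elements: Sequence[Tuple[str, BBox]],
               cfg: GapFillingConfig) -> bool:
    """Covered if ANY element exceeds the threshold for its own type."""
    shrunk = shrink(ocr_box, cfg.gap_filling_shrink_pixels)
    return any(ioa(shrunk, box) > cfg.ioa_threshold_for(kind)
               for kind, box in elements)


def should_fill_gaps(ocr_boxes: Sequence[BBox],
                     elements: Sequence[Tuple[str, BBox]],
                     cfg: GapFillingConfig) -> bool:
    """Activate when the covered fraction (by count) is below the threshold."""
    if not cfg.gap_filling_enabled or not ocr_boxes:
        return False
    covered = sum(is_covered(box, elements, cfg) for box in ocr_boxes)
    return covered / len(ocr_boxes) < cfg.gap_filling_coverage_threshold
```

Keeping the per-type lookup in one method means the coverage check never hard-codes a threshold, so overriding a single field (for example `GapFillingConfig(gap_filling_coverage_threshold=0.8)`) changes behavior consistently.
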
#### Scenario: Gap filling can be disabled via configuration
- **GIVEN** gap_filling_enabled is set to false in configuration
- **WHEN** OCR track processing runs
- **THEN** the system SHALL skip all gap filling logic
- **AND** output only PP-StructureV3 results as before

#### Scenario: Coverage threshold is configurable
- **GIVEN** gap_filling_coverage_threshold is set to 0.8
- **WHEN** PP-StructureV3 coverage is 75%
- **THEN** the system SHALL activate gap filling
- **AND** supplement uncovered regions

#### Scenario: IoA thresholds are configurable per element type
- **GIVEN** custom IoA thresholds configured:
  - gap_filling_ioa_threshold_text: 0.6
  - gap_filling_ioa_threshold_table: 0.1
  - gap_filling_ioa_threshold_figure: 0.8
  - gap_filling_dedup_ioa_threshold: 0.5
- **WHEN** evaluating coverage and deduplication
- **THEN** the system SHALL use the configured values
- **AND** apply them consistently throughout the gap filling process

#### Scenario: Confidence threshold is configurable
- **GIVEN** gap_filling_confidence_threshold is set to 0.5
- **WHEN** supplementing Raw OCR regions
- **THEN** the system SHALL only include regions with confidence >= 0.5
- **AND** filter out lower confidence regions

#### Scenario: Boundary shrinking reduces edge duplicates
- **GIVEN** gap_filling_shrink_pixels is set to 1
- **WHEN** evaluating coverage with IoA
- **THEN** the system SHALL shrink OCR bounding boxes inward by 1 pixel on each side
- **AND** this reduces false "uncovered" detection at region boundaries

## ADDED Requirements

### Requirement: Use PP-StructureV3 Internal OCR Results

The system SHALL preferentially use PP-StructureV3's internal OCR results (`overall_ocr_res`) instead of running a separate Raw OCR inference.

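
The preference stated above could be sketched roughly as follows. This is an illustration under assumptions, not the actual implementation: `select_ocr_source` and the `run_raw_ocr` callback are hypothetical names, and `result_json` is assumed to be the PP-StructureV3 result already converted to a plain dictionary.

```python
from typing import Any, Callable, Mapping, Tuple


def select_ocr_source(result_json: Mapping[str, Any],
                      use_overall_ocr: bool,
                      run_raw_ocr: Callable[[], Any]) -> Tuple[str, Any]:
    """Return (source_name, ocr_data), preferring the internal OCR results.

    `run_raw_ocr` is a hypothetical callback that performs the separate
    PaddleOCR inference; it is only invoked when the fallback is needed.
    """
    overall = None
    if use_overall_ocr:
        # Path taken from the spec: json['res']['overall_ocr_res']
        overall = (result_json.get("res") or {}).get("overall_ocr_res")
    if overall:
        return "overall_ocr_res", overall
    # Flag disabled or data missing: fall back to separate inference.
    return "raw_ocr", run_raw_ocr()
```

Passing the separate inference as a callback keeps the expensive call lazy, so it runs only when the internal results are unavailable or disabled.
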
#### Scenario: Extract overall_ocr_res from PP-StructureV3
- **GIVEN** PP-StructureV3 processing completes
- **WHEN** the result contains `json['res']['overall_ocr_res']`
- **THEN** the system SHALL extract OCR regions from:
  - `dt_polys`: detection box polygons
  - `rec_texts`: recognized text strings
  - `rec_scores`: confidence scores
- **AND** convert these to the standard TextRegion format for gap filling

#### Scenario: Skip separate Raw OCR when overall_ocr_res is available
- **GIVEN** gap_filling_use_overall_ocr is true (default)
- **WHEN** PP-StructureV3 result contains overall_ocr_res
- **THEN** the system SHALL NOT execute separate PaddleOCR inference
- **AND** use the extracted overall_ocr_res as the OCR source
- **AND** this reduces total inference time by approximately 50%

#### Scenario: Fallback to separate Raw OCR when needed
- **GIVEN** gap_filling_use_overall_ocr is false OR overall_ocr_res is missing
- **WHEN** gap filling is activated
- **THEN** the system SHALL execute separate PaddleOCR inference as before
- **AND** use the separate OCR results for gap filling
- **AND** this maintains backward compatibility

#### Scenario: Coordinate consistency is guaranteed
- **GIVEN** overall_ocr_res is extracted from PP-StructureV3
- **WHEN** comparing with PP-StructureV3 layout elements
- **THEN** both SHALL use the same coordinate system
- **AND** no additional coordinate alignment is needed
- **AND** this prevents scale mismatch issues
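
The extraction scenario above might look roughly like the sketch below. The `TextRegion` dataclass shown here is a hypothetical stand-in for the project's real type, and `extract_text_regions` is an illustrative name. The sketch assumes the result is available as a plain dictionary and that each `dt_polys` entry is a sequence of `(x, y)` points, which is collapsed to an axis-aligned bounding box.

```python
from dataclasses import dataclass
from typing import Any, List, Mapping, Tuple


@dataclass
class TextRegion:
    """Hypothetical stand-in for the standard TextRegion format."""
    text: str
    bbox: Tuple[float, float, float, float]  # (x0, y0, x1, y1)
    confidence: float


def extract_text_regions(result_json: Mapping[str, Any]) -> List[TextRegion]:
    """Convert overall_ocr_res into TextRegion objects.

    Returns an empty list when overall_ocr_res is missing, so the caller
    can fall back to separate OCR inference.
    """
    ocr = (result_json.get("res") or {}).get("overall_ocr_res")
    if not ocr:
        return []
    regions: List[TextRegion] = []
    polys = ocr.get("dt_polys", [])
    texts = ocr.get("rec_texts", [])
    scores = ocr.get("rec_scores", [])
    # zip stops at the shortest list, so mismatched lengths are truncated.
    for poly, text, score in zip(polys, texts, scores):
        xs = [float(point[0]) for point in poly]
        ys = [float(point[1]) for point in poly]
        bbox = (min(xs), min(ys), max(xs), max(ys))
        regions.append(TextRegion(text=text, bbox=bbox, confidence=float(score)))
    return regions
```

Because these regions come from the same PP-StructureV3 run as the layout elements, the resulting boxes can be compared directly without the `ocr_dimensions` normalization step.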