# Tasks: Add OCR Track Gap Filling

## 1. Core Implementation

- [x] 1.1 Create `gap_filling_service.py` with `GapFillingService` class
- [x] 1.2 Implement bbox coverage calculation (center-point and IoU methods)
- [x] 1.3 Implement gap detection logic (find uncovered raw OCR regions)
- [x] 1.4 Implement confidence threshold filtering for supplemented regions
- [x] 1.5 Implement element type filtering (only supplement TEXT, skip TABLE/IMAGE/FIGURE/etc.)
- [x] 1.6 Implement reading order recalculation (sort by y0, x0)
- [x] 1.7 Implement deduplication logic (skip high IoU overlaps with PP-Structure TEXT)
- [x] 1.8 Implement optional text merging for fragmented adjacent regions

## 2. Integration

- [x] 2.1 Modify `OCRToUnifiedConverter` to accept raw OCR text_regions
- [x] 2.2 Add gap filling activation condition check (coverage < 70% or element count disparity)
- [x] 2.3 Ensure coordinate alignment between raw OCR and PP-Structure (ocr_dimensions handling)
- [x] 2.4 Add page metadata (page_number, confidence, bbox) to supplemented elements
- [x] 2.5 Ensure track isolation (only OCR track, not Direct/Hybrid)

## 3. Configuration

- [x] 3.1 Add configurable parameters to settings:
  - `gap_filling_enabled`: bool (default: True)
  - `gap_filling_coverage_threshold`: float (default: 0.7)
  - `gap_filling_iou_threshold`: float (default: 0.15)
  - `gap_filling_confidence_threshold`: float (default: 0.3)
  - `gap_filling_dedup_iou_threshold`: float (default: 0.5)

## 4. Testing (with env)

- [x] 4.1 Create test fixtures with PP-Structure severe miss-detection case (with scan.pdf / scan2.pdf)
- [x] 4.2 Test gap detection correctly identifies uncovered regions
- [x] 4.3 Test supplemented elements have correct metadata
- [x] 4.4 Test reading order is correctly recalculated
- [x] 4.5 Test deduplication prevents duplicate text
- [x] 4.6 Test normal document without miss-detection has no duplicate/inflation
- [x] 4.7 Test track isolation (Direct track unaffected)

## 5. Documentation

- [x] 5.1 Add inline documentation to `GapFillingService`
- [x] 5.2 Update configuration documentation with new settings
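
For reference, the coverage, gap detection, confidence filtering, and reading-order tasks (1.2, 1.3, 1.4, 1.6) and the activation check (2.2) could fit together roughly as in the sketch below. This is a minimal illustration and not the actual `GapFillingService` code. The names `Region`, `iou`, `center_inside`, `is_covered`, `coverage_ratio`, and `fill_gaps` are hypothetical, and the way the thresholds are applied is an assumption based only on the defaults listed in 3.1. Element type filtering, deduplication, and text merging are left out.

```python
from dataclasses import dataclass
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1), assumed layout


@dataclass
class Region:
    """Hypothetical stand-in for a raw OCR region or a PP-Structure element."""
    text: str
    bbox: BBox
    confidence: float = 1.0


def iou(a: BBox, b: BBox) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def center_inside(inner: BBox, outer: BBox) -> bool:
    """True if the center point of `inner` lies within `outer`."""
    cx = (inner[0] + inner[2]) / 2
    cy = (inner[1] + inner[3]) / 2
    return outer[0] <= cx <= outer[2] and outer[1] <= cy <= outer[3]


def is_covered(region: Region, elements: List[Region],
               iou_threshold: float = 0.15) -> bool:
    """Assumed rule: covered if either the center-point or the IoU test passes."""
    return any(
        center_inside(region.bbox, e.bbox) or iou(region.bbox, e.bbox) >= iou_threshold
        for e in elements
    )


def coverage_ratio(raw: List[Region], elements: List[Region],
                   iou_threshold: float = 0.15) -> float:
    """Fraction of raw OCR regions already covered by layout elements."""
    if not raw:
        return 1.0
    covered = sum(is_covered(r, elements, iou_threshold) for r in raw)
    return covered / len(raw)


def fill_gaps(raw: List[Region], elements: List[Region],
              coverage_threshold: float = 0.7,
              iou_threshold: float = 0.15,
              confidence_threshold: float = 0.3) -> List[Region]:
    """Add uncovered, confident raw regions, then sort by (y0, x0)."""
    # Activation check: leave the elements untouched when coverage is sufficient.
    if coverage_ratio(raw, elements, iou_threshold) >= coverage_threshold:
        return list(elements)
    gaps = [
        r for r in raw
        if r.confidence >= confidence_threshold
        and not is_covered(r, elements, iou_threshold)
    ]
    # Reading order: top-to-bottom, then left-to-right.
    return sorted(list(elements) + gaps, key=lambda r: (r.bbox[1], r.bbox[0]))
```

In this sketch a page whose coverage is already at or above the threshold is returned unchanged, which matches the intent of test 4.6 (no duplication or inflation on normal documents).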