OCR/openspec/changes/archive/2025-11-27-add-ocr-track-gap-filling/tasks.md

# Tasks: Add OCR Track Gap Filling

## 1. Core Implementation

- [x] 1.1 Create `gap_filling_service.py` with `GapFillingService` class
- [x] 1.2 Implement bbox coverage calculation (center-point and IoU methods)
- [x] 1.3 Implement gap detection logic (find uncovered raw OCR regions)
- [x] 1.4 Implement confidence threshold filtering for supplemented regions
- [x] 1.5 Implement element type filtering (only supplement TEXT, skip TABLE/IMAGE/FIGURE/etc.)
- [x] 1.6 Implement reading order recalculation (sort by y0, x0)
- [x] 1.7 Implement deduplication logic (skip high IoU overlaps with PP-Structure TEXT)
- [x] 1.8 Implement optional text merging for fragmented adjacent regions

## 2. Integration

- [x] 2.1 Modify `OCRToUnifiedConverter` to accept raw OCR text_regions
- [x] 2.2 Add gap filling activation condition check (coverage < 70% or element count disparity)
- [x] 2.3 Ensure coordinate alignment between raw OCR and PP-Structure (ocr_dimensions handling)
- [x] 2.4 Add page metadata (page_number, confidence, bbox) to supplemented elements
- [x] 2.5 Ensure track isolation (only OCR track, not Direct/Hybrid)

## 3. Configuration

- [x] 3.1 Add configurable parameters to settings:
  - `gap_filling_enabled`: bool (default: True)
  - `gap_filling_coverage_threshold`: float (default: 0.7)
  - `gap_filling_iou_threshold`: float (default: 0.15)
  - `gap_filling_confidence_threshold`: float (default: 0.3)
  - `gap_filling_dedup_iou_threshold`: float (default: 0.5)

## 4. Testing(with env)

- [x] 4.1 Create test fixtures with PP-Structure severe miss-detection case(with scan.pdf / scan2.pdf)
- [x] 4.2 Test gap detection correctly identifies uncovered regions
- [x] 4.3 Test supplemented elements have correct metadata
- [x] 4.4 Test reading order is correctly recalculated
- [x] 4.5 Test deduplication prevents duplicate text
- [x] 4.6 Test normal document without miss-detection has no duplicate/inflation
- [x] 4.7 Test track isolation (Direct track unaffected)

## 5. Documentation

- [x] 5.1 Add inline documentation to GapFillingService
- [x] 5.2 Update configuration documentation with new settings