OCR/openspec/changes/archive/2025-11-27-add-ocr-track-gap-filling/tasks.md
egg 59206a6ab8 feat: simplify layout model selection and archive proposals
Changes:
- Replace PP-Structure 7-slider parameter UI with simple 3-option layout model selector
- Add layout model mapping: chinese (PP-DocLayout-S), default (PubLayNet), cdla
- Add LayoutModelSelector component and zh-TW translations
- Fix "default" model behavior with sentinel value for PubLayNet
- Add gap filling service for OCR track coverage improvement
- Add PP-Structure debug utilities
- Archive completed/incomplete proposals:
  - add-ocr-track-gap-filling (complete)
  - fix-ocr-track-table-rendering (incomplete)
  - simplify-ppstructure-model-selection (22/25 tasks)
- Add new layout model tests, archive old PP-Structure param tests
- Update OpenSpec ocr-processing spec with layout model requirements

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 13:27:00 +08:00


# Tasks: Add OCR Track Gap Filling
## 1. Core Implementation
- [x] 1.1 Create `gap_filling_service.py` with `GapFillingService` class
- [x] 1.2 Implement bbox coverage calculation (center-point and IoU methods)
- [x] 1.3 Implement gap detection logic (find uncovered raw OCR regions)
- [x] 1.4 Implement confidence threshold filtering for supplemented regions
- [x] 1.5 Implement element type filtering (only supplement TEXT, skip TABLE/IMAGE/FIGURE/etc.)
- [x] 1.6 Implement reading order recalculation (sort by y0, x0)
- [x] 1.7 Implement deduplication logic (skip high IoU overlaps with PP-Structure TEXT)
- [x] 1.8 Implement optional text merging for fragmented adjacent regions
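
The coverage, gap detection, and reading-order steps in tasks 1.2, 1.3, and 1.6 can be sketched roughly as follows. This is a minimal illustration, not the actual `GapFillingService` code: the function names, the `(x0, y0, x1, y1)` bbox tuple, and the rule that a region counts as covered when either check passes are all assumptions made for the example.

```python
from typing import List, Tuple

# Assumed bbox layout for this sketch: (x0, y0, x1, y1) in pixels.
BBox = Tuple[float, float, float, float]


def iou(a: BBox, b: BBox) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def center_inside(inner: BBox, outer: BBox) -> bool:
    """Center-point method: is the center of `inner` within `outer`?"""
    cx = (inner[0] + inner[2]) / 2
    cy = (inner[1] + inner[3]) / 2
    return outer[0] <= cx <= outer[2] and outer[1] <= cy <= outer[3]


def is_covered(region: BBox, elements: List[BBox], iou_threshold: float = 0.15) -> bool:
    """A raw OCR region is covered if either method matches any layout element."""
    return any(
        center_inside(region, e) or iou(region, e) >= iou_threshold
        for e in elements
    )


def find_gaps(raw_regions: List[BBox], elements: List[BBox],
              iou_threshold: float = 0.15) -> List[BBox]:
    """Return the raw OCR regions that no layout element covers."""
    return [r for r in raw_regions if not is_covered(r, elements, iou_threshold)]


def reading_order(boxes: List[BBox]) -> List[BBox]:
    """Sort top-to-bottom, then left-to-right (by y0, then x0)."""
    return sorted(boxes, key=lambda b: (b[1], b[0]))
```

Sorting by `(y0, x0)` alone is a simple heuristic; multi-column pages would likely need column grouping before the sort.
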
## 2. Integration
- [x] 2.1 Modify `OCRToUnifiedConverter` to accept raw OCR text_regions
- [x] 2.2 Add gap filling activation condition check (coverage < 70% or element count disparity)
- [x] 2.3 Ensure coordinate alignment between raw OCR and PP-Structure (ocr_dimensions handling)
- [x] 2.4 Add page metadata (page_number, confidence, bbox) to supplemented elements
- [x] 2.5 Ensure track isolation (only OCR track, not Direct/Hybrid)
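
The activation check (2.2) and coordinate alignment (2.3) might look something like the sketch below. The task list only says "coverage < 70% or element count disparity", so the way disparity is measured here (a hypothetical `disparity_ratio` between raw OCR regions and PP-Structure elements) is an assumption, as are the function names and the `(width, height)` size tuples.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # assumed (x0, y0, x1, y1)
Size = Tuple[float, float]                # assumed (width, height)


def should_activate(covered_regions: int, total_regions: int,
                    layout_elements: int,
                    coverage_threshold: float = 0.7,
                    disparity_ratio: float = 3.0) -> bool:
    """Decide whether gap filling should run for a page.

    `disparity_ratio` is a hypothetical knob: it treats the page as
    under-detected when raw OCR yields far more regions than the
    layout model produced elements.
    """
    if total_regions == 0:
        return False  # nothing to supplement with
    coverage = covered_regions / total_regions
    if coverage < coverage_threshold:
        return True
    return total_regions > disparity_ratio * max(layout_elements, 1)


def scale_bbox(bbox: BBox, src: Size, dst: Size) -> BBox:
    """Map a bbox from the OCR image size into the target page size."""
    sx = dst[0] / src[0]
    sy = dst[1] / src[1]
    return (bbox[0] * sx, bbox[1] * sy, bbox[2] * sx, bbox[3] * sy)
```

Scaling both sources into one coordinate space before computing coverage matters because IoU and center-point checks are meaningless across mismatched resolutions.
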
## 3. Configuration
- [x] 3.1 Add configurable parameters to settings:
  - `gap_filling_enabled`: bool (default: True)
  - `gap_filling_coverage_threshold`: float (default: 0.7)
  - `gap_filling_iou_threshold`: float (default: 0.15)
  - `gap_filling_confidence_threshold`: float (default: 0.3)
  - `gap_filling_dedup_iou_threshold`: float (default: 0.5)
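
For illustration, the five settings above could be grouped as shown below. The field names and defaults come from the list; the `GapFillingSettings` dataclass itself and its range validation are assumptions, since the project's actual settings mechanism is not shown here.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class GapFillingSettings:
    """Hypothetical container for the gap filling options listed above."""
    gap_filling_enabled: bool = True
    gap_filling_coverage_threshold: float = 0.7
    gap_filling_iou_threshold: float = 0.15
    gap_filling_confidence_threshold: float = 0.3
    gap_filling_dedup_iou_threshold: float = 0.5

    def __post_init__(self) -> None:
        # Assumed sanity check: every threshold is a ratio in [0, 1].
        for f in fields(self):
            value = getattr(self, f.name)
            if f.name.endswith("_threshold") and not 0.0 <= value <= 1.0:
                raise ValueError(f"{f.name} must be within [0, 1], got {value}")
```

A stricter page could then be configured with, for example, `GapFillingSettings(gap_filling_confidence_threshold=0.5)`.
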
## 4. Testing (with env)
- [x] 4.1 Create test fixtures for a severe PP-Structure missed-detection case (with scan.pdf / scan2.pdf)
- [x] 4.2 Test gap detection correctly identifies uncovered regions
- [x] 4.3 Test supplemented elements have correct metadata
- [x] 4.4 Test reading order is correctly recalculated
- [x] 4.5 Test deduplication prevents duplicate text
- [x] 4.6 Test that a normal document without missed detections shows no duplicated or inflated content
- [x] 4.7 Test track isolation (Direct track unaffected)
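
Tests 4.4 to 4.6 could take a shape like the following pytest-style sketch. It runs against a small stand-in `supplement` function rather than the real service, so the `Region` type, the function signature, and the synthetic boxes are all assumptions; the real tests use the scan.pdf / scan2.pdf fixtures.

```python
from dataclasses import dataclass
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # assumed (x0, y0, x1, y1)


@dataclass
class Region:
    text: str
    bbox: BBox
    confidence: float = 1.0


def iou(a: BBox, b: BBox) -> float:
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def supplement(layout_text: List[Region], raw: List[Region],
               confidence_threshold: float = 0.3,
               dedup_iou_threshold: float = 0.5) -> List[Region]:
    """Stand-in for the service: filter, deduplicate, then re-sort."""
    kept = [
        r for r in raw
        if r.confidence >= confidence_threshold
        and all(iou(r.bbox, e.bbox) < dedup_iou_threshold for e in layout_text)
    ]
    return sorted(layout_text + kept, key=lambda r: (r.bbox[1], r.bbox[0]))


def test_dedup_prevents_duplicate_text():
    layout = [Region("Title", (0, 0, 100, 20))]
    raw = [Region("Title", (1, 1, 99, 19), 0.9),
           Region("Body", (0, 40, 100, 60), 0.9)]
    assert [r.text for r in supplement(layout, raw)] == ["Title", "Body"]


def test_low_confidence_region_is_dropped():
    raw = [Region("noise", (0, 40, 100, 60), 0.1)]
    assert supplement([], raw) == []


def test_normal_document_is_not_inflated():
    layout = [Region("A", (0, 0, 100, 20)), Region("B", (0, 30, 100, 50))]
    raw = [Region("A", (0, 0, 100, 20), 0.95),
           Region("B", (0, 30, 100, 50), 0.95)]
    assert len(supplement(layout, raw)) == 2
```
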
## 5. Documentation
- [x] 5.1 Add inline documentation to `GapFillingService`
- [x] 5.2 Update configuration documentation with new settings