Changes:

- Replace PP-Structure 7-slider parameter UI with simple 3-option layout model selector
- Add layout model mapping: chinese (PP-DocLayout-S), default (PubLayNet), cdla
- Add LayoutModelSelector component and zh-TW translations
- Fix "default" model behavior with sentinel value for PubLayNet
- Add gap filling service for OCR track coverage improvement
- Add PP-Structure debug utilities
- Archive completed/incomplete proposals:
  - add-ocr-track-gap-filling (complete)
  - fix-ocr-track-table-rendering (incomplete)
  - simplify-ppstructure-model-selection (22/25 tasks)
- Add new layout model tests, archive old PP-Structure param tests
- Update OpenSpec ocr-processing spec with layout model requirements

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

# Tasks: Add OCR Track Gap Filling

## 1. Core Implementation

- [x] 1.1 Create `gap_filling_service.py` with `GapFillingService` class
- [x] 1.2 Implement bbox coverage calculation (center-point and IoU methods)
- [x] 1.3 Implement gap detection logic (find uncovered raw OCR regions)
- [x] 1.4 Implement confidence threshold filtering for supplemented regions
- [x] 1.5 Implement element type filtering (only supplement TEXT, skip TABLE/IMAGE/FIGURE/etc.)
- [x] 1.6 Implement reading order recalculation (sort by y0, x0)
- [x] 1.7 Implement deduplication logic (skip high IoU overlaps with PP-Structure TEXT)
- [x] 1.8 Implement optional text merging for fragmented adjacent regions

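The coverage, gap detection, confidence filtering, and reading-order steps above can be sketched roughly as follows. This is a minimal illustration, not the actual `GapFillingService` code: the `Region` type and the function names `iou`, `center_inside`, `is_covered`, and `find_gaps` are hypothetical, and bboxes are assumed to be `(x0, y0, x1, y1)` tuples in a shared coordinate space. The default thresholds mirror the settings listed in section 3.

```python
from dataclasses import dataclass
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # assumed layout: (x0, y0, x1, y1)


@dataclass
class Region:
    """Hypothetical stand-in for one raw OCR text region."""
    text: str
    bbox: BBox
    confidence: float


def iou(a: BBox, b: BBox) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def center_inside(inner: BBox, outer: BBox) -> bool:
    """Center-point method: is the center of `inner` within `outer`?"""
    cx = (inner[0] + inner[2]) / 2
    cy = (inner[1] + inner[3]) / 2
    return outer[0] <= cx <= outer[2] and outer[1] <= cy <= outer[3]


def is_covered(region: Region, element_bboxes: List[BBox],
               iou_threshold: float = 0.15) -> bool:
    """A raw region counts as covered if either method matches any element."""
    return any(
        center_inside(region.bbox, e) or iou(region.bbox, e) >= iou_threshold
        for e in element_bboxes
    )


def find_gaps(raw_regions: List[Region], element_bboxes: List[BBox],
              iou_threshold: float = 0.15,
              confidence_threshold: float = 0.3) -> List[Region]:
    """Return confident, uncovered raw regions in reading order (y0, then x0)."""
    gaps = [
        r for r in raw_regions
        if r.confidence >= confidence_threshold
        and not is_covered(r, element_bboxes, iou_threshold)
    ]
    return sorted(gaps, key=lambda r: (r.bbox[1], r.bbox[0]))
```

Element type filtering, deduplication against PP-Structure TEXT elements, and text merging are left out of this sketch; they would operate on the list that `find_gaps` returns.
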
## 2. Integration

- [x] 2.1 Modify `OCRToUnifiedConverter` to accept raw OCR text_regions
- [x] 2.2 Add gap filling activation condition check (coverage < 70% or element count disparity)
- [x] 2.3 Ensure coordinate alignment between raw OCR and PP-Structure (ocr_dimensions handling)
- [x] 2.4 Add page metadata (page_number, confidence, bbox) to supplemented elements
- [x] 2.5 Ensure track isolation (only OCR track, not Direct/Hybrid)

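The activation check in 2.2 might look something like the sketch below. The function names are hypothetical, coverage is assumed to be the fraction of raw OCR boxes whose center falls inside some PP-Structure element, and the task list does not define "element count disparity", so the `disparity_factor` rule here is purely an illustrative assumption.

```python
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # assumed layout: (x0, y0, x1, y1)


def coverage_ratio(raw_bboxes: List[BBox], element_bboxes: List[BBox]) -> float:
    """Fraction of raw OCR boxes whose center lies inside any element box."""
    if not raw_bboxes:
        return 1.0  # nothing to cover, so treat the page as fully covered

    def covered(box: BBox) -> bool:
        cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
        return any(e[0] <= cx <= e[2] and e[1] <= cy <= e[3]
                   for e in element_bboxes)

    return sum(covered(b) for b in raw_bboxes) / len(raw_bboxes)


def should_fill_gaps(raw_bboxes: List[BBox], element_bboxes: List[BBox],
                     coverage_threshold: float = 0.7,
                     disparity_factor: float = 2.0) -> bool:
    """Activate gap filling on low coverage or a large count disparity.

    `disparity_factor` is a hypothetical parameter: the source only says
    "element count disparity" without giving a rule.
    """
    if not raw_bboxes:
        return False
    if coverage_ratio(raw_bboxes, element_bboxes) < coverage_threshold:
        return True
    return len(raw_bboxes) > disparity_factor * max(len(element_bboxes), 1)
```

Per 2.5, a caller would only evaluate this on the OCR track; Direct and Hybrid tracks would skip the check entirely.
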
## 3. Configuration

- [x] 3.1 Add configurable parameters to settings:
  - `gap_filling_enabled`: bool (default: True)
  - `gap_filling_coverage_threshold`: float (default: 0.7)
  - `gap_filling_iou_threshold`: float (default: 0.15)
  - `gap_filling_confidence_threshold`: float (default: 0.3)
  - `gap_filling_dedup_iou_threshold`: float (default: 0.5)

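For illustration, the five settings above could be grouped as in the following sketch. The field names and defaults come from the list; the `GapFillingSettings` class itself, the use of a dataclass, and the range validation are assumptions, since the task list does not say how the project's settings module is structured.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class GapFillingSettings:
    """Hypothetical container; names and defaults follow the task list."""
    gap_filling_enabled: bool = True
    gap_filling_coverage_threshold: float = 0.7
    gap_filling_iou_threshold: float = 0.15
    gap_filling_confidence_threshold: float = 0.3
    gap_filling_dedup_iou_threshold: float = 0.5

    def __post_init__(self) -> None:
        # Assumed sanity check: every threshold is a ratio within [0, 1].
        for f in fields(self):
            if f.name.endswith("_threshold"):
                value = getattr(self, f.name)
                if not 0.0 <= value <= 1.0:
                    raise ValueError(f"{f.name} must be in [0, 1], got {value}")
```

A stricter deployment could then override a single value, for example `GapFillingSettings(gap_filling_confidence_threshold=0.5)`, while keeping the other defaults.
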
## 4. Testing (with env)

- [x] 4.1 Create test fixtures for a severe PP-Structure missed-detection case (using scan.pdf / scan2.pdf)
- [x] 4.2 Test that gap detection correctly identifies uncovered regions
- [x] 4.3 Test that supplemented elements have correct metadata
- [x] 4.4 Test that reading order is correctly recalculated
- [x] 4.5 Test that deduplication prevents duplicate text
- [x] 4.6 Test that a normal document without missed detections gets no duplicated or inflated content
- [x] 4.7 Test track isolation (Direct track unaffected)

## 5. Documentation

- [x] 5.1 Add inline documentation to `GapFillingService`
- [x] 5.2 Update configuration documentation with new settings