OCR/openspec/changes/archive/2025-11-27-add-ocr-track-gap-filling/proposal.md
egg 59206a6ab8 feat: simplify layout model selection and archive proposals
Changes:
- Replace PP-Structure 7-slider parameter UI with simple 3-option layout model selector
- Add layout model mapping: chinese (PP-DocLayout-S), default (PubLayNet), cdla
- Add LayoutModelSelector component and zh-TW translations
- Fix "default" model behavior with sentinel value for PubLayNet
- Add gap filling service for OCR track coverage improvement
- Add PP-Structure debug utilities
- Archive completed/incomplete proposals:
  - add-ocr-track-gap-filling (complete)
  - fix-ocr-track-table-rendering (incomplete)
  - simplify-ppstructure-model-selection (22/25 tasks)
- Add new layout model tests, archive old PP-Structure param tests
- Update OpenSpec ocr-processing spec with layout model requirements

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 13:27:00 +08:00

# Change: Add OCR Track Gap Filling with Raw OCR Text Regions
## Why
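The layout analysis model in PP-StructureV3 misses a large share of regions on some scanned documents, so much of the text content is lost. A test run on scan.pdf shows:
- Raw PaddleOCR text recognition: detected **56 text regions**
- PP-StructureV3 layout analysis: output only **9 elements**
- Loss ratio: roughly **84%** of the content was not recognized by PP-StructureV3

The root cause is that the Layout Detection Model inside PP-StructureV3 does not adequately support this type of scanned document; it is not a problem in our code. Raw OCR detects all text regions correctly, but that information is lost during PP-StructureV3's structured processing.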
## What Changes
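Implement a hybrid approach: use the text regions from Raw OCR to supplement the content that PP-StructureV3 misses.
- **Add** a `GapFillingService` class that detects and restores the text regions PP-StructureV3 missed
- **Add** coverage calculation logic (center-point containment or an IoU threshold check)
- **Add** an automatic activation condition: PP-Structure coverage < 70%, or an element count significantly lower than the number of Raw OCR boxes
- **Modify** `OCRToUnifiedConverter` to integrate the gap filling logic
- **Add** logic to recompute reading_order (sorted by y0, then x0)
- **Add** test cases: a case with severe PP-Structure misses, and validation on a normal document with no misses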
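
The coverage check, activation condition, and reading-order recomputation listed above can be sketched roughly as follows. This is a minimal illustration, not the actual `GapFillingService` API: the `Region` type, the `fill_gaps` function, the IoU threshold of 0.15, and the 0.5 count ratio used for "significantly lower" are all assumptions made for the example. Only the 70% coverage threshold and the (y0, x0) sort come from the proposal.

```python
from dataclasses import dataclass
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


@dataclass
class Region:
    """Hypothetical stand-in for a raw OCR box or a PP-Structure element."""
    bbox: BBox
    text: str = ""


def iou(a: BBox, b: BBox) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def center_inside(inner: BBox, outer: BBox) -> bool:
    """True if the center point of `inner` lies within `outer`."""
    cx = (inner[0] + inner[2]) / 2
    cy = (inner[1] + inner[3]) / 2
    return outer[0] <= cx <= outer[2] and outer[1] <= cy <= outer[3]


def is_covered(raw: Region, elements: List[Region], iou_threshold: float) -> bool:
    """A raw box counts as covered on center-point containment or sufficient IoU."""
    return any(
        center_inside(raw.bbox, e.bbox) or iou(raw.bbox, e.bbox) >= iou_threshold
        for e in elements
    )


def fill_gaps(
    raw_regions: List[Region],
    elements: List[Region],
    coverage_threshold: float = 0.7,  # from the proposal
    count_ratio: float = 0.5,         # assumed meaning of "significantly lower"
    iou_threshold: float = 0.15,      # assumed value
) -> List[Region]:
    """Return elements plus uncovered raw regions when gap filling activates."""
    if not raw_regions:
        return list(elements)
    uncovered = [r for r in raw_regions if not is_covered(r, elements, iou_threshold)]
    coverage = 1 - len(uncovered) / len(raw_regions)
    too_few_elements = len(elements) < count_ratio * len(raw_regions)
    if coverage >= coverage_threshold and not too_few_elements:
        return list(elements)  # layout result is good enough; leave it untouched
    merged = list(elements) + uncovered
    # Recompute reading order: top to bottom (y0), then left to right (x0).
    merged.sort(key=lambda r: (r.bbox[1], r.bbox[0]))
    return merged
```

In this sketch a well-covered page is returned unchanged, so documents that PP-StructureV3 already handles correctly are not affected.
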
## Impact
- **Affected specs**: `ocr-processing`
- **Affected code**:
  - `backend/app/services/ocr_to_unified_converter.py` - integrate gap filling
  - `backend/app/services/gap_filling_service.py` - new (core logic)
  - `backend/tests/test_gap_filling.py` - new (tests)
- **Track isolation**: applies only to the OCR track; the Direct/Hybrid tracks are unaffected