chore: backup before code cleanup
Backup commit before executing remove-unused-code proposal.
This includes all pending changes and new features.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
openspec/changes/improve-ocr-track-algorithm/proposal.md
@@ -0,0 +1,49 @@
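# Change: Improve OCR Track Algorithm Based on PP-StructureV3 Best Practices

## Why

The OCR Track gap-filling algorithm currently uses **IoU (Intersection over Union)** to decide whether an OCR text region is covered by a layout region. The official PaddleX guidance (paddle_review.md) recommends **IoA (Intersection over Area)** instead, because it correctly captures the asymmetric "is the small box contained in the large box" relationship. In addition, the current implementation applies a single threshold to every element type, whereas different types call for different threshold strategies.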
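
## What Changes

1. **IoU → IoA algorithm change**: Switch the coverage check in `gap_filling_service.py` from IoU to IoA
2. **Dynamic threshold strategy**: Use a different IoA threshold per element type (TEXT, TABLE, FIGURE)
3. **Use PP-StructureV3's built-in OCR**: Use `overall_ocr_res` instead of running Raw OCR separately, which saves inference time and keeps coordinates consistent
4. **Boundary shrinking**: Shrink OCR boxes inward by 1-2 px to avoid duplicate rendering at edges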

## Impact

- Affected specs: `ocr-processing`
- Affected code:
  - `backend/app/services/gap_filling_service.py` - core algorithm change
  - `backend/app/services/ocr_service.py` - switch to `overall_ocr_res`
  - `backend/app/services/processing_orchestrator.py` - adjust the OCR data source
  - `backend/app/core/config.py` - add per-element-type threshold settings
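
## Technical Details

### 1. IoA vs IoU

```
IoU = intersection area / union area      (symmetric; tests whether two boxes refer to the same object)
IoA = intersection area / OCR box area    (asymmetric; tests whether the small box is contained in the large box)
```

When the layout box is much larger than the OCR box, IoU becomes very small and the region is misjudged as "uncovered".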
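
The difference can be illustrated with a minimal sketch, assuming axis-aligned boxes given as `(x0, y0, x1, y1)`. The function names below are illustrative and are not the actual `_calculate_iou()` / `_calculate_ioa()` methods in `gap_filling_service.py`.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1), assumed axis-aligned


def _area(box: BBox) -> float:
    """Area of a box; inverted or degenerate boxes count as zero."""
    x0, y0, x1, y1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def _intersection(a: BBox, b: BBox) -> float:
    """Area of the overlap between two boxes (zero if they do not overlap)."""
    return _area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))


def calculate_iou(a: BBox, b: BBox) -> float:
    """Symmetric: intersection divided by union."""
    inter = _intersection(a, b)
    union = _area(a) + _area(b) - inter
    return inter / union if union > 0 else 0.0


def calculate_ioa(ocr_box: BBox, layout_box: BBox) -> float:
    """Asymmetric: intersection divided by the OCR box area only."""
    ocr_area = _area(ocr_box)
    return _intersection(ocr_box, layout_box) / ocr_area if ocr_area > 0 else 0.0


# A small OCR box fully inside a large layout box.
ocr_box = (10.0, 10.0, 60.0, 30.0)      # area 1,000
layout_box = (0.0, 0.0, 500.0, 400.0)   # area 200,000

print(calculate_iou(ocr_box, layout_box))  # 0.005
print(calculate_ioa(ocr_box, layout_box))  # 1.0
```

The OCR box is fully contained, yet its IoU is only 0.005, so any reasonable IoU threshold would report it as uncovered. IoA reports 1.0 for the same pair.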

### 2. Recommended Dynamic Thresholds

| Element type | IoA threshold | Notes |
|--------------|---------------|-------|
| TEXT/TITLE | 0.6 | Tolerates boundary errors |
| TABLE | 0.1 | Strict filtering to avoid breaking table structure |
| FIGURE | 0.8 | Preserves text inside figures (e.g. axis labels) |
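
A threshold lookup along these lines might look as follows. This is a sketch only: the dictionary layout, the fallback value for unlisted element types, and the function names are assumptions. The "covered if any overlapping element passes" rule follows the wording of the spec delta in this change.

```python
from typing import Dict, Iterable, Tuple

# Thresholds taken from the table above.
IOA_THRESHOLDS: Dict[str, float] = {
    "TEXT": 0.6,
    "TITLE": 0.6,
    "TABLE": 0.1,
    "FIGURE": 0.8,
}

# Assumption: element types not listed in the table fall back to the TEXT value.
DEFAULT_IOA_THRESHOLD = 0.6


def ioa_threshold_for(element_type: str) -> float:
    """Return the IoA threshold for a layout element type (case-insensitive)."""
    return IOA_THRESHOLDS.get(element_type.upper(), DEFAULT_IOA_THRESHOLD)


def is_region_covered(overlaps: Iterable[Tuple[str, float]]) -> bool:
    """Decide coverage for one OCR region.

    `overlaps` holds (element_type, ioa) pairs, one per layout element that
    the OCR region overlaps. The region counts as covered if ANY pair
    exceeds the threshold for its element type.
    """
    return any(ioa > ioa_threshold_for(element_type) for element_type, ioa in overlaps)
```

With these values, an OCR region that overlaps a TABLE by only 15% is already treated as covered and is not supplemented, while a region overlapping a FIGURE by 70% is still treated as uncovered.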

### 3. overall_ocr_res Validation Results

PP-StructureV3's `json['res']['overall_ocr_res']` has been confirmed to contain:
- `dt_polys`: detection box coordinates (polygon format)
- `rec_texts`: recognized text
- `rec_scores`: recognition confidence

Testing shows the same number of regions as a separate Raw OCR run (59 regions), so it can safely replace it.
@@ -0,0 +1,142 @@
## MODIFIED Requirements

### Requirement: OCR Track Gap Filling with Raw OCR Regions

The system SHALL detect and fill gaps in PP-StructureV3 output by supplementing with Raw OCR text regions when significant content loss is detected.

#### Scenario: Gap filling activates when coverage is low
- **GIVEN** an OCR track processing task
- **WHEN** PP-StructureV3 outputs elements that cover less than 70% of Raw OCR text regions
- **THEN** the system SHALL activate gap filling
- **AND** identify Raw OCR regions not covered by any PP-StructureV3 element
- **AND** supplement these regions as TEXT elements in the output
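
The activation rule can be sketched as below. Two assumptions are made: coverage is measured as the fraction of Raw OCR regions that are covered (a count-based ratio rather than an area-based one), and an empty page counts as fully covered. The function names are hypothetical.

```python
from typing import Sequence


def coverage_ratio(covered_flags: Sequence[bool]) -> float:
    """Fraction of Raw OCR regions covered by at least one layout element."""
    if not covered_flags:
        return 1.0  # assumption: with no OCR regions there is nothing to fill
    return sum(covered_flags) / len(covered_flags)


def should_activate_gap_filling(
    covered_flags: Sequence[bool], coverage_threshold: float = 0.7
) -> bool:
    """Gap filling activates when coverage falls below the threshold."""
    return coverage_ratio(covered_flags) < coverage_threshold


# 40 of 59 regions covered is about 67.8%, below the 70% threshold.
flags = [True] * 40 + [False] * 19
print(should_activate_gap_filling(flags))  # True
```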

#### Scenario: Coverage is determined by IoA (Intersection over Area)
- **GIVEN** a Raw OCR text region with a bounding box
- **WHEN** checking if the region is covered by PP-StructureV3
- **THEN** the region SHALL be considered covered if IoA (intersection area / OCR box area) exceeds the type-specific threshold
- **AND** IoA SHALL be used instead of IoU because it correctly measures the "small box contained in large box" relationship
- **AND** regions not meeting the IoA criterion SHALL be marked as uncovered

#### Scenario: Element-type-specific IoA thresholds are applied
- **GIVEN** a Raw OCR region being evaluated for coverage
- **WHEN** comparing against PP-StructureV3 elements of different types
- **THEN** the system SHALL apply different IoA thresholds:
  - TEXT, TITLE, HEADER, FOOTER: IoA > 0.6 (tolerates boundary errors)
  - TABLE: IoA > 0.1 (strict filtering to preserve table structure)
  - FIGURE, IMAGE: IoA > 0.8 (preserves text within figures like axis labels)
- **AND** a region is considered covered if it meets the threshold for ANY overlapping element

#### Scenario: Only TEXT elements are supplemented
- **GIVEN** uncovered Raw OCR regions identified for supplementation
- **WHEN** PP-StructureV3 has detected TABLE, IMAGE, FIGURE, FLOWCHART, HEADER, or FOOTER elements
- **THEN** the system SHALL NOT supplement regions that overlap with these structural elements
- **AND** only supplement regions as TEXT type to preserve structural integrity

#### Scenario: Supplemented regions meet confidence threshold
- **GIVEN** Raw OCR regions to be supplemented
- **WHEN** a region has confidence score below 0.3
- **THEN** the system SHALL skip that region
- **AND** only supplement regions with confidence >= 0.3

#### Scenario: Deduplication uses IoA instead of IoU
- **GIVEN** a Raw OCR region being considered for supplementation
- **WHEN** the region has IoA > 0.5 with any existing PP-StructureV3 TEXT element
- **THEN** the system SHALL skip that region to prevent duplicate text
- **AND** the original PP-StructureV3 element SHALL be preserved

#### Scenario: Reading order is recalculated after gap filling
- **GIVEN** supplemented elements have been added to the page
- **WHEN** assembling the final element list
- **THEN** the system SHALL recalculate reading order for the entire page
- **AND** sort elements by y0 coordinate (top to bottom) then x0 (left to right)
- **AND** ensure logical document flow is maintained
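
The sort described in this scenario could be sketched as follows. The `Element` class is a hypothetical stand-in for the project's `DocumentElement`, reduced to the fields the sort needs, and the `reading_order` attribute name is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Element:
    """Hypothetical minimal element: text plus its top-left corner."""
    text: str
    x0: float
    y0: float
    reading_order: int = -1


def recalculate_reading_order(elements: List[Element]) -> List[Element]:
    """Sort by y0 (top to bottom), then x0 (left to right), and renumber."""
    ordered = sorted(elements, key=lambda e: (e.y0, e.x0))
    for index, element in enumerate(ordered):
        element.reading_order = index
    return ordered


page = [
    Element("footer note", x0=50, y0=700),
    Element("right column", x0=300, y0=100),
    Element("left column", x0=50, y0=100),
    Element("supplemented text", x0=50, y0=400),
]
print([e.text for e in recalculate_reading_order(page)])
# ['left column', 'right column', 'supplemented text', 'footer note']
```

A strict `(y0, x0)` sort treats elements on the same visual line as ordered by x0 only when their y0 values are exactly equal, so a real implementation may want to tolerate small vertical jitter. That refinement is not specified here.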

#### Scenario: Coordinate alignment with ocr_dimensions
- **GIVEN** Raw OCR processing may involve image resizing
- **WHEN** comparing Raw OCR bbox with PP-StructureV3 bbox
- **THEN** the system SHALL use ocr_dimensions to normalize coordinates
- **AND** ensure both sources reference the same coordinate space
- **AND** prevent coverage misdetection due to scale differences
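
One way to normalize coordinates is to rescale each Raw OCR box from the image size OCR ran on to the size the layout boxes refer to. The sketch below assumes `ocr_dimensions` can be reduced to a `(width, height)` pair, which is a guess about its shape, and `scale_bbox` is a hypothetical helper.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)
Size = Tuple[float, float]                # (width, height)


def scale_bbox(bbox: BBox, source_size: Size, target_size: Size) -> BBox:
    """Map a box from the source image's coordinate space to the target's."""
    if source_size[0] <= 0 or source_size[1] <= 0:
        raise ValueError("source_size must have positive width and height")
    scale_x = target_size[0] / source_size[0]
    scale_y = target_size[1] / source_size[1]
    x0, y0, x1, y1 = bbox
    return (x0 * scale_x, y0 * scale_y, x1 * scale_x, y1 * scale_y)


# OCR ran on a 1000x500 resized image; layout boxes refer to the 2000x1000 original.
ocr_dimensions = (1000.0, 500.0)   # assumed (width, height)
layout_dimensions = (2000.0, 1000.0)
print(scale_bbox((100.0, 50.0, 200.0, 100.0), ocr_dimensions, layout_dimensions))
# (200.0, 100.0, 400.0, 200.0)
```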

#### Scenario: Supplemented elements have complete metadata
- **GIVEN** a Raw OCR region being added as supplemented element
- **WHEN** creating the DocumentElement
- **THEN** the element SHALL include page_number
- **AND** include confidence score from Raw OCR
- **AND** include original bbox coordinates
- **AND** optionally include source indicator for debugging

### Requirement: Gap Filling Configuration

The system SHALL provide configurable parameters for gap filling behavior.

#### Scenario: Gap filling can be disabled via configuration
- **GIVEN** gap_filling_enabled is set to false in configuration
- **WHEN** OCR track processing runs
- **THEN** the system SHALL skip all gap filling logic
- **AND** output only PP-StructureV3 results as before

#### Scenario: Coverage threshold is configurable
- **GIVEN** gap_filling_coverage_threshold is set to 0.8
- **WHEN** PP-StructureV3 coverage is 75%
- **THEN** the system SHALL activate gap filling
- **AND** supplement uncovered regions

#### Scenario: IoA thresholds are configurable per element type
- **GIVEN** custom IoA thresholds configured:
  - gap_filling_ioa_threshold_text: 0.6
  - gap_filling_ioa_threshold_table: 0.1
  - gap_filling_ioa_threshold_figure: 0.8
  - gap_filling_dedup_ioa_threshold: 0.5
- **WHEN** evaluating coverage and deduplication
- **THEN** the system SHALL use the configured values
- **AND** apply them consistently throughout gap filling process
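
The gap-filling settings named in this change could be grouped as in the sketch below. It uses a plain standard-library dataclass so that it runs on its own; the project's actual `config.py` may use a different settings mechanism. The field names and most defaults come from this change. The default for `gap_filling_enabled` and the range validation are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GapFillingSettings:
    gap_filling_enabled: bool = True                 # default is an assumption
    gap_filling_coverage_threshold: float = 0.7      # activate below 70% coverage
    gap_filling_confidence_threshold: float = 0.3    # skip regions below this score
    gap_filling_ioa_threshold_text: float = 0.6
    gap_filling_ioa_threshold_table: float = 0.1
    gap_filling_ioa_threshold_figure: float = 0.8
    gap_filling_dedup_ioa_threshold: float = 0.5
    gap_filling_shrink_pixels: int = 1
    gap_filling_use_overall_ocr: bool = True

    def __post_init__(self) -> None:
        # Assumed validation: every ratio-style setting must lie in [0, 1].
        ratio_fields = (
            "gap_filling_coverage_threshold",
            "gap_filling_confidence_threshold",
            "gap_filling_ioa_threshold_text",
            "gap_filling_ioa_threshold_table",
            "gap_filling_ioa_threshold_figure",
            "gap_filling_dedup_ioa_threshold",
        )
        for name in ratio_fields:
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1, got {value}")
        if self.gap_filling_shrink_pixels < 0:
            raise ValueError("gap_filling_shrink_pixels must be non-negative")


# Override a single value; everything else keeps its default.
settings = GapFillingSettings(gap_filling_coverage_threshold=0.8)
print(settings.gap_filling_coverage_threshold, settings.gap_filling_ioa_threshold_table)
# 0.8 0.1
```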

#### Scenario: Confidence threshold is configurable
- **GIVEN** gap_filling_confidence_threshold is set to 0.5
- **WHEN** supplementing Raw OCR regions
- **THEN** the system SHALL only include regions with confidence >= 0.5
- **AND** filter out lower confidence regions

#### Scenario: Boundary shrinking reduces edge duplicates
- **GIVEN** gap_filling_shrink_pixels is set to 1
- **WHEN** evaluating coverage with IoA
- **THEN** the system SHALL shrink OCR bounding boxes inward by 1 pixel on each side
- **AND** this reduces false "uncovered" detection at region boundaries
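
Shrinking can be sketched as moving each edge inward by the configured number of pixels. How boxes too small to shrink should be handled is not stated in this change; returning them unchanged is an assumption, as is the helper name.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def shrink_bbox(bbox: BBox, shrink_pixels: int = 1) -> BBox:
    """Move every edge of the box inward by `shrink_pixels`."""
    x0, y0, x1, y1 = bbox
    # Assumption: a box that would collapse to zero or negative size is left as-is.
    if (x1 - x0) <= 2 * shrink_pixels or (y1 - y0) <= 2 * shrink_pixels:
        return bbox
    return (x0 + shrink_pixels, y0 + shrink_pixels, x1 - shrink_pixels, y1 - shrink_pixels)


print(shrink_bbox((10, 10, 60, 30), shrink_pixels=1))  # (11, 11, 59, 29)
print(shrink_bbox((0, 0, 2, 2), shrink_pixels=1))      # (0, 0, 2, 2)
```

Because an OCR box often sticks out of its layout box by a pixel or so, shrinking it before computing IoA raises the ratio slightly for boxes that are essentially contained.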

## ADDED Requirements

### Requirement: Use PP-StructureV3 Internal OCR Results

The system SHALL preferentially use PP-StructureV3's internal OCR results (`overall_ocr_res`) instead of running a separate Raw OCR inference.

#### Scenario: Extract overall_ocr_res from PP-StructureV3
- **GIVEN** PP-StructureV3 processing completes
- **WHEN** the result contains `json['res']['overall_ocr_res']`
- **THEN** the system SHALL extract OCR regions from:
  - `dt_polys`: detection box polygons
  - `rec_texts`: recognized text strings
  - `rec_scores`: confidence scores
- **AND** convert these to the standard TextRegion format for gap filling
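
The extraction step might look like the sketch below. The `TextRegion` class here is a hypothetical stand-in for the project's own type, and collapsing each polygon to its axis-aligned bounding box is an assumption about the target format. Only the three keys listed above are read from `overall_ocr_res`.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Sequence, Tuple


@dataclass
class TextRegion:
    """Hypothetical stand-in for the project's TextRegion format."""
    text: str
    confidence: float
    bbox: Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def polygon_to_bbox(polygon: Sequence[Sequence[float]]) -> Tuple[float, float, float, float]:
    """Collapse a polygon (list of [x, y] points) to its axis-aligned bounding box."""
    xs = [float(point[0]) for point in polygon]
    ys = [float(point[1]) for point in polygon]
    return (min(xs), min(ys), max(xs), max(ys))


def extract_text_regions(result_json: Dict[str, Any]) -> List[TextRegion]:
    """Build TextRegion objects from result_json['res']['overall_ocr_res']."""
    ocr = result_json.get("res", {}).get("overall_ocr_res")
    if not ocr:
        return []  # missing: the caller can fall back to a separate Raw OCR run
    polygons = ocr.get("dt_polys", [])
    texts = ocr.get("rec_texts", [])
    scores = ocr.get("rec_scores", [])
    return [
        TextRegion(text=text, confidence=float(score), bbox=polygon_to_bbox(polygon))
        for polygon, text, score in zip(polygons, texts, scores)
    ]


sample = {
    "res": {
        "overall_ocr_res": {
            "dt_polys": [[[10, 20], [110, 20], [110, 50], [10, 50]]],
            "rec_texts": ["Invoice"],
            "rec_scores": [0.98],
        }
    }
}
print(extract_text_regions(sample))
```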

#### Scenario: Skip separate Raw OCR when overall_ocr_res is available
- **GIVEN** gap_filling_use_overall_ocr is true (default)
- **WHEN** PP-StructureV3 result contains overall_ocr_res
- **THEN** the system SHALL NOT execute separate PaddleOCR inference
- **AND** use the extracted overall_ocr_res as the OCR source
- **AND** this reduces total inference time by approximately 50%

#### Scenario: Fallback to separate Raw OCR when needed
- **GIVEN** gap_filling_use_overall_ocr is false OR overall_ocr_res is missing
- **WHEN** gap filling is activated
- **THEN** the system SHALL execute separate PaddleOCR inference as before
- **AND** use the separate OCR results for gap filling
- **AND** this maintains backward compatibility
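
The source-selection logic in the last two scenarios can be sketched as a small function. The name `select_ocr_regions`, the use of `None` to mean "`overall_ocr_res` is missing", and passing the Raw OCR step as a callable are all assumptions made for illustration.

```python
from typing import Callable, List, Optional

Region = dict  # placeholder for whatever region type the pipeline uses


def select_ocr_regions(
    overall_ocr_regions: Optional[List[Region]],
    run_raw_ocr: Callable[[], List[Region]],
    use_overall_ocr: bool = True,
) -> List[Region]:
    """Prefer PP-StructureV3's own OCR output; otherwise run separate Raw OCR.

    `run_raw_ocr` is only called on the fallback path, so the extra
    inference is skipped whenever the built-in results are used.
    """
    if use_overall_ocr and overall_ocr_regions is not None:
        return overall_ocr_regions
    return run_raw_ocr()


raw_ocr_calls = []


def fake_raw_ocr() -> List[Region]:
    """Stand-in for the separate PaddleOCR inference; records each call."""
    raw_ocr_calls.append(1)
    return [{"text": "from raw ocr"}]


print(select_ocr_regions([{"text": "from pp-structure"}], fake_raw_ocr))
print(len(raw_ocr_calls))  # 0
print(select_ocr_regions(None, fake_raw_ocr))
print(len(raw_ocr_calls))  # 1
```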

#### Scenario: Coordinate consistency is guaranteed
- **GIVEN** overall_ocr_res is extracted from PP-StructureV3
- **WHEN** comparing with PP-StructureV3 layout elements
- **THEN** both SHALL use the same coordinate system
- **AND** no additional coordinate alignment is needed
- **AND** this prevents scale mismatch issues
openspec/changes/improve-ocr-track-algorithm/tasks.md
@@ -0,0 +1,54 @@
## 1. Algorithm Changes (gap_filling_service.py)

### 1.1 IoA Implementation
- [x] 1.1.1 Add `_calculate_ioa()` method alongside existing `_calculate_iou()`
- [x] 1.1.2 Modify `_is_region_covered()` to use IoA instead of IoU
- [x] 1.1.3 Update deduplication logic to use IoA

### 1.2 Dynamic Threshold Strategy
- [x] 1.2.1 Add element-type-specific thresholds as class constants
- [x] 1.2.2 Modify `_is_region_covered()` to accept element type parameter
- [x] 1.2.3 Apply different thresholds based on element type (TEXT: 0.6, TABLE: 0.1, FIGURE: 0.8)

### 1.3 Boundary Shrinking
- [x] 1.3.1 Add optional `shrink_pixels` parameter to coverage detection
- [x] 1.3.2 Implement bbox shrinking logic (inward 1-2 px)

## 2. OCR Data Source Changes

### 2.1 Extract overall_ocr_res from PP-StructureV3
- [x] 2.1.1 Modify `pp_structure_enhanced.py` to extract `overall_ocr_res` from result
- [x] 2.1.2 Convert `dt_polys` + `rec_texts` + `rec_scores` to TextRegion format
- [x] 2.1.3 Store extracted OCR in result dict for gap filling

### 2.2 Update Processing Orchestrator
- [x] 2.2.1 Add option to use `overall_ocr_res` as OCR source
- [x] 2.2.2 Skip separate Raw OCR inference when using PP-StructureV3's OCR
- [x] 2.2.3 Maintain backward compatibility with explicit Raw OCR mode

## 3. Configuration Updates

### 3.1 Add Settings (config.py)
- [x] 3.1.1 Add `gap_filling_ioa_threshold_text: float = 0.6`
- [x] 3.1.2 Add `gap_filling_ioa_threshold_table: float = 0.1`
- [x] 3.1.3 Add `gap_filling_ioa_threshold_figure: float = 0.8`
- [x] 3.1.4 Add `gap_filling_use_overall_ocr: bool = True`
- [x] 3.1.5 Add `gap_filling_shrink_pixels: int = 1`

## 4. Testing

### 4.1 Unit Tests
- [ ] 4.1.1 Test IoA calculation with known values
- [ ] 4.1.2 Test dynamic threshold selection by element type
- [ ] 4.1.3 Test boundary shrinking edge cases

### 4.2 Integration Tests
- [ ] 4.2.1 Test with scan.pdf (current problematic file)
- [ ] 4.2.2 Compare results: old IoU vs new IoA approach
- [ ] 4.2.3 Verify no duplicate text rendering in output PDF
- [ ] 4.2.4 Verify table content is not duplicated outside table bounds

## 5. Documentation

- [x] 5.1 Update spec documentation with new algorithm
- [x] 5.2 Add inline code comments explaining IoA vs IoU