feat: simplify layout model selection and archive proposals
Changes:
- Replace PP-Structure 7-slider parameter UI with simple 3-option layout model selector
- Add layout model mapping: chinese (PP-DocLayout-S), default (PubLayNet), cdla
- Add LayoutModelSelector component and zh-TW translations
- Fix "default" model behavior with sentinel value for PubLayNet
- Add gap filling service for OCR track coverage improvement
- Add PP-Structure debug utilities
- Archive completed/incomplete proposals:
  - add-ocr-track-gap-filling (complete)
  - fix-ocr-track-table-rendering (incomplete)
  - simplify-ppstructure-model-selection (22/25 tasks)
- Add new layout model tests, archive old PP-Structure param tests
- Update OpenSpec ocr-processing spec with layout model requirements

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@@ -0,0 +1,28 @@
# Change: Fix OCR Track Table Empty Columns and Alignment

## Why

Tables generated by PP-Structure often contain empty columns (columns in which every row's cell is empty or whitespace-only). After conversion, the UnifiedDocument tables end up with blank columns and misaligned cells. The OCR Track currently uses the raw data without any cleanup, which degrades the quality of PDF/JSON/Markdown output.

## What Changes

- Add a `trim_empty_columns()` function that removes empty columns from OCR Track tables
- Call the cleanup logic at the entry of `_convert_table_data` so that TableData is clean
- Recalculate col_span: if a span crosses a removed column, shrink the span
- Update the columns/cols value and adjust each cell's col index
- Optional: sort columns for alignment by bbox x0

## Impact

- Affected specs: `ocr-processing`
- Affected code:
  - `backend/app/services/ocr_to_unified_converter.py` (main changes)
- Direct/HYBRID paths are not affected
- PDF/JSON/Markdown output will be cleaner

## Constraints

- Keep table bbox and page coordinates unchanged
- Do not modify the Direct/HYBRID paths
- Only remove columns that are empty in every row; a column with an empty header but non-empty data must not be removed
- Preserve the original bbox to avoid layout drift in the PDF
@@ -0,0 +1,61 @@
## ADDED Requirements

### Requirement: OCR Table Empty Column Cleanup

The OCR Track converter SHALL clean up PP-Structure generated tables by removing columns where all rows have empty or whitespace-only content.

The system SHALL:
1. Identify columns where every cell's content is empty or contains only whitespace (using `.strip()` to determine emptiness)
2. Remove identified empty columns from the table structure
3. Update the `columns`/`cols` value to reflect the new column count
4. Recalculate each cell's `col` index to maintain continuity
5. Adjust `col_span` values when spans cross removed columns (shrink span size)
6. Remove cells entirely when their complete span falls within removed columns
7. Preserve original bbox and page coordinates (no layout drift)
8. If `columns` is 0 or missing after cleanup, fill with the calculated column count

The cleanup SHALL NOT:
- Remove columns where the header is empty but data rows contain values
- Modify tables in Direct or HYBRID track
- Alter the original bbox coordinates

#### Scenario: All rows in column are empty
- **WHEN** a table has a column where all cells contain only empty or whitespace content
- **THEN** that column is removed
- **AND** remaining cells have their `col` indices decremented appropriately
- **AND** `cols` count is reduced by 1

#### Scenario: Column has empty header but data has values
- **WHEN** a table has a column where the header cell is empty
- **AND** at least one data row cell in that column contains non-whitespace content
- **THEN** that column is NOT removed

#### Scenario: Cell span crosses removed column
- **WHEN** a cell has `col_span > 1`
- **AND** one or more columns within the span are removed
- **THEN** the `col_span` is reduced by the number of removed columns within the span

#### Scenario: Cell span entirely within removed columns
- **WHEN** a cell's entire span falls within columns that are all removed
- **THEN** that cell is removed from the table

#### Scenario: Missing columns metadata
- **WHEN** the table dict has `columns` set to 0 or missing
- **AFTER** cleanup is performed
- **THEN** `columns` is set to the calculated number of remaining columns
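
The cleanup rules in this requirement can be illustrated with a minimal sketch. It is not the project's actual `trim_empty_columns` implementation. It assumes each cell is a dict with `col`, an optional `col_span`, and a `content` string (the `content` key name is an assumption), and it treats a cell's text as belonging to its starting column only.

```python
from typing import Any, Dict, List


def trim_empty_columns(table_dict: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of table_dict with all-empty columns removed (sketch)."""
    cells: List[Dict[str, Any]] = table_dict.get("cells", [])
    if not cells:
        return table_dict

    # Derive the column count from the cells, so a 0/missing value is tolerated.
    n_cols = max(c["col"] + c.get("col_span", 1) for c in cells)

    # A column is kept when at least one cell starting in it has visible text.
    keep = [False] * n_cols
    for c in cells:
        if str(c.get("content") or "").strip():
            keep[c["col"]] = True

    # new_index[i] = number of kept columns to the left of original column i.
    new_index: List[int] = []
    kept_count = 0
    for is_kept in keep:
        new_index.append(kept_count)
        kept_count += int(is_kept)

    new_cells = []
    for c in cells:
        start, span = c["col"], c.get("col_span", 1)
        surviving = [i for i in range(start, start + span) if keep[i]]
        if not surviving:
            continue  # the whole span fell on removed columns
        # Copy the cell (bbox and other fields untouched), fix col and col_span.
        new_cells.append({**c, "col": new_index[surviving[0]],
                          "col_span": len(surviving)})

    result = {**table_dict, "cells": new_cells, "cols": kept_count}
    if "columns" in table_dict:
        result["columns"] = kept_count
    return result
```

Because the emptiness check looks at every row, a column with an empty header but non-empty data keeps its `True` flag and is never removed.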

### Requirement: OCR Table Column Alignment by Bbox

(Optional Enhancement) When bbox coordinates are available for table cells, the OCR Track converter SHALL use cell bbox x0 coordinates to improve column alignment accuracy.

The system SHALL:
1. Sort cells by bbox `x0` coordinate before assigning column indices
2. Reassign `col` indices based on spatial position rather than HTML order

This requirement is optional and implementation MAY be deferred if bbox data is not reliably available.

#### Scenario: Cells reordered by bbox position
- **WHEN** bbox coordinates are available for table cells
- **AND** the original HTML order does not match spatial order
- **THEN** cells are reordered by `x0` coordinate
- **AND** `col` indices are reassigned to reflect spatial positioning
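
As a rough illustration of this optional requirement, the sketch below sorts the cells of each row by `x0` and reassigns `col` in that order. It assumes every cell dict carries a `row` index and a `bbox` of the form `[x0, y0, x1, y1]`. It ignores spans and rows with missing cells, so it is a starting point rather than a full alignment algorithm.

```python
from collections import defaultdict
from typing import Any, Dict, List


def realign_columns_by_x0(cells: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Reassign col indices per row from left to right by bbox x0 (sketch)."""
    rows: Dict[int, List[Dict[str, Any]]] = defaultdict(list)
    for cell in cells:
        rows[cell["row"]].append(cell)

    realigned: List[Dict[str, Any]] = []
    for row_index in sorted(rows):
        # Spatial order within the row: smallest x0 first.
        ordered = sorted(rows[row_index], key=lambda c: c["bbox"][0])
        for col, cell in enumerate(ordered):
            realigned.append({**cell, "col": col})
    return realigned
```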
@@ -0,0 +1,43 @@
# Tasks: Fix OCR Track Table Empty Columns

## 1. Core Implementation

- [x] 1.1 Implement `trim_empty_columns(table_dict: Dict[str, Any]) -> Dict[str, Any]` in `ocr_to_unified_converter.py`
  - Use the cells array to determine, per column, whether the content is empty or whitespace-only in every row
  - Use `.strip()` to detect whitespace-only content
- [x] 1.2 Implement column removal logic
  - Update the columns/cols value
  - Adjust each cell's col index
- [x] 1.3 Implement col_span recalculation
  - If a span crosses a removed column, shrink the span
  - If the entire span falls on removed columns, remove the cell
- [x] 1.4 Call `trim_empty_columns` at the entry of `_convert_table_data`
  - Run the cleanup before building TableData
  - Also apply the cleanup in `_extract_table_data` (HTML table parsing)
- [ ] 1.5 (Optional) Sort columns for alignment by bbox x0/x1
  - If a bbox grid is available, sort by x0 first and then reassign col indices
  - Deferred until the availability of bbox data is confirmed

## 2. Testing & Validation

- [x] 2.1 Unit tests pass
  - Basic empty-column removal
  - Empty header with non-empty data (column not removed)
  - col_span crossing a removed column (span shrinks)
  - Cell falling entirely within removed columns (cell removed)
  - No empty columns (table unchanged)
- [x] 2.2 Check existing OCR results
  - No existing result contains a table with a fully empty column
  - The implementation is in place and will clean up empty columns when they occur
- [x] 2.3 Confirm Direct/HYBRID tables are unchanged
  - `OCRToUnifiedConverter` is only used in `ocr_service.py`
  - The Direct track uses `DirectExtractionEngine` and is unaffected

## 3. Edge Cases & Validation

- [x] 3.1 Handle `columns` being 0 or missing
  - Backfill with the computed column count so downstream consumers do not break
- [x] 3.2 Handle an empty header with non-empty data
  - Only remove columns that are empty in every row
- [x] 3.3 Ensure `backend/storage/results/...` is not modified directly
  - The change is in the converter; tasks must be rerun to verify
@@ -0,0 +1,183 @@
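# Design: OCR Track Gap Filling

## Context

The PP-StructureV3 layout analysis model misses a large share of content on some scanned documents. In testing, Raw PaddleOCR detected 56 text regions, while PP-StructureV3 output only 9 elements (84% missing).

The problem lies in the Layout Detection Model inside PP-StructureV3. It is a limitation of the PaddleOCR library and cannot be fixed from outside. The Raw OCR `text_regions` data, however, is still complete and usable.

### Stakeholders
- **End users**: need complete OCR output without large amounts of missing text
- **OCR track**: needs to integrate Raw OCR with PP-StructureV3 results
- **Direct/Hybrid track**: should not be affected by this change

## Goals / Non-Goals

### Goals
- Detect regions missed by PP-StructureV3 and fill them in with Raw OCR results
- Ensure filled-in text does not duplicate existing elements
- Maintain correct reading order
- Affect only the OCR track, leaving the behavior of other tracks unchanged

### Non-Goals
- Do not modify the internals of PP-StructureV3 or PaddleOCR
- Do not fill gaps for non-text elements such as images, tables, and charts
- Do not implement complex layout analysis (gap filling only)

## Decisions

### Decision 1: Coverage Criterion
**Choice**: Use the "center point falls inside" test first, supplemented by an IoU threshold

**Rationale**:
- The center-point test is simple to compute and fast
- The IoU threshold is a supplement for boundary cases
- The suggested IoU threshold is 0.1~0.2, so that low-IoU overlaps are not misjudged as uncovered

**Alternatives**:
- Pure IoU test: more computation, and partial overlaps are more complicated to handle
- Area-ratio test: not fair across regions of different sizes

### Decision 2: Gap Filling Trigger
**Choice**: Trigger when PP-Structure coverage < 70% or the element count is significantly lower than Raw OCR

**Rationale**:
- Avoids duplicated text on normal documents
- The 70% threshold is empirical and can be adjusted via settings
- The element count comparison serves as a quick check

### Decision 3: Element Types to Fill
**Choice**: Fill in TEXT only; skip TABLE/IMAGE/FIGURE/FLOWCHART/HEADER/FOOTER

**Rationale**:
- PP-StructureV3 is usually more accurate at recognizing structured elements (tables, images)
- Filling in raw OCR text could break table structure
- These elements need to keep their structural integrity

### Decision 4: Duplicate Detection and Deduplication
**Choice**: A Raw OCR region with IoU > 0.5 against a PP-Structure TEXT element is treated as a duplicate and skipped

**Rationale**:
- 0.5 is a common overlap threshold
- Avoids the same text appearing twice
- Lightweight merging can be considered for fragmented Raw OCR boxes

### Decision 5: Coordinate Alignment
**Choice**: Use `ocr_dimensions` to convert bboxes

**Rationale**:
- OCR may resize the image
- Ensures Raw OCR and PP-Structure coordinates are in the same space
- Avoids coverage misjudgment caused by mismatched dimensions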

## Data Flow

```
┌─────────────────┐     ┌──────────────────────┐
│ Raw OCR Result  │     │ PP-StructureV3 Result│
│  (56 regions)   │     │    (9 elements)      │
└────────┬────────┘     └──────────┬───────────┘
         │                         │
         └────────────┬────────────┘
                      │
       ┌──────────────▼──────────────┐
       │ GapFillingService           │
       │ 1. Calculate coverage       │
       │ 2. Find uncovered regions   │
       │ 3. Filter by confidence     │
       │ 4. Deduplicate              │
       │ 5. Merge if needed          │
       └──────────────┬──────────────┘
                      │
       ┌──────────────▼──────────────┐
       │ OCRToUnifiedConverter       │
       │ - Combine elements          │
       │ - Recalculate reading order │
       └──────────────┬──────────────┘
                      │
       ┌──────────────▼──────────────┐
       │ UnifiedDocument             │
       │ (complete content)          │
       └─────────────────────────────┘
```

## Algorithm: Gap Detection

```python
from __future__ import annotations  # keeps the type hints below lazy

from typing import List

# TextRegion, Element, SKIP_TYPES, get_center, point_in_bbox and
# calculate_iou are expected to be defined elsewhere in the module.


def find_uncovered_regions(
    raw_ocr_regions: List[TextRegion],
    pp_structure_elements: List[Element],
    iou_threshold: float = 0.15
) -> List[TextRegion]:
    """
    Find Raw OCR regions not covered by PP-Structure elements.

    Coverage criteria (either one):
    1. Center point of raw region falls inside any PP-Structure bbox
    2. IoU with any PP-Structure bbox > iou_threshold
    """
    uncovered = []

    # Filter PP-Structure elements: only consider TEXT, skip TABLE/IMAGE/etc.
    text_elements = [e for e in pp_structure_elements
                     if e.type not in SKIP_TYPES]

    for region in raw_ocr_regions:
        center = get_center(region.bbox)
        is_covered = False

        for element in text_elements:
            # Check center point
            if point_in_bbox(center, element.bbox):
                is_covered = True
                break

            # Check IoU
            if calculate_iou(region.bbox, element.bbox) > iou_threshold:
                is_covered = True
                break

        if not is_covered:
            uncovered.append(region)

    return uncovered
```
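
The algorithm relies on three geometry helpers that this design does not spell out. A minimal sketch of them follows, assuming a bbox is an `(x0, y0, x1, y1)` tuple with `x0 <= x1` and `y0 <= y1`; the real `bbox` representation in the project may differ.

```python
from typing import Tuple

# Assumed bbox layout: (x0, y0, x1, y1) in a single coordinate space.
BBox = Tuple[float, float, float, float]
Point = Tuple[float, float]


def get_center(bbox: BBox) -> Point:
    """Center point of a bbox."""
    x0, y0, x1, y1 = bbox
    return ((x0 + x1) / 2, (y0 + y1) / 2)


def point_in_bbox(point: Point, bbox: BBox) -> bool:
    """True when the point lies inside the bbox (edges included)."""
    x, y = point
    x0, y0, x1, y1 = bbox
    return x0 <= x <= x1 and y0 <= y <= y1


def calculate_iou(a: BBox, b: BBox) -> float:
    """Intersection over union of two bboxes; 0.0 when they do not overlap."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0
```

The center-point test is checked first because it is cheaper than IoU, which matches the rationale in Decision 1.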

## Configuration Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `gap_filling_enabled` | bool | True | Whether gap filling is enabled |
| `gap_filling_coverage_threshold` | float | 0.7 | Gap filling activates when coverage is below this value |
| `gap_filling_iou_threshold` | float | 0.15 | IoU threshold for the coverage test |
| `gap_filling_confidence_threshold` | float | 0.3 | Minimum Raw OCR confidence |
| `gap_filling_dedup_iou_threshold` | float | 0.5 | IoU threshold for deduplication |
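
To show how these parameters could drive the activation check, here is a small sketch. The `GapFillingSettings` dataclass and `should_fill_gaps` function are hypothetical names, and coverage is assumed to be the fraction of Raw OCR regions that count as covered. The "element count significantly lower" condition is left out because this design does not define it precisely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GapFillingSettings:
    """Hypothetical container mirroring the parameter table above."""
    gap_filling_enabled: bool = True
    gap_filling_coverage_threshold: float = 0.7
    gap_filling_iou_threshold: float = 0.15
    gap_filling_confidence_threshold: float = 0.3
    gap_filling_dedup_iou_threshold: float = 0.5


def should_fill_gaps(covered_regions: int, total_raw_regions: int,
                     settings: GapFillingSettings) -> bool:
    """Activate gap filling when coverage drops below the threshold."""
    if not settings.gap_filling_enabled or total_raw_regions == 0:
        return False
    coverage = covered_regions / total_raw_regions
    return coverage < settings.gap_filling_coverage_threshold
```

With the defaults, a page where 50 of 56 raw regions are covered would skip gap filling, while raising the threshold to 0.8 would activate it at 75% coverage.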

## Risks / Trade-offs

### Risk 1: Gap filling causes duplicated text
**Mitigation**: Set dedup_iou_threshold and deduplicate highly overlapping regions

### Risk 2: Reading order gets scrambled
**Mitigation**: After adding the filled-in elements, recalculate reading_order for the whole page (sorted by y0, then x0)
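
The reading-order mitigation amounts to a two-key sort. A minimal sketch, assuming each element is a dict with a `bbox` of the form `[x0, y0, x1, y1]` and that the order is stored in a `reading_order` field:

```python
from typing import Any, Dict, List


def recalculate_reading_order(
    elements: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Sort top to bottom (y0), then left to right (x0), and renumber."""
    ordered = sorted(elements, key=lambda e: (e["bbox"][1], e["bbox"][0]))
    return [{**e, "reading_order": i} for i, e in enumerate(ordered)]
```

A plain `(y0, x0)` sort treats two boxes on the same visual line as different rows if their `y0` values differ by even one pixel, so a real implementation may want to bucket `y0` with a tolerance first.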

### Risk 3: Performance impact
**Mitigation**:
- Run a quick coverage check first; if coverage > 70%, skip gap filling
- Use an R-tree or interval tree to speed up bbox queries (if performance becomes a bottleneck)

### Risk 4: Coordinate misalignment
**Mitigation**: Use `ocr_dimensions` to ensure a consistent coordinate space
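
The coordinate alignment step can be sketched as a simple rescale. The shape of `ocr_dimensions` is not specified in this design, so the example assumes a dict with `width` and `height` keys describing the image size OCR ran on, and a hypothetical `target_dimensions` dict for the space PP-Structure bboxes live in.

```python
from typing import Dict, Tuple

BBox = Tuple[float, float, float, float]  # assumed (x0, y0, x1, y1)


def scale_bbox(bbox: BBox, ocr_dimensions: Dict[str, float],
               target_dimensions: Dict[str, float]) -> BBox:
    """Map a bbox from the OCR image space into the target space."""
    scale_x = target_dimensions["width"] / ocr_dimensions["width"]
    scale_y = target_dimensions["height"] / ocr_dimensions["height"]
    x0, y0, x1, y1 = bbox
    return (x0 * scale_x, y0 * scale_y, x1 * scale_x, y1 * scale_y)
```

Applying the same conversion to every Raw OCR bbox before the coverage test keeps center-point and IoU comparisons meaningful when the OCR input was resized.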
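
## Migration Plan

1. The new feature is optional (enabled by default)
2. Gap filling can be disabled via settings
3. Existing API interfaces are not affected
4. Backward compatible: default behavior is used when no parameters are passed

## Open Questions

1. Do we need a UI toggle so users can enable/disable gap filling?
2. For fragmented Raw OCR boxes, do we need merging logic? (same line, adjacent, with very small gaps)
3. Should the output mark which elements came from gap filling? (for debugging)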
@@ -0,0 +1,30 @@
# Change: Add OCR Track Gap Filling with Raw OCR Text Regions

## Why

The PP-StructureV3 layout analysis model misses a large share of content on some scanned documents, so much of the text is lost. Testing with scan.pdf shows:
- Raw PaddleOCR text recognition: detected **56 text regions**
- PP-StructureV3 layout analysis: output only **9 elements**
- Loss ratio: about **84%** of the content was not recognized by PP-StructureV3

The root cause is that the Layout Detection Model inside PP-StructureV3 does not adequately support this type of scanned document; it is not a problem in our code. Raw OCR detects all text regions correctly, but this information is lost during PP-StructureV3's structured processing.

## What Changes

Implement a "Hybrid Approach": use Raw OCR text regions to supplement the content PP-StructureV3 misses.

- **Added** `GapFillingService` class, responsible for detecting and filling in text regions missed by PP-StructureV3
- **Added** coverage calculation logic (center point inside, or IoU threshold)
- **Added** automatic activation condition: PP-Structure coverage < 70%, or element count significantly lower than the number of Raw OCR boxes
- **Modified** `OCRToUnifiedConverter` to integrate gap filling logic
- **Added** reading_order recalculation logic (sorted by y0, then x0)
- **Added** test cases: severe PP-Structure miss-detection, and validation on a normal document with no misses

## Impact

- **Affected specs**: `ocr-processing`
- **Affected code**:
  - `backend/app/services/ocr_to_unified_converter.py` - integrate gap filling
  - `backend/app/services/gap_filling_service.py` - new (core logic)
  - `backend/tests/test_gap_filling.py` - new (tests)
- **Track isolation**: applies only to the OCR track; Direct/Hybrid tracks are unaffected
@@ -0,0 +1,111 @@
## ADDED Requirements

### Requirement: OCR Track Gap Filling with Raw OCR Regions

The system SHALL detect and fill gaps in PP-StructureV3 output by supplementing with Raw OCR text regions when significant content loss is detected.

#### Scenario: Gap filling activates when coverage is low
- **GIVEN** an OCR track processing task
- **WHEN** PP-StructureV3 outputs elements that cover less than 70% of Raw OCR text regions
- **THEN** the system SHALL activate gap filling
- **AND** identify Raw OCR regions not covered by any PP-StructureV3 element
- **AND** supplement these regions as TEXT elements in the output

#### Scenario: Coverage is determined by center-point and IoU
- **GIVEN** a Raw OCR text region with bounding box
- **WHEN** checking if the region is covered by PP-StructureV3
- **THEN** the region SHALL be considered covered if its center point falls inside any PP-StructureV3 element bbox
- **OR** if IoU with any PP-StructureV3 element exceeds 0.15 threshold
- **AND** regions not meeting either criterion SHALL be marked as uncovered

#### Scenario: Only TEXT elements are supplemented
- **GIVEN** uncovered Raw OCR regions identified for supplementation
- **WHEN** PP-StructureV3 has detected TABLE, IMAGE, FIGURE, FLOWCHART, HEADER, or FOOTER elements
- **THEN** the system SHALL NOT supplement regions that overlap with these structural elements
- **AND** only supplement regions as TEXT type to preserve structural integrity

#### Scenario: Supplemented regions meet confidence threshold
- **GIVEN** Raw OCR regions to be supplemented
- **WHEN** a region has confidence score below 0.3
- **THEN** the system SHALL skip that region
- **AND** only supplement regions with confidence >= 0.3

#### Scenario: Deduplication prevents repeated text
- **GIVEN** a Raw OCR region being considered for supplementation
- **WHEN** the region has IoU > 0.5 with any existing PP-StructureV3 TEXT element
- **THEN** the system SHALL skip that region to prevent duplicate text
- **AND** the original PP-StructureV3 element SHALL be preserved

#### Scenario: Reading order is recalculated after gap filling
- **GIVEN** supplemented elements have been added to the page
- **WHEN** assembling the final element list
- **THEN** the system SHALL recalculate reading order for the entire page
- **AND** sort elements by y0 coordinate (top to bottom) then x0 (left to right)
- **AND** ensure logical document flow is maintained

#### Scenario: Coordinate alignment with ocr_dimensions
- **GIVEN** Raw OCR processing may involve image resizing
- **WHEN** comparing Raw OCR bbox with PP-StructureV3 bbox
- **THEN** the system SHALL use ocr_dimensions to normalize coordinates
- **AND** ensure both sources reference the same coordinate space
- **AND** prevent coverage misdetection due to scale differences

#### Scenario: Supplemented elements have complete metadata
- **GIVEN** a Raw OCR region being added as supplemented element
- **WHEN** creating the DocumentElement
- **THEN** the element SHALL include page_number
- **AND** include confidence score from Raw OCR
- **AND** include original bbox coordinates
- **AND** optionally include source indicator for debugging

### Requirement: Gap Filling Track Isolation

The gap filling feature SHALL only apply to OCR track processing and SHALL NOT affect Direct or Hybrid track outputs.

#### Scenario: Gap filling only activates for OCR track
- **GIVEN** a document processing task
- **WHEN** the processing track is OCR
- **THEN** the system SHALL evaluate and apply gap filling as needed
- **AND** produce enhanced output with supplemented content

#### Scenario: Direct track is unaffected
- **GIVEN** a document processing task with Direct track
- **WHEN** the task is processed
- **THEN** the system SHALL NOT invoke any gap filling logic
- **AND** produce output identical to current Direct track behavior

#### Scenario: Hybrid track is unaffected
- **GIVEN** a document processing task with Hybrid track
- **WHEN** the task is processed
- **THEN** the system SHALL NOT invoke gap filling logic
- **AND** use existing Hybrid track processing pipeline

### Requirement: Gap Filling Configuration

The system SHALL provide configurable parameters for gap filling behavior.

#### Scenario: Gap filling can be disabled via configuration
- **GIVEN** gap_filling_enabled is set to false in configuration
- **WHEN** OCR track processing runs
- **THEN** the system SHALL skip all gap filling logic
- **AND** output only PP-StructureV3 results as before

#### Scenario: Coverage threshold is configurable
- **GIVEN** gap_filling_coverage_threshold is set to 0.8
- **WHEN** PP-StructureV3 coverage is 75%
- **THEN** the system SHALL activate gap filling
- **AND** supplement uncovered regions

#### Scenario: IoU thresholds are configurable
- **GIVEN** custom IoU thresholds configured:
  - gap_filling_iou_threshold: 0.2
  - gap_filling_dedup_iou_threshold: 0.6
- **WHEN** evaluating coverage and deduplication
- **THEN** the system SHALL use the configured values
- **AND** apply them consistently throughout gap filling process

#### Scenario: Confidence threshold is configurable
- **GIVEN** gap_filling_confidence_threshold is set to 0.5
- **WHEN** supplementing Raw OCR regions
- **THEN** the system SHALL only include regions with confidence >= 0.5
- **AND** filter out lower confidence regions
@@ -0,0 +1,44 @@
# Tasks: Add OCR Track Gap Filling

## 1. Core Implementation

- [x] 1.1 Create `gap_filling_service.py` with `GapFillingService` class
- [x] 1.2 Implement bbox coverage calculation (center-point and IoU methods)
- [x] 1.3 Implement gap detection logic (find uncovered raw OCR regions)
- [x] 1.4 Implement confidence threshold filtering for supplemented regions
- [x] 1.5 Implement element type filtering (only supplement TEXT, skip TABLE/IMAGE/FIGURE/etc.)
- [x] 1.6 Implement reading order recalculation (sort by y0, x0)
- [x] 1.7 Implement deduplication logic (skip high IoU overlaps with PP-Structure TEXT)
- [x] 1.8 Implement optional text merging for fragmented adjacent regions

## 2. Integration

- [x] 2.1 Modify `OCRToUnifiedConverter` to accept raw OCR text_regions
- [x] 2.2 Add gap filling activation condition check (coverage < 70% or element count disparity)
- [x] 2.3 Ensure coordinate alignment between raw OCR and PP-Structure (ocr_dimensions handling)
- [x] 2.4 Add page metadata (page_number, confidence, bbox) to supplemented elements
- [x] 2.5 Ensure track isolation (only OCR track, not Direct/Hybrid)

## 3. Configuration

- [x] 3.1 Add configurable parameters to settings:
  - `gap_filling_enabled`: bool (default: True)
  - `gap_filling_coverage_threshold`: float (default: 0.7)
  - `gap_filling_iou_threshold`: float (default: 0.15)
  - `gap_filling_confidence_threshold`: float (default: 0.3)
  - `gap_filling_dedup_iou_threshold`: float (default: 0.5)

## 4. Testing (with env)

- [x] 4.1 Create test fixtures with PP-Structure severe miss-detection case (with scan.pdf / scan2.pdf)
- [x] 4.2 Test gap detection correctly identifies uncovered regions
- [x] 4.3 Test supplemented elements have correct metadata
- [x] 4.4 Test reading order is correctly recalculated
- [x] 4.5 Test deduplication prevents duplicate text
- [x] 4.6 Test normal document without miss-detection has no duplicate/inflation
- [x] 4.7 Test track isolation (Direct track unaffected)

## 5. Documentation

- [x] 5.1 Add inline documentation to GapFillingService
- [x] 5.2 Update configuration documentation with new settings
@@ -0,0 +1,108 @@
# Fix OCR Track Table Rendering

## Summary

OCR track PDF generation produces tables with incorrect format and layout. Tables appear without proper structure - cell content is misaligned and the visual format differs significantly from the original document. Image placement is correct, but table rendering is broken.

## Problem Statement

When generating PDF from OCR track results (via `scan.pdf` processed by PP-StructureV3), the output tables have:
1. **Wrong cell alignment** - content not positioned in proper cells
2. **Missing table structure** - rows/columns don't match original document layout
3. **Incorrect content distribution** - all content seems to flow linearly instead of maintaining grid structure

Reference: `backend/storage/results/af7c9ee8-60a0-4291-9f22-ef98d27eed52/`
- Original: `af7c9ee8-60a0-4291-9f22-ef98d27eed52_scan_page_1.png`
- Generated: `scan_layout.pdf`
- Result JSON: `scan_result.json` - Tables have correct `{rows, cols, cells}` structure

## Root Cause Analysis

### Issue 1: Table Content Not Converted to TableData Object

In `_json_to_document_element` (pdf_generator_service.py:1952):
```python
element = DocumentElement(
    ...
    content=elem_dict.get('content', ''),  # Raw dict, not TableData
    ...
)
```

Table elements have `content` as a dict `{rows: 5, cols: 4, cells: [...]}` but it's not converted to a `TableData` object.

### Issue 2: OCR Track HTML Conversion Fails

In `convert_unified_document_to_ocr_data` (pdf_generator_service.py:464-467):
```python
elif isinstance(element.content, dict):
    html_content = element.content.get('html', str(element.content))
```

Since there's no 'html' key in the cells-based dict, it falls back to `str(element.content)` = `"{'rows': 5, 'cols': 4, ...}"` - invalid HTML.

### Issue 3: Different Table Rendering Paths

- **Direct track** uses `_draw_table_element_direct` which properly handles dict with cells via `_build_rows_from_cells_dict`
- **OCR track** uses `draw_table_region` which expects HTML strings and fails with dict content

## Proposed Solution

### Option A: Convert dict to TableData during JSON loading (Recommended)

In `_json_to_document_element`, when element type is TABLE and content is a dict with cells, convert it to a `TableData` object:

```python
# For TABLE elements, convert dict to TableData
if elem_type == ElementType.TABLE and isinstance(content, dict) and 'cells' in content:
    content = self._dict_to_table_data(content)
```

This ensures `element.content.to_html()` works correctly in `convert_unified_document_to_ocr_data`.
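
One possible shape for the conversion in Option A is a `from_dict()` class method. The sketch below uses stand-in `TableCell` and `TableData` dataclasses; the real classes in `unified_document.py` may have different fields, and the cell keys `row`, `col`, `content`, `row_span`, and `col_span` are assumptions about the JSON layout.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class TableCell:
    """Stand-in for the project's TableCell (field names assumed)."""
    row: int
    col: int
    content: str = ""
    row_span: int = 1
    col_span: int = 1


@dataclass
class TableData:
    """Stand-in for the project's TableData (field names assumed)."""
    rows: int
    cols: int
    cells: List[TableCell] = field(default_factory=list)
    headers: Optional[List[str]] = None
    caption: Optional[str] = None

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "TableData":
        """Build a TableData from a {rows, cols, cells} dict."""
        cells = [
            TableCell(
                row=int(c.get("row", 0)),
                col=int(c.get("col", 0)),
                content=str(c.get("content", "")),
                row_span=int(c.get("row_span", 1)),
                col_span=int(c.get("col_span", 1)),
            )
            for c in data.get("cells", [])
            if isinstance(c, dict)  # skip malformed cell entries
        ]
        return cls(
            rows=int(data.get("rows", 0)),
            cols=int(data.get("cols", 0)),
            cells=cells,
            headers=data.get("headers"),
            caption=data.get("caption"),
        )
```

Keeping the conversion on the model class means both the JSON loader and any other caller share one code path.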

### Option B: Fix conversion in convert_unified_document_to_ocr_data

Handle dict with cells properly by converting to HTML:

```python
elif isinstance(element.content, dict):
    if 'cells' in element.content:
        # Convert cells-based dict to HTML
        html_content = self._cells_dict_to_html(element.content)
    elif 'html' in element.content:
        html_content = element.content['html']
    else:
        html_content = str(element.content)
```
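
Option B depends on a `_cells_dict_to_html` helper that the proposal does not define. A minimal standalone sketch of such a helper is shown here, assuming cells are dicts with `row`, `col`, `content`, and optional `row_span` / `col_span` keys.

```python
from html import escape
from typing import Any, Dict, List


def cells_dict_to_html(table: Dict[str, Any]) -> str:
    """Render a {rows, cols, cells} dict as a plain HTML table (sketch)."""
    rows: Dict[int, List[Dict[str, Any]]] = {}
    for cell in table.get("cells", []):
        rows.setdefault(cell.get("row", 0), []).append(cell)

    parts = ["<table>"]
    for row_index in sorted(rows):
        parts.append("<tr>")
        # Emit cells left to right within each row.
        for cell in sorted(rows[row_index], key=lambda c: c.get("col", 0)):
            attrs = ""
            if cell.get("row_span", 1) > 1:
                attrs += f' rowspan="{cell["row_span"]}"'
            if cell.get("col_span", 1) > 1:
                attrs += f' colspan="{cell["col_span"]}"'
            text = escape(str(cell.get("content", "")))
            parts.append(f"<td{attrs}>{text}</td>")
        parts.append("</tr>")
    parts.append("</table>")
    return "".join(parts)
```

Escaping the cell text matters because OCR output can contain `<` or `&`, which would otherwise corrupt the HTML handed to the table parser.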

## Impact on Hybrid Mode

Hybrid mode uses Direct track rendering (`_generate_direct_track_pdf`) which already handles dict content properly via `_build_rows_from_cells_dict`. The proposed fixes should not affect hybrid mode negatively.

However, testing should verify:
1. Hybrid mode continues to work with combined Direct + OCR elements
2. Table rendering quality is consistent across all tracks

## Success Criteria

1. OCR track tables render with correct structure matching original document
2. Cell content positioned in proper grid locations
3. Table borders/grid lines visible
4. No regression in Direct track or Hybrid mode table rendering
5. All test files (scan.pdf, img1.png, img2.png, img3.png) produce correct output

## Files to Modify

1. `backend/app/services/pdf_generator_service.py`
   - `_json_to_document_element`: Convert table dict to TableData
   - `convert_unified_document_to_ocr_data`: Improve dict handling (if Option B)

2. `backend/app/models/unified_document.py` (optional)
   - Add `TableData.from_dict()` class method for cleaner conversion

## Testing Plan

1. Test scan.pdf with OCR track - verify table structure matches original
2. Test img1.png, img2.png, img3.png with OCR track
3. Test PDF files with Direct track - verify no regression
4. Test Hybrid mode with files that trigger OCR fallback
@@ -0,0 +1,52 @@
# PDF Generation - OCR Track Table Rendering Fix

## MODIFIED Requirements

### Requirement: OCR Track Table Content Conversion

The PDF generator MUST properly convert table content from JSON dict format to renderable structure when processing OCR track results.

#### Scenario: Table dict with cells array converts to proper HTML

Given an OCR track JSON with table element containing rows, cols, and cells array
When the PDF generator processes this element
Then the table content MUST be converted to a TableData object
And TableData.to_html() MUST produce valid HTML with proper tr/td structure
And the generated PDF table MUST have cells positioned in correct grid locations

#### Scenario: Table with rowspan/colspan renders correctly

Given a table element with cells having rowspan > 1 or colspan > 1
When the PDF generator renders the table
Then merged cells MUST span the correct number of rows/columns
And content MUST appear in the merged cell position

### Requirement: Table Visual Fidelity

The PDF generator MUST render OCR track tables with visual structure matching the original document.

#### Scenario: Table renders with grid lines

Given an OCR track table element
When rendered to PDF
Then the table MUST have visible grid lines/borders
And cell boundaries MUST be clearly defined

#### Scenario: Table text alignment preserved

Given an OCR track table with cell content
When rendered to PDF
Then text MUST be positioned within the correct cell boundaries
And text MUST NOT overflow into adjacent cells

### Requirement: Backward Compatibility with Hybrid Mode

The table rendering fix MUST NOT break hybrid mode processing.

#### Scenario: Hybrid mode tables render correctly

Given a document processed with hybrid mode combining Direct and OCR tracks
When PDF is generated
Then Direct track tables MUST render with existing quality
And OCR track tables MUST render with improved quality
And no regression in table positioning or content
@@ -0,0 +1,55 @@
# Implementation Tasks

## Phase 1: Core Fix - Table Content Conversion

### 1.1 Add TableData.from_dict() class method
- [ ] In `unified_document.py`, add `from_dict()` method to `TableData` class
- [ ] Handle conversion of cells list (list of dicts) to `TableCell` objects
- [ ] Preserve rows, cols, headers, caption fields

### 1.2 Fix _json_to_document_element for TABLE elements
- [ ] In `pdf_generator_service.py`, modify `_json_to_document_element`
- [ ] When `elem_type == ElementType.TABLE` and content is dict with 'cells', convert to `TableData`
- [ ] Use `TableData.from_dict()` for clean conversion

### 1.3 Verify TableData.to_html() generates correct HTML
- [ ] Test that `to_html()` produces parseable HTML with proper row/cell structure
- [ ] Verify colspan/rowspan attributes are correctly generated
- [ ] Ensure empty cells are properly handled

## Phase 2: OCR Track Rendering Consistency

### 2.1 Review convert_unified_document_to_ocr_data
- [ ] Verify TableData objects are properly converted to HTML
- [ ] Add fallback handling for dict content with 'cells' key
- [ ] Log warning if content cannot be converted to HTML

### 2.2 Review draw_table_region
- [ ] Verify HTMLTableParser correctly parses generated HTML
- [ ] Check that ReportLab Table is positioned at correct bbox
- [ ] Verify font and style application

## Phase 3: Testing and Verification

### 3.1 Test OCR Track
- [ ] Test scan.pdf - verify tables have correct structure
- [ ] Test img1.png, img2.png, img3.png
- [ ] Compare generated PDF with original documents

### 3.2 Test Direct Track (Regression)
- [ ] Test PDF files with Direct track
- [ ] Verify table rendering unchanged

### 3.3 Test Hybrid Mode
- [ ] Test files that trigger hybrid processing
- [ ] Verify mixed Direct + OCR elements render correctly

## Phase 4: Code Quality

### 4.1 Add logging
- [ ] Add debug logging for table content type detection
- [ ] Log conversion steps for troubleshooting

### 4.2 Error handling
- [ ] Handle malformed cell data gracefully
- [ ] Log warnings for unexpected content formats
@@ -0,0 +1,40 @@
# Change: Simplify PP-StructureV3 Configuration with Layout Model Selection

## Why

The current PP-StructureV3 parameter adjustment UI exposes 7 technical ML parameters (thresholds, ratios, merge modes) that are difficult for end users to understand. Meanwhile, switching to a different layout detection model (e.g., CDLA-trained models for Chinese documents) would have a much greater impact on OCR quality than fine-tuning these parameters.

**Problems with the current approach:**

- Users don't understand what `layout_detection_threshold` or `text_det_unclip_ratio` mean
- Wrong parameter values can make OCR results worse
- The default model (PubLayNet-based) is optimized for English academic papers, not Chinese business documents
- Model selection is far more impactful than parameter tuning

## What Changes

### Backend Changes

- **REMOVED**: API parameter `pp_structure_params` from task start endpoint
- **ADDED**: New API parameter `layout_model` with predefined options:
  - `"default"` - Standard model (PubLayNet-based, for English documents)
  - `"chinese"` - PP-DocLayout-S model (for Chinese documents, forms, contracts)
  - `"cdla"` - CDLA model (alternative Chinese document layout model)
- **MODIFIED**: PP-StructureV3 initialization uses `layout_detection_model_name` based on selection
- Keep fine-tuning parameters in backend `config.py` with optimized defaults
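
As a rough sketch of the backend mapping described above, the snippet below translates a `layout_model` choice into the `layout_detection_model_name` argument. The model names come from this proposal and its spec delta; the `None` sentinel for `"default"`, the constant `LAYOUT_MODEL_MAPPING`, and the helper `build_structure_kwargs` are assumptions for illustration and may not match the actual `ocr_service.py`.

```python
from typing import Dict, Optional

# Model names follow the proposal text. None is an assumed sentinel meaning
# "pass no explicit model name and let the engine use its built-in default".
LAYOUT_MODEL_MAPPING: Dict[str, Optional[str]] = {
    "chinese": "PP-DocLayout-S",
    "default": None,
    "cdla": "picodet_lcnet_x1_0_fgd_layout_cdla",
}


def build_structure_kwargs(layout_model: str = "chinese") -> Dict[str, str]:
    """Return extra keyword arguments for engine initialization (hypothetical helper)."""
    if layout_model not in LAYOUT_MODEL_MAPPING:
        valid = ", ".join(sorted(LAYOUT_MODEL_MAPPING))
        raise ValueError(f"Unknown layout_model {layout_model!r}; valid options: {valid}")
    model_name = LAYOUT_MODEL_MAPPING[layout_model]
    # Only set the argument when a concrete model name is requested
    return {} if model_name is None else {"layout_detection_model_name": model_name}
```

Keeping the mapping in one dictionary means a new model can be added without touching the request handling code.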

### Frontend Changes

- **REMOVED**: `PPStructureParams.tsx` component (slider/dropdown UI for 7 parameters)
- **ADDED**: Simple radio button/dropdown for layout model selection with clear descriptions
- **MODIFIED**: Task start request body to send `layout_model` instead of `pp_structure_params`

### API Changes

- **BREAKING**: Remove `pp_structure_params` from `POST /api/v2/tasks/{task_id}/start`
- **ADDED**: New optional parameter `layout_model: "default" | "chinese" | "cdla"` (falls back to `"chinese"` when omitted; the `"default"` option selects the PubLayNet-based model)

## Impact

- Affected specs: `ocr-processing`
- Affected code:
  - Backend: `app/routers/tasks.py`, `app/services/ocr_service.py`, `app/core/config.py`
  - Frontend: `src/components/PPStructureParams.tsx` (remove), `src/types/apiV2.ts`, task start form
- Breaking change: Clients using `pp_structure_params` will need to migrate to `layout_model`
- User impact: Simpler UI, better default OCR quality for Chinese documents
@@ -0,0 +1,86 @@
# ocr-processing Specification Delta

## REMOVED Requirements

### Requirement: Frontend-Adjustable PP-StructureV3 Parameters

**Reason**: Complex ML parameters are difficult for end users to understand and tune. Model selection provides better UX and more significant quality improvements.
**Migration**: Replace `pp_structure_params` API parameter with `layout_model` parameter.

### Requirement: PP-StructureV3 Parameter UI Controls

**Reason**: Slider/dropdown UI for 7 technical parameters adds complexity without proportional benefit. Simple model selection is more user-friendly.
**Migration**: Remove `PPStructureParams.tsx` component, add `LayoutModelSelector.tsx` component.

## ADDED Requirements

### Requirement: Layout Model Selection

The system SHALL allow users to select a layout detection model optimized for their document type, providing a simple choice between pre-configured models instead of manual parameter tuning.

#### Scenario: User selects Chinese document model

- **GIVEN** a user is processing Chinese business documents (forms, contracts, invoices)
- **WHEN** the user selects "Chinese Document Model" (PP-DocLayout-S)
- **THEN** the OCR engine SHALL use the PP-DocLayout-S layout detection model
- **AND** the model SHALL be optimized for 23 Chinese document element types
- **AND** table and form detection accuracy SHALL be improved over the default model

#### Scenario: User selects standard model for English documents

- **GIVEN** a user is processing English academic papers or reports
- **WHEN** the user selects "Standard Model" (PubLayNet-based)
- **THEN** the OCR engine SHALL use the default PubLayNet-based layout detection model
- **AND** the model SHALL be optimized for English document layouts

#### Scenario: User selects CDLA model for specialized Chinese layout

- **GIVEN** a user is processing Chinese documents with complex layouts
- **WHEN** the user selects "CDLA Model"
- **THEN** the OCR engine SHALL use the picodet_lcnet_x1_0_fgd_layout_cdla model
- **AND** the model SHALL provide specialized Chinese document layout analysis

#### Scenario: Layout model is sent via API request

- **GIVEN** a frontend application with model selection UI
- **WHEN** the user starts task processing with a selected model
- **THEN** the frontend SHALL send the model choice in the request body:
  ```http
  POST /api/v2/tasks/{task_id}/start

  {
    "use_dual_track": true,
    "force_track": "ocr",
    "language": "ch",
    "layout_model": "chinese"
  }
  ```
- **AND** the backend SHALL configure PP-StructureV3 with the corresponding model
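
For a client that is not the bundled frontend, the request in this scenario could be assembled as in the sketch below. The endpoint path and field names come from the scenario; the helper `build_start_request` is hypothetical and only builds the path and JSON body, leaving the actual HTTP call to whatever client library is in use.

```python
import json
from typing import Tuple


def build_start_request(task_id: str, layout_model: str = "chinese") -> Tuple[str, str]:
    """Return (path, JSON body) for the task start call (hypothetical helper)."""
    path = f"/api/v2/tasks/{task_id}/start"
    body = json.dumps(
        {
            "use_dual_track": True,
            "force_track": "ocr",
            "language": "ch",
            # Sent in place of the removed pp_structure_params field
            "layout_model": layout_model,
        }
    )
    return path, body
```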

#### Scenario: Default model when not specified

- **GIVEN** an API request without `layout_model` parameter
- **WHEN** the task is started
- **THEN** the system SHALL use "chinese" (PP-DocLayout-S) as the default model
- **AND** processing SHALL work correctly without requiring model selection

#### Scenario: Invalid model name is rejected

- **GIVEN** a request with an invalid `layout_model` value
- **WHEN** the user sends `layout_model: "invalid_model"`
- **THEN** the API SHALL return 422 Validation Error
- **AND** provide a clear error message listing valid model options
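
The default and rejection scenarios can be illustrated with a small standard-library sketch. The real backend presumably relies on its schema layer to produce the 422 response; the `LayoutModel` enum and `resolve_layout_model` function here are assumed names that only show the intended behaviour.

```python
from enum import Enum
from typing import Any, Mapping


class LayoutModel(str, Enum):
    DEFAULT = "default"
    CHINESE = "chinese"
    CDLA = "cdla"


def resolve_layout_model(request_body: Mapping[str, Any]) -> LayoutModel:
    """Pick the layout model for a request (hypothetical helper)."""
    raw = request_body.get("layout_model")
    if raw is None:
        # Parameter omitted: fall back to the Chinese document model
        return LayoutModel.CHINESE
    try:
        return LayoutModel(raw)
    except ValueError:
        # List the valid options so the client can correct the request
        valid = ", ".join(m.value for m in LayoutModel)
        raise ValueError(f"Invalid layout_model {raw!r}; valid options: {valid}") from None
```

In a web handler, the `ValueError` would be translated into the 422 response required by the scenario.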

### Requirement: Layout Model Selection UI

The frontend SHALL provide a simple, user-friendly interface for selecting layout detection models with clear descriptions of each option.

#### Scenario: Model options are displayed with descriptions

- **GIVEN** the model selection UI is displayed
- **WHEN** the user views the available options
- **THEN** the UI SHALL show the following options:
  - "Chinese Document Model (Recommended)" - for Chinese forms, contracts, invoices
  - "Standard Model" - for English academic papers, reports
  - "CDLA Model" - for specialized Chinese layout analysis
- **AND** each option SHALL have a brief description of its use case

#### Scenario: Chinese model is selected by default

- **GIVEN** the user opens the task processing interface
- **WHEN** the model selection is displayed
- **THEN** "Chinese Document Model" SHALL be pre-selected as the default
- **AND** the user MAY change the selection before starting processing

#### Scenario: Model selection is visible only for OCR track

- **GIVEN** a document processing interface
- **WHEN** the user selects a processing track
- **THEN** layout model selection SHALL be shown ONLY when OCR track is selected or auto-detected
- **AND** it SHALL be hidden for Direct track (which does not use PP-StructureV3)
@@ -0,0 +1,56 @@
# Implementation Tasks

## 1. Backend API Changes

- [x] 1.1 Update `app/schemas/task.py` to add `layout_model` enum type
- [x] 1.2 Update `app/routers/tasks.py` to replace `pp_structure_params` with `layout_model` parameter
- [x] 1.3 Update `app/services/ocr_service.py` to map `layout_model` to `layout_detection_model_name`
- [x] 1.4 Remove custom PP-Structure engine creation logic (use model selection instead)
- [x] 1.5 Add backward compatibility: default to "chinese" if no model specified

## 2. Backend Configuration

- [x] 2.1 Keep `layout_detection_model_name` in `config.py` as fallback default
- [x] 2.2 Keep fine-tuning parameters in `config.py` (not exposed to API)
- [x] 2.3 Document available layout models in config comments

## 3. Frontend Changes

- [x] 3.1 Remove `PPStructureParams.tsx` component
- [x] 3.2 Update `src/types/apiV2.ts`:
  - Remove `PPStructureV3Params` interface
  - Add `LayoutModel` type: `"default" | "chinese" | "cdla"`
  - Update `ProcessingOptions` to use `layout_model` instead of `pp_structure_params`
- [x] 3.3 Create `LayoutModelSelector.tsx` component with:
  - Radio buttons or dropdown for model selection
  - Clear descriptions for each model option
  - Default selection: "chinese"
- [x] 3.4 Update task start form to use new `LayoutModelSelector`
- [x] 3.5 Update API calls to send `layout_model` instead of `pp_structure_params`

## 4. Internationalization

- [x] 4.1 Add i18n strings for layout model options:
  - `layoutModel.default`: "Standard Model (English documents)"
  - `layoutModel.chinese`: "Chinese Document Model (Recommended)"
  - `layoutModel.cdla`: "CDLA Model (Chinese layout analysis)"
- [x] 4.2 Add i18n strings for model descriptions

## 5. Testing

- [x] 5.1 Create new tests for `layout_model` parameter (`test_layout_model_api.py`, `test_layout_model.py`)
- [x] 5.2 Archive tests for `pp_structure_params` validation (moved to `tests/archived/`)
- [x] 5.3 Add tests for layout model selection (19 tests passing)
- [x] 5.4 Test backward compatibility (no model specified → use chinese default)

## 6. Documentation

- [ ] 6.1 Update API documentation for task start endpoint
- [ ] 6.2 Remove PP-Structure parameter documentation
- [ ] 6.3 Add layout model selection documentation

## 7. Cleanup

- [x] 7.1 Remove localStorage keys for PP-Structure params (`pp_structure_params_presets`, `pp_structure_params_last_used`)
- [x] 7.2 Remove any unused imports/types related to PP-Structure params
- [x] 7.3 Archive old PP-Structure params test files