egg
e23aaacd84
fix: resolve OCR track converter data structure mismatch
**Problem**: OCR track was producing empty output files (0 pages, 0 elements)
despite successful OCR extraction (27 text regions detected).
**Root Causes**:
1. The converter expected `text_regions` inside `layout_data`, but
`process_file_traditional` returns it at the top level
2. The converter expected `ocr_dimensions` to be a list, but single-page
documents return it as a dict `{'width': W, 'height': H}`
**Solution**:
- Add `_extract_from_traditional_ocr()` method to handle top-level
`text_regions` structure from `process_file_traditional`
- Handle both dict (single-page) and list (multi-page) formats for
`ocr_dimensions`
- Update `_extract_pages()` to check for `text_regions` key before
`layout_data` key
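The two shape mismatches described above can be illustrated with a minimal sketch. The helper names `find_text_regions` and `normalize_dimensions` are hypothetical and are not the converter's actual methods; the only details taken from this commit are the `text_regions`, `layout_data`, and `ocr_dimensions` keys and the dict-versus-list behaviour.

```python
from typing import Any, Dict, List


def normalize_dimensions(ocr_dimensions: Any) -> List[Dict[str, int]]:
    """Return page dimensions as a list with one entry per page.

    Single-page results carry a dict, multi-page results carry a list.
    Any other value is treated as "no dimensions" (an assumption).
    """
    if isinstance(ocr_dimensions, dict):
        return [ocr_dimensions]
    if isinstance(ocr_dimensions, list):
        return list(ocr_dimensions)
    return []


def find_text_regions(ocr_result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Look for `text_regions` at the top level first, then in `layout_data`."""
    if "text_regions" in ocr_result:
        return ocr_result["text_regions"]
    layout = ocr_result.get("layout_data") or {}
    return layout.get("text_regions", [])
```

Checking the top-level key first mirrors the ordering change in `_extract_pages()`, so a result from `process_file_traditional` is no longer mistaken for an empty document.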
**Verification**:
- Before: img1.png → 0 pages, 0 elements, 0 characters
- After: img1.png → 1 page, 27 elements, 278 characters
- Output files now properly generated (JSON: 13KB, MD: 498B, PDF: 23KB)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 17:51:18 +08:00
egg
8b9a364452
feat: add GPU optimization and fix TableData consistency
GPU Optimization (Section 3.1):
- Add comprehensive memory management for RTX 4060 8GB
- Enable all recognition features (chart, formula, table, seal, text)
- Implement model cache with auto-unload for idle models
- Add memory monitoring and warning system
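As an illustration of the "model cache with auto-unload for idle models" idea, here is a minimal sketch. The class name `IdleModelCache`, its methods, and the idle threshold are assumptions made for the example; the commit does not show the real implementation, and actually releasing GPU memory would need a framework-specific call that is omitted here.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


class IdleModelCache:
    """Keep loaded models in a dict and drop those unused for too long."""

    def __init__(self, idle_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.idle_seconds = idle_seconds
        self._clock = clock  # injectable so the behaviour can be tested
        self._models: Dict[str, Tuple[Any, float]] = {}

    def get(self, name: str, loader: Callable[[], Any]) -> Any:
        """Return the cached model, loading it on first use."""
        if name in self._models:
            model = self._models[name][0]
        else:
            model = loader()
        # Refresh the last-used timestamp on every access.
        self._models[name] = (model, self._clock())
        return model

    def unload_idle(self) -> List[str]:
        """Remove models idle for at least `idle_seconds`; return their names."""
        now = self._clock()
        stale = [name for name, (_, last_used) in self._models.items()
                 if now - last_used >= self.idle_seconds]
        for name in stale:
            del self._models[name]
        return stale

    def loaded(self) -> List[str]:
        """Names of the models currently held in the cache."""
        return list(self._models)
```

Calling `unload_idle()` periodically, or whenever a memory warning fires, is one way to keep rarely used recognizers from occupying memory on an 8 GB card.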
Bug Fix (Section 3.3):
- Fix TableData field inconsistency: 'columns' -> 'cols'
- Remove invalid 'html' and 'extracted_text' parameters
- Add proper TableCell conversion in _convert_table_data
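The shape of this fix can be sketched as follows. The dataclasses below are stand-ins whose fields are assumptions, apart from the `cols` name taken from this commit; the real `TableData`, `TableCell`, and `_convert_table_data` may differ, and the raw input layout is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class TableCell:
    """Stand-in cell type; the real fields are not shown in the commit."""
    row: int
    col: int
    content: str = ""


@dataclass
class TableData:
    """Stand-in table type using `cols` (not `columns`) as the field name."""
    rows: int
    cols: int
    cells: List[TableCell] = field(default_factory=list)


def convert_table_data(raw: Dict[str, Any]) -> TableData:
    """Build a TableData from a raw dict, converting each cell explicitly.

    Keys such as 'html' or 'extracted_text' are ignored rather than being
    passed to the constructor, where they would raise a TypeError.
    """
    cells = [
        TableCell(row=c["row"], col=c["col"], content=c.get("content", ""))
        for c in raw.get("cells", [])
    ]
    return TableData(rows=raw.get("rows", 0), cols=raw.get("cols", 0), cells=cells)
```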
Documentation:
- Add Future Improvements section for batch processing enhancement
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 09:17:27 +08:00
egg
a3a6fbe58b
feat: add OCR to UnifiedDocument converter for PP-StructureV3 integration
Implements the converter that transforms PP-StructureV3 OCR results into
the UnifiedDocument format, enabling consistent output for both OCR and
direct extraction tracks.
- Create OCRToUnifiedConverter class with full element type mapping
- Handle both enhanced (parsing_res_list) and standard markdown results
- Support 4-point and simple bbox formats for coordinates
- Establish element relationships (captions, lists, headers)
- Integrate converter into OCR service dual-track processing
- Update tasks.md to mark section 3.3 complete
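Supporting both bbox formats usually comes down to normalizing them into one representation. The sketch below assumes the 4-point format is a list of four `[x, y]` corner points and the simple format is `[x0, y0, x1, y1]`; the function name `normalize_bbox` and these exact layouts are assumptions, not taken from the converter itself.

```python
from typing import Sequence, Tuple, Union

Number = Union[int, float]
BBox = Tuple[float, float, float, float]


def normalize_bbox(bbox: Sequence) -> BBox:
    """Convert a 4-point polygon or a simple box to (x0, y0, x1, y1)."""
    if len(bbox) != 4:
        raise ValueError(f"unsupported bbox format: {bbox!r}")
    if all(isinstance(p, (list, tuple)) for p in bbox):
        # 4-point format: take the axis-aligned bounds of the corners.
        xs = [float(p[0]) for p in bbox]
        ys = [float(p[1]) for p in bbox]
        return (min(xs), min(ys), max(xs), max(ys))
    # Simple format: already x0, y0, x1, y1.
    x0, y0, x1, y1 = (float(v) for v in bbox)
    return (x0, y0, x1, y1)
```

Taking the min and max of the corners discards any rotation in the 4-point polygon, which is acceptable for layout purposes but would lose information for skewed text.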
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 08:05:20 +08:00