egg/OCR - OCR

egg/OCR

Fork 0

Commit Graph

Author	SHA1	Message	Date
egg	e23aaacd84	fix: resolve OCR track converter data structure mismatch Problem: OCR track was producing empty output files (0 pages, 0 elements) despite successful OCR extraction (27 text regions detected). Root Causes: 1. Converter expected `text_regions` inside `layout_data`, but `process_file_traditional` returns it at top level 2. Converter expected `ocr_dimensions` to be a list, but single-page documents return it as dict `{'width': W, 'height': H}` Solution: - Add `_extract_from_traditional_ocr()` method to handle top-level `text_regions` structure from `process_file_traditional` - Handle both dict (single-page) and list (multi-page) formats for `ocr_dimensions` - Update `_extract_pages()` to check for `text_regions` key before `layout_data` key Verification: - Before: img1.png → 0 pages, 0 elements, 0 characters - After: img1.png → 1 page, 27 elements, 278 characters - Output files now properly generated (JSON: 13KB, MD: 498B, PDF: 23KB) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-20 17:51:18 +08:00
egg	8b9a364452	feat: add GPU optimization and fix TableData consistency GPU Optimization (Section 3.1): - Add comprehensive memory management for RTX 4060 8GB - Enable all recognition features (chart, formula, table, seal, text) - Implement model cache with auto-unload for idle models - Add memory monitoring and warning system Bug Fix (Section 3.3): - Fix TableData field inconsistency: 'columns' -> 'cols' - Remove invalid 'html' and 'extracted_text' parameters - Add proper TableCell conversion in _convert_table_data Documentation: - Add Future Improvements section for batch processing enhancement 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-19 09:17:27 +08:00
egg	a3a6fbe58b	feat: add OCR to UnifiedDocument converter for PP-StructureV3 integration Implements the converter that transforms PP-StructureV3 OCR results into the UnifiedDocument format, enabling consistent output for both OCR and direct extraction tracks. - Create OCRToUnifiedConverter class with full element type mapping - Handle both enhanced (parsing_res_list) and standard markdown results - Support 4-point and simple bbox formats for coordinates - Establish element relationships (captions, lists, headers) - Integrate converter into OCR service dual-track processing - Update tasks.md marking section 3.3 complete 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-19 08:05:20 +08:00

Author

SHA1

Message

Date

egg

e23aaacd84

fix: resolve OCR track converter data structure mismatch

**Problem**: OCR track was producing empty output files (0 pages, 0 elements)
despite successful OCR extraction (27 text regions detected).

**Root Causes**:
1. Converter expected `text_regions` inside `layout_data`, but
   `process_file_traditional` returns it at top level
2. Converter expected `ocr_dimensions` to be a list, but single-page
   documents return it as dict `{'width': W, 'height': H}`

**Solution**:
- Add `_extract_from_traditional_ocr()` method to handle top-level
  `text_regions` structure from `process_file_traditional`
- Handle both dict (single-page) and list (multi-page) formats for
  `ocr_dimensions`
- Update `_extract_pages()` to check for `text_regions` key before
  `layout_data` key

**Verification**:
- Before: img1.png → 0 pages, 0 elements, 0 characters
- After: img1.png → 1 page, 27 elements, 278 characters
- Output files now properly generated (JSON: 13KB, MD: 498B, PDF: 23KB)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-20 17:51:18 +08:00

egg

8b9a364452

feat: add GPU optimization and fix TableData consistency

GPU Optimization (Section 3.1):
- Add comprehensive memory management for RTX 4060 8GB
- Enable all recognition features (chart, formula, table, seal, text)
- Implement model cache with auto-unload for idle models
- Add memory monitoring and warning system

Bug Fix (Section 3.3):
- Fix TableData field inconsistency: 'columns' -> 'cols'
- Remove invalid 'html' and 'extracted_text' parameters
- Add proper TableCell conversion in _convert_table_data

Documentation:
- Add Future Improvements section for batch processing enhancement

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-19 09:17:27 +08:00

egg

a3a6fbe58b

feat: add OCR to UnifiedDocument converter for PP-StructureV3 integration

Implements the converter that transforms PP-StructureV3 OCR results into
the UnifiedDocument format, enabling consistent output for both OCR and
direct extraction tracks.

- Create OCRToUnifiedConverter class with full element type mapping
- Handle both enhanced (parsing_res_list) and standard markdown results
- Support 4-point and simple bbox formats for coordinates
- Establish element relationships (captions, lists, headers)
- Integrate converter into OCR service dual-track processing
- Update tasks.md marking section 3.3 complete

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-19 08:05:20 +08:00

3 Commits