egg/OCR - OCR

egg/OCR

Fork 0

Commit Graph

Author	SHA1	Message	Date
egg	0aff468c51	feat: implement Phase 1 of PDF layout restoration Implement critical fixes for image and table rendering in PDF generation. Image Handling Fixes: - Implemented _save_image() in pp_structure_enhanced.py - Creates imgs/ subdirectory for saved images - Handles both file paths and numpy arrays - Returns relative path for reference - Adds proper error handling and logging - Added saved_path field to image elements for path tracking - Created _get_image_path() helper with fallback logic - Checks saved_path, path, image_path in content - Falls back to metadata fields - Logs warnings for missing paths Table Rendering Fixes: - Fixed table rendering to use element's own bbox directly - No longer depends on fake table_.png references - Supports both bbox and bbox_polygon formats - Inline conversion for different bbox formats - Maintains backward compatibility with legacy approach - Improved error handling for missing bbox data Status*: - Phase 1 tasks 1.1 and 1.2: ✅ Completed - Phase 1 tasks 2.1, 2.2, and 2.3: ✅ Completed - Testing pending due to backend availability These fixes resolve the critical issues where images never appeared and tables never rendered in generated PDFs. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-24 07:16:31 +08:00
egg	b997f9355a	fix: make torch import optional and add PaddlePaddle GPU memory management Problem: - Backend failed to start with ModuleNotFoundError for torch module - torch was imported as hard dependency but not in requirements.txt - Project uses PaddlePaddle which has its own CUDA implementation Changes: - Make torch import optional with try/except in ocr_service.py - Make torch import optional in pp_structure_enhanced.py - Add cleanup_gpu_memory() method using PaddlePaddle's memory management - Add check_gpu_memory() method to monitor available GPU memory - Use paddle.device.cuda.empty_cache() for GPU cleanup - Use torch.cuda only if TORCH_AVAILABLE flag is True - Add cleanup calls after OCR processing to prevent OOM errors - Add memory checks before GPU-intensive operations Benefits: - Backend can start without torch installed - GPU memory is properly managed using PaddlePaddle - Optional torch support provides additional memory monitoring - Prevents GPU OOM errors during document processing 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-20 16:40:44 +08:00
egg	a3a6fbe58b	feat: add OCR to UnifiedDocument converter for PP-StructureV3 integration Implements the converter that transforms PP-StructureV3 OCR results into the UnifiedDocument format, enabling consistent output for both OCR and direct extraction tracks. - Create OCRToUnifiedConverter class with full element type mapping - Handle both enhanced (parsing_res_list) and standard markdown results - Support 4-point and simple bbox formats for coordinates - Establish element relationships (captions, lists, headers) - Integrate converter into OCR service dual-track processing - Update tasks.md marking section 3.3 complete 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-19 08:05:20 +08:00

Author

SHA1

Message

Date

egg

0aff468c51

feat: implement Phase 1 of PDF layout restoration

Implement critical fixes for image and table rendering in PDF generation.

**Image Handling Fixes**:
- Implemented _save_image() in pp_structure_enhanced.py
  - Creates imgs/ subdirectory for saved images
  - Handles both file paths and numpy arrays
  - Returns relative path for reference
  - Adds proper error handling and logging
- Added saved_path field to image elements for path tracking
- Created _get_image_path() helper with fallback logic
  - Checks saved_path, path, image_path in content
  - Falls back to metadata fields
  - Logs warnings for missing paths

**Table Rendering Fixes**:
- Fixed table rendering to use element's own bbox directly
  - No longer depends on fake table_*.png references
  - Supports both bbox and bbox_polygon formats
  - Inline conversion for different bbox formats
- Maintains backward compatibility with legacy approach
- Improved error handling for missing bbox data

**Status**:
- Phase 1 tasks 1.1 and 1.2: ✅ Completed
- Phase 1 tasks 2.1, 2.2, and 2.3: ✅ Completed
- Testing pending due to backend availability

These fixes resolve the critical issues where images never appeared
and tables never rendered in generated PDFs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-24 07:16:31 +08:00

egg

b997f9355a

fix: make torch import optional and add PaddlePaddle GPU memory management

Problem:
- Backend failed to start with ModuleNotFoundError for torch module
- torch was imported as hard dependency but not in requirements.txt
- Project uses PaddlePaddle which has its own CUDA implementation

Changes:
- Make torch import optional with try/except in ocr_service.py
- Make torch import optional in pp_structure_enhanced.py
- Add cleanup_gpu_memory() method using PaddlePaddle's memory management
- Add check_gpu_memory() method to monitor available GPU memory
- Use paddle.device.cuda.empty_cache() for GPU cleanup
- Use torch.cuda only if TORCH_AVAILABLE flag is True
- Add cleanup calls after OCR processing to prevent OOM errors
- Add memory checks before GPU-intensive operations

Benefits:
- Backend can start without torch installed
- GPU memory is properly managed using PaddlePaddle
- Optional torch support provides additional memory monitoring
- Prevents GPU OOM errors during document processing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-20 16:40:44 +08:00

egg

a3a6fbe58b

feat: add OCR to UnifiedDocument converter for PP-StructureV3 integration

Implements the converter that transforms PP-StructureV3 OCR results into
the UnifiedDocument format, enabling consistent output for both OCR and
direct extraction tracks.

- Create OCRToUnifiedConverter class with full element type mapping
- Handle both enhanced (parsing_res_list) and standard markdown results
- Support 4-point and simple bbox formats for coordinates
- Establish element relationships (captions, lists, headers)
- Integrate converter into OCR service dual-track processing
- Update tasks.md marking section 3.3 complete

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-19 08:05:20 +08:00

3 Commits