egg/OCR

egg b048f2d640 fix: disable chart recognition due to PaddlePaddle 3.0.0 API limitation

PaddleOCR-VL chart recognition model requires `fused_rms_norm_ext` API
which is not available in PaddlePaddle 3.0.0 stable release.

Changes:
- Set use_chart_recognition=False in PP-StructureV3 initialization
- Remove unsupported show_log parameter from PaddleOCR 3.x API calls
- Document known limitation in openspec proposal
- Add limitation documentation to README
- Update tasks.md with documentation task for known issues

Impact:
- Layout analysis still detects/extracts charts as images ✓
- Tables, formulas, and text recognition work normally ✓
- Deep chart understanding (type detection, data extraction) disabled ✗
- Chart to structured data conversion disabled ✗

Workaround: Charts saved as image files for manual review

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-14 13:16:17 +08:00

Change: Add GPU Acceleration Support for OCR Processing

Why

PaddleOCR supports CUDA GPU acceleration, which can significantly speed up OCR processing for batch operations. The system currently always runs on the CPU, which is slower and less efficient for large document batches. With GPU detection and automatic CUDA support, the system will:

- Automatically utilize available GPU hardware when present
- Fall back gracefully to CPU processing when GPU is unavailable
- Reduce processing time for large batches by leveraging parallel GPU computation
- Improve overall system throughput and user experience

What Changes

- Add GPU detection logic to the environment setup script (setup_dev_env.sh)
- Automatically install CUDA-enabled PaddlePaddle when a compatible GPU is detected
- Install CPU-only PaddlePaddle when no compatible GPU is found
- Add GPU availability detection in the OCR processing code
- Automatically enable GPU acceleration in PaddleOCR when a GPU is available
- Add a configuration option to force CPU mode (for testing or troubleshooting)
- Add GPU status reporting to the API health check endpoint
- Update documentation with GPU requirements and setup instructions
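
The detection, fallback, and health-reporting items above could be sketched roughly as follows. This is a minimal illustration, not the code in ocr_service.py or health.py: it assumes PaddlePaddle's `paddle.device.is_compiled_with_cuda()` and `paddle.device.cuda.device_count()` are used for probing, and the function names (`probe_gpu`, `select_device`, `gpu_status`) and the keys of the status dictionary are hypothetical.

```python
from typing import Dict, Tuple


def probe_gpu() -> Tuple[bool, int]:
    """Return (cuda_build, gpu_count); (False, 0) if Paddle is missing or probing fails."""
    try:
        import paddle  # imported lazily so hosts without Paddle still get a CPU answer

        if not paddle.device.is_compiled_with_cuda():
            return False, 0
        return True, int(paddle.device.cuda.device_count())
    except Exception:
        # Any import or driver error is treated as "no usable GPU".
        return False, 0


def select_device(force_cpu: bool, cuda_build: bool, gpu_count: int) -> str:
    """Pick a Paddle device string, falling back to CPU whenever the GPU is not usable."""
    if force_cpu or not cuda_build or gpu_count < 1:
        return "cpu"
    return "gpu:0"  # assumption: use the first GPU


def gpu_status(force_cpu: bool = False) -> Dict[str, object]:
    """Hypothetical payload a health check endpoint could return."""
    cuda_build, gpu_count = probe_gpu()
    return {
        "gpu_available": cuda_build and gpu_count > 0,
        "gpu_count": gpu_count,
        "force_cpu_mode": force_cpu,
        "device": select_device(force_cpu, cuda_build, gpu_count),
    }
```

Keeping `select_device` free of any Paddle import means the fallback rule can be unit-tested on machines with no GPU and no Paddle installation.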

Impact

Affected capabilities:
- ocr-processing: Add GPU acceleration support with automatic detection
- environment-setup: Add GPU detection and CUDA installation logic
Affected code:
- setup_dev_env.sh: GPU detection and conditional CUDA package installation
- backend/app/services/ocr_service.py: GPU availability detection and configuration
- backend/app/api/v1/endpoints/health.py: GPU status reporting
- backend/app/core/config.py: GPU configuration settings
- .env.local: GPU-related environment variables
Dependencies:
- When GPU available: paddlepaddle-gpu (with matching CUDA version)
- When GPU unavailable: paddlepaddle (CPU-only, current default)
- Detection tools: nvidia-smi (NVIDIA GPUs), lspci (hardware detection)
Configuration:
- New env var: FORCE_CPU_MODE (default: false) - Override GPU detection
- New env var: CUDA_VERSION (auto-detected or manual override)
- GPU memory allocation settings for PaddleOCR
- Batch size adjustment based on GPU memory availability
Performance Impact:
- Expected 3-10x speedup for OCR processing on GPU-enabled systems
- No performance degradation on CPU-only systems (same as current behavior)
- Automatic memory management to prevent GPU OOM errors
Backward Compatibility:
- Fully backward compatible - existing CPU-only installations continue to work
- No breaking changes to API or configuration
- Existing installations can opt in by re-running the setup script on GPU-enabled hardware
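
As one possible reading of the Configuration and Dependencies entries above, the sketch below parses `FORCE_CPU_MODE` and `CUDA_VERSION` from the environment and picks between the two package names listed. The `GpuSettings` class, `load_gpu_settings`, `paddle_package`, and the set of accepted truthy strings are assumptions made for illustration; the actual backend/app/core/config.py may be structured differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: these spellings count as "true" for FORCE_CPU_MODE.
_TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class GpuSettings:
    force_cpu_mode: bool
    cuda_version: Optional[str]  # None means "auto-detect"


def load_gpu_settings(env: Mapping[str, str] = os.environ) -> GpuSettings:
    """Read the two proposed env vars, defaulting to GPU auto-detection."""
    force_cpu = env.get("FORCE_CPU_MODE", "false").strip().lower() in _TRUTHY
    cuda_version = env.get("CUDA_VERSION", "").strip() or None
    return GpuSettings(force_cpu_mode=force_cpu, cuda_version=cuda_version)


def paddle_package(gpu_detected: bool) -> str:
    """Choose the package name from the Dependencies list."""
    return "paddlepaddle-gpu" if gpu_detected else "paddlepaddle"
```

For example, `load_gpu_settings({"FORCE_CPU_MODE": "true"})` yields settings with `force_cpu_mode=True` and `cuda_version=None`, which leaves existing CPU-only installations unaffected when neither variable is set.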

Known Issues and Limitations

Chart Recognition Feature Disabled (PaddlePaddle 3.0.0 API Limitation)

Issue: The chart recognition feature in PP-StructureV3 is currently disabled because of an API incompatibility.

Root Cause:

- The PaddleOCR-VL chart recognition model requires the paddle.incubate.nn.functional.fused_rms_norm_ext API
- PaddlePaddle 3.0.0 stable only provides fused_rms_norm (base version)
- The extended version, fused_rms_norm_ext, is not yet available in a stable release
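
Whether a given PaddlePaddle installation exposes the extended API can be checked at runtime. The sketch below is a hedged illustration: the attribute path comes from the root cause above, while the helper names `has_attr_path` and `paddle_has_fused_rms_norm_ext` are hypothetical and not part of PaddlePaddle or this repository.

```python
from typing import Any


def has_attr_path(root: Any, dotted: str) -> bool:
    """True if every attribute in `dotted` resolves, starting from `root`."""
    obj = root
    for name in dotted.split("."):
        if not hasattr(obj, name):
            return False
        obj = getattr(obj, name)
    return True


def paddle_has_fused_rms_norm_ext() -> bool:
    """Report whether the installed Paddle exposes fused_rms_norm_ext."""
    try:
        import paddle  # lazy import: a missing Paddle counts as "API unavailable"
    except ImportError:
        return False
    return has_attr_path(paddle, "incubate.nn.functional.fused_rms_norm_ext")
```

A check like this would let the service log a clear reason for the disabled feature instead of failing when the chart model is loaded.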

Impact:

- ✅ Still Works: Layout analysis can detect and extract chart/figure regions as images
- ✅ Still Works: Tables, formulas, and text recognition all function normally
- ❌ Disabled: Deep chart understanding (chart type detection, data extraction, axis/legend parsing)
- ❌ Disabled: Converting chart content to structured data (JSON, tables)

Workaround:

- Set use_chart_recognition=False in PP-StructureV3 initialization
- Charts are saved as image files, but their content is not analyzed
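
A minimal sketch of the workaround, assuming the PaddleOCR 3.x `PPStructureV3` class is imported from `paddleocr`. The helper names `structure_kwargs` and `build_structure_pipeline` are hypothetical and are not the code at ocr_service.py:216.

```python
from typing import Any, Dict


def structure_kwargs(chart_api_available: bool = False) -> Dict[str, Any]:
    """Keyword arguments for PP-StructureV3; chart recognition is off by default."""
    # show_log is deliberately absent: it is not supported by the PaddleOCR 3.x API.
    return {"use_chart_recognition": bool(chart_api_available)}


def build_structure_pipeline(chart_api_available: bool = False):
    """Create the pipeline with chart recognition disabled unless explicitly enabled."""
    from paddleocr import PPStructureV3  # lazy import; requires PaddleOCR 3.x

    return PPStructureV3(**structure_kwargs(chart_api_available))
```

Routing the flag through a single helper keeps the limitation in one place, so re-enabling chart recognition later is a one-argument change.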

Future Resolution:

- Wait for a PaddlePaddle 3.0.x/3.1.x update that adds the fused_rms_norm_ext API
- Or use the PaddlePaddle develop version (unstable, not recommended for production)

Code Location: backend/app/services/ocr_service.py:216

Status: Documented limitation, pending PaddlePaddle framework update
