docs: clarify chart recognition limitation and provide verification tool
Chart Recognition Status Investigation:
- OpenSpec limitation record is ACCURATE but based on old PaddlePaddle 3.0.0 (Mar 2025)
- PaddlePaddle has released multiple updates (3.1.x, 3.2.x, latest: 3.2.2 Nov 2025)
- The fused_rms_norm_ext API MAY now be available in newer versions

Root Cause:
- PaddleOCR-VL chart recognition requires paddle.incubate.nn.functional.fused_rms_norm_ext
- PaddlePaddle 3.0.0 only provided fused_rms_norm (base version)
- Not a compatibility issue: PaddleOCR 3.x is fully compatible with PaddlePaddle 3.x
- The issue is a missing API, not a version mismatch

What Still Works (Even with Chart Recognition Disabled):
✅ Chart detection and extraction as images
✅ Table recognition (with nested formulas/images)
✅ Formula recognition
✅ Text recognition (OCR core)

What's Disabled:
❌ Deep chart understanding (type, data extraction, axis/legend parsing)
❌ Converting chart content to structured data

Created Files:
1. CHART_RECOGNITION.md - Comprehensive guide explaining:
   - Current limitation status and history
   - What works vs what's disabled
   - How to verify if newer PaddlePaddle versions support the API
   - How to enable chart recognition if the API becomes available
   - Troubleshooting and performance considerations
2. backend/verify_chart_recognition.py - Verification script to:
   - Check if the fused_rms_norm_ext API is available
   - Display the current PaddlePaddle version
   - Provide actionable recommendations

Next Steps for Users:
1. Run: conda activate tool_ocr && python backend/verify_chart_recognition.py
2. If the API is available, enable chart recognition in ocr_service.py:217
3. Update OpenSpec if the limitation is resolved in newer versions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
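The first two next steps could be scripted rather than read off the console. The following is a minimal sketch, assuming the verification script reports availability through its exit status (0 when fused_rms_norm_ext is found, 1 otherwise) and that it is run from the repository root. The helper name `api_available` is hypothetical and is not part of this commit.

```python
import subprocess
import sys


def api_available(command):
    """Run a checker command and return True when it exits with status 0.

    Hypothetical helper: `command` is an argument list such as
    [sys.executable, "backend/verify_chart_recognition.py"].
    """
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0


if __name__ == "__main__":
    # Assumption: the working directory is the repository root.
    checker = [sys.executable, "backend/verify_chart_recognition.py"]
    if api_available(checker):
        print("fused_rms_norm_ext found: chart recognition can be enabled.")
    else:
        print("Check failed or API missing: leave chart recognition disabled.")
```

Any non-zero status, including a missing script or a missing PaddlePaddle install, is treated as "not available", which keeps the default on the safe side.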
61
backend/verify_chart_recognition.py
Executable file
@@ -0,0 +1,61 @@
#!/usr/bin/env python3
"""
Verify whether chart recognition can be enabled with the installed PaddlePaddle version.
Run this in the conda environment: conda activate tool_ocr && python verify_chart_recognition.py
"""

import sys

def check_paddle_api():
    """Return True if the fused_rms_norm_ext API is available, False otherwise."""
    try:
        import paddle
        print(f"✅ PaddlePaddle version: {paddle.__version__}")

        # Check whether the API exists
        import paddle.incubate.nn.functional as F

        has_base = hasattr(F, 'fused_rms_norm')
        has_ext = hasattr(F, 'fused_rms_norm_ext')

        print("\n📊 API Availability:")
        print(f" - fused_rms_norm: {'✅ Available' if has_base else '❌ Not found'}")
        print(f" - fused_rms_norm_ext: {'✅ Available' if has_ext else '❌ Not found'}")

        if has_ext:
            print("\n🎉 Chart recognition CAN be enabled!")
            print("\n📝 Action required:")
            print(" 1. Edit backend/app/services/ocr_service.py")
            print(" 2. Change line 217: use_chart_recognition=False → True")
            print(" 3. Restart the backend service")
            print("\n⚠️ Note: This will enable deep chart analysis (may increase processing time)")
            return True
        else:
            print("\n❌ Chart recognition CANNOT be enabled yet")
            print(f"\n📝 Current PaddlePaddle version ({paddle.__version__}) does not support fused_rms_norm_ext")
            print("\n💡 Options:")
            # The version specifier is quoted so the shell does not treat ">" as a redirect
            print(" 1. Upgrade PaddlePaddle, then re-run this check: pip install --upgrade \"paddlepaddle>=3.2.0\"")
            # "pip search" is no longer supported by PyPI; "pip index versions" lists releases
            print(" 2. Check for newer versions: pip index versions paddlepaddle")
            print(" 3. Wait for official PaddlePaddle update")
            return False

    except ImportError as e:
        print(f"❌ PaddlePaddle not installed: {e}")
        print("\n💡 Install PaddlePaddle:")
        print(" pip install \"paddlepaddle>=3.2.0\"")
        return False
    except Exception as e:
        print(f"❌ Error: {e}")
        return False

if __name__ == "__main__":
    print("=" * 70)
    print("Chart Recognition Availability Checker")
    print("=" * 70)
    print()

    can_enable = check_paddle_api()

    print()
    print("=" * 70)
    # Exit status 0 means the API is available; 1 means it is not
    sys.exit(0 if can_enable else 1)
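Instead of editing the flag by hand, the same `hasattr` probe could drive it at startup. The following is a minimal sketch, assuming the service is free to compute the value it passes as `use_chart_recognition`. How ocr_service.py actually passes that flag is not shown in this commit, and the helper names `module_has_attr` and `chart_recognition_supported` are hypothetical.

```python
import importlib


def module_has_attr(module_name, attr):
    """Return True if the module can be imported and exposes the attribute."""
    try:
        module = importlib.import_module(module_name)
    except Exception:
        # A missing or broken install is treated as "API not available".
        return False
    return hasattr(module, attr)


def chart_recognition_supported():
    """Probe for the API that this commit identifies as the blocker."""
    return module_has_attr("paddle.incubate.nn.functional", "fused_rms_norm_ext")


# Assumption: the result would replace the hard-coded False in ocr_service.py.
use_chart_recognition = chart_recognition_supported()
print(f"use_chart_recognition = {use_chart_recognition}")
```

The probe only confirms that the function exists. It does not confirm that chart recognition produces correct results, so a test run on a sample document is still worthwhile after enabling it.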