docs: clarify chart recognition limitation and provide verification tool

Chart Recognition Status Investigation:
- OpenSpec limitation record is ACCURATE but based on old PaddlePaddle 3.0.0 (Mar 2025)
- PaddlePaddle has released multiple updates (3.1.x, 3.2.x, latest: 3.2.2 Nov 2025)
- The fused_rms_norm_ext API MAY now be available in newer versions

Root Cause:
- PaddleOCR-VL chart recognition requires paddle.incubate.nn.functional.fused_rms_norm_ext
- PaddlePaddle 3.0.0 only provided fused_rms_norm (base version)
- This is a missing API, not a version mismatch: PaddleOCR 3.x is otherwise compatible with PaddlePaddle 3.x

What Still Works (Even with Chart Recognition Disabled):
- Chart detection and extraction as images
- Table recognition (with nested formulas/images)
- Formula recognition
- Text recognition (OCR core)

What's Disabled:
- Deep chart understanding (type, data extraction, axis/legend parsing)
- Converting chart content to structured data

Created Files:
1. CHART_RECOGNITION.md - Comprehensive guide explaining:
   - Current limitation status and history
   - What works vs what's disabled
   - How to verify if newer PaddlePaddle versions support the API
   - How to enable chart recognition if API becomes available
   - Troubleshooting and performance considerations

2. backend/verify_chart_recognition.py - Verification script to:
   - Check if fused_rms_norm_ext API is available
   - Display current PaddlePaddle version
   - Provide actionable recommendations

Next Steps for Users:
1. Run: conda activate tool_ocr && python backend/verify_chart_recognition.py
2. If API is available, enable chart recognition in ocr_service.py:217
3. Update OpenSpec if limitation is resolved in newer versions
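
Instead of editing the flag by hand in step 2, it could be derived at startup from the same availability check. The following is only a minimal sketch under that assumption: the helper name `has_fused_rms_norm_ext` and the module-level assignment are hypothetical and are not taken from ocr_service.py.

```python
import importlib


def has_fused_rms_norm_ext(module_name="paddle.incubate.nn.functional"):
    """Return True if the module imports and exposes fused_rms_norm_ext.

    Hypothetical helper, not part of PaddlePaddle or PaddleOCR.
    """
    try:
        module = importlib.import_module(module_name)
    except Exception:
        # Covers a missing PaddlePaddle install as well as a broken one
        # that fails while loading its native libraries.
        return False
    return hasattr(module, "fused_rms_norm_ext")


# Assumed usage: compute the flag once instead of hard-coding False/True.
use_chart_recognition = has_fused_rms_norm_ext()
print(f"use_chart_recognition={use_chart_recognition}")
```

With this approach the service would fall back to chart extraction as images whenever the API is absent, and pick up deep chart analysis automatically after a PaddlePaddle upgrade that provides it.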

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
egg
2025-11-16 18:47:39 +08:00
parent 6bb5b7691f
commit eb77322f8a
2 changed files with 295 additions and 0 deletions

@@ -0,0 +1,61 @@
#!/usr/bin/env python3
"""
Verify whether chart recognition can be enabled with the installed PaddlePaddle version.

Run from the repository root inside the conda environment:
    conda activate tool_ocr && python backend/verify_chart_recognition.py
"""
import sys


def check_paddle_api():
    """Return True if paddle.incubate.nn.functional exposes fused_rms_norm_ext."""
    try:
        import paddle
        print(f"✅ PaddlePaddle version: {paddle.__version__}")

        # Check whether the fused RMSNorm APIs exist in this build
        import paddle.incubate.nn.functional as F
        has_base = hasattr(F, 'fused_rms_norm')
        has_ext = hasattr(F, 'fused_rms_norm_ext')

        print("\n📊 API Availability:")
        print(f" - fused_rms_norm: {'✅ Available' if has_base else '❌ Not found'}")
        print(f" - fused_rms_norm_ext: {'✅ Available' if has_ext else '❌ Not found'}")

        if has_ext:
            print("\n🎉 Chart recognition CAN be enabled!")
            print("\n📝 Action required:")
            print(" 1. Edit backend/app/services/ocr_service.py")
            print(" 2. Change line 217: use_chart_recognition=False → True")
            print(" 3. Restart the backend service")
            print("\n⚠️ Note: This will enable deep chart analysis (may increase processing time)")
            return True
        else:
            print("\n❌ Chart recognition CANNOT be enabled yet")
            print(f"\n📝 Current PaddlePaddle version ({paddle.__version__}) does not support fused_rms_norm_ext")
            print("\n💡 Options:")
            # The version specifier is quoted so the shell does not treat ">" as a redirect
            print(' 1. Upgrade PaddlePaddle: pip install --upgrade "paddlepaddle>=3.2.0"')
            # "pip search" is no longer served by PyPI; "pip index versions" lists releases
            print(" 2. Check for newer versions: pip index versions paddlepaddle")
            print(" 3. Wait for official PaddlePaddle update")
            return False
    except ImportError as e:
        print(f"❌ PaddlePaddle not installed: {e}")
        print("\n💡 Install PaddlePaddle:")
        print(' pip install "paddlepaddle>=3.2.0"')
        return False
    except Exception as e:
        print(f"❌ Error: {e}")
        return False


if __name__ == "__main__":
    print("=" * 70)
    print("Chart Recognition Availability Checker")
    print("=" * 70)
    print()
    can_enable = check_paddle_api()
    print()
    print("=" * 70)
    # Exit code 0 means the API is available, 1 means it is not
    sys.exit(0 if can_enable else 1)
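
Because the checker exits with status 0 when the API is found and a non-zero status otherwise, other tooling can consume the result without parsing its output. A minimal sketch of such a wrapper follows; the function name `chart_recognition_available` is hypothetical, and the default path assumes the command is run from the repository root.

```python
import subprocess
import sys


def chart_recognition_available(script="backend/verify_chart_recognition.py"):
    """Run the checker with the current interpreter and report its verdict.

    Hypothetical wrapper: returns True only when the script exits with 0.
    A missing script also yields False, since Python then exits non-zero.
    """
    result = subprocess.run(
        [sys.executable, script],
        capture_output=True,  # keep the checker's banner out of our output
        text=True,
    )
    return result.returncode == 0


if __name__ == "__main__":
    print("chart recognition available:", chart_recognition_available())
```

Running the checker in a subprocess keeps a failing PaddlePaddle import from affecting the calling process.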