docs: update documentation for chart recognition enablement

Updates all project documentation to reflect that chart recognition is now fully enabled with PaddlePaddle 3.2.1+.

Changes:
- README.md: Remove Known Limitations section about chart recognition, update tech stack and prerequisites to include PaddlePaddle 3.2.1+, add WSL CUDA configuration notes
- openspec/project.md: Add comprehensive chart recognition feature descriptions, update system requirements for GPU/CUDA support
- openspec/changes/add-gpu-acceleration-support/tasks.md: Mark task 5.4 as completed with resolution details
- openspec/changes/add-gpu-acceleration-support/proposal.md: Update Known Issues section to show chart recognition is now resolved
- setup_dev_env.sh: Upgrade PaddlePaddle from 3.0.0 to 3.2.1+, add WSL CUDA library path configuration, add chart recognition API verification

All documentation now accurately reflects:
✅ Chart recognition fully enabled
✅ PaddlePaddle 3.2.1+ with fused_rms_norm_ext API
✅ WSL CUDA path auto-configuration
✅ Comprehensive PP-StructureV3 capabilities

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
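
The commit message says that setup_dev_env.sh now verifies the chart recognition API, but the script itself is not shown on this page. A minimal sketch of such a check, assuming it only needs to confirm that the installed PaddlePaddle exposes `paddle.incubate.nn.functional.fused_rms_norm_ext` (the helper name `chart_api_available` is hypothetical and not taken from the project):

```python
import importlib


def chart_api_available() -> bool:
    """Return True if the installed PaddlePaddle exposes fused_rms_norm_ext."""
    try:
        functional = importlib.import_module("paddle.incubate.nn.functional")
    except Exception:
        # PaddlePaddle is missing or failed to load, so the API is unavailable.
        return False
    return hasattr(functional, "fused_rms_norm_ext")


if __name__ == "__main__":
    print("chart recognition API available:", chart_api_available())
```

The check only tests for the attribute. It does not call the function, so it says nothing about whether the operator actually runs on the current device.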
43
README.md
@@ -20,7 +20,8 @@ A web-based solution to extract text, images, and document structure from multip
 
 ### Backend
 - **Framework**: FastAPI 0.115.0
-- **OCR Engine**: PaddleOCR 3.0+ with PaddleOCR-VL
+- **OCR Engine**: PaddleOCR 3.0+ with PaddleOCR-VL and PP-StructureV3
+- **Deep Learning**: PaddlePaddle 3.2.1+ (GPU/CPU support)
 - **Database**: MySQL via SQLAlchemy
 - **PDF Generation**: Pandoc + WeasyPrint
 - **Image Processing**: OpenCV, Pillow, pdf2image
@@ -39,7 +40,9 @@ A web-based solution to extract text, images, and document structure from multip
 - **Python**: 3.12+
 - **Node.js**: 24.x LTS
 - **MySQL**: External database server (provided)
-- **GPU** (Optional): NVIDIA GPU with CUDA 11.2+ for hardware acceleration
+- **GPU** (Optional): NVIDIA GPU with CUDA 11.8+ for hardware acceleration
+- PaddlePaddle 3.2.1+ requires CUDA 11.8, 12.3, or 12.6+
+- WSL2 users: Ensure NVIDIA CUDA drivers are installed
 
 ## Quick Start
 
@@ -55,10 +58,11 @@ This script automatically:
 - Installs Python development tools (pip, venv, build-essential)
 - Installs system dependencies (pandoc, LibreOffice, fonts, etc.)
 - Installs Node.js (via nvm)
-- Installs PaddlePaddle GPU version (if GPU detected) or CPU version
-- Installs other Python packages
+- Installs PaddlePaddle 3.2.1+ GPU version (if GPU detected) or CPU version
+- Configures WSL CUDA library paths (for WSL2 GPU users)
+- Installs other Python packages (PaddleOCR, PaddleX, etc.)
 - Installs frontend dependencies
-- Verifies GPU functionality (if GPU detected)
+- Verifies GPU functionality and chart recognition API availability
 
 ### 2. Initialize Database
 
@@ -155,26 +159,11 @@ The system automatically detects and utilizes NVIDIA GPU hardware when available
 - **Graceful fallback**: If GPU is unavailable or fails, system automatically uses CPU mode
 - **Performance**: GPU acceleration provides 3-10x speedup for OCR processing
 - **Configuration**: Control GPU usage via `.env.local` environment variables
+- **WSL2 CUDA Setup**: For WSL2 users, CUDA library paths are automatically configured in `~/.bashrc`
 
-Check GPU status at: http://localhost:8000/health
+**Chart Recognition**: Requires PaddlePaddle 3.2.0+ for full PP-StructureV3 chart recognition capabilities (chart type detection, data extraction, axis/legend parsing). The setup script installs PaddlePaddle 3.2.1+ which includes all required APIs.
 
-### Known Limitations
-
-**Chart Recognition (PP-StructureV3)**
-
-Due to API incompatibility between PaddleOCR 3.x and PaddlePaddle 3.0.0 stable, the chart recognition feature is currently disabled:
-
-- ✅ **Works**: Layout analysis detects and extracts charts/figures as image files
-- ✅ **Works**: Tables, formulas, and text recognition function normally
-- ❌ **Disabled**: Deep chart content understanding (chart type, data extraction, axis/legend parsing)
-- ❌ **Disabled**: Converting chart content to structured data
-
-**Technical Details**:
-- The PaddleOCR-VL chart recognition model requires `paddle.incubate.nn.functional.fused_rms_norm_ext` API
-- PaddlePaddle 3.0.0 stable only provides the base `fused_rms_norm` function
-- This limitation will be resolved when PaddlePaddle releases an update with the extended API
-
-**Workaround**: Charts are saved as images and can be viewed manually. For chart data extraction, consider using specialized chart recognition tools separately.
+Check GPU status and chart recognition availability at: http://localhost:8000/health
 
 ## API Endpoints
 
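
The README lines in this hunk point to http://localhost:8000/health for GPU and chart recognition status but do not show the response format. A minimal client-side sketch, assuming only that the endpoint returns a JSON body (`fetch_health` is a hypothetical helper, and no field names are assumed):

```python
import http.client
import json
import urllib.request


def fetch_health(url: str = "http://localhost:8000/health", timeout: float = 5.0):
    """Return the decoded JSON body of the health endpoint, or None on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers connection and HTTP errors (URLError subclasses it);
        # ValueError covers a body that is not valid UTF-8 JSON.
        return None


if __name__ == "__main__":
    # Print whatever the server reports; the schema is not documented here.
    print(fetch_health())
```

Returning `None` on any failure keeps the caller simple: a missing server and a malformed response both read as "status unknown".
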
@@ -274,6 +263,8 @@ Internal project use
 - Token expiration is set to 24 hours by default
 - Office conversion requires LibreOffice (installed via setup script)
 - Development environment: WSL2 Ubuntu 24.04 with Python venv
-- **GPU acceleration**: Automatically detected and enabled if NVIDIA GPU with CUDA 11.2+ is available
-- **WSL GPU support**: Ensure NVIDIA CUDA drivers are installed in WSL for GPU acceleration
-- GPU status can be checked via `/health` API endpoint
+- **GPU acceleration**: Automatically detected and enabled if NVIDIA GPU with CUDA 11.8+ is available
+- **PaddlePaddle version**: System uses PaddlePaddle 3.2.1+ which includes full chart recognition support
+- **WSL GPU support**: WSL2 CUDA library paths (`/usr/lib/wsl/lib`) are automatically configured in `~/.bashrc`
+- **Chart recognition**: Fully enabled with PP-StructureV3 for chart type detection, data extraction, and structure analysis
+- GPU status and chart recognition availability can be checked via `/health` API endpoint
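
The added README lines say that the WSL2 CUDA library path (`/usr/lib/wsl/lib`) is configured in `~/.bashrc`, but the setup script's own logic is not part of this diff. A minimal sketch of an idempotent version of that step, written in Python for illustration; the exact export line and the helper name `ensure_wsl_cuda_path` are assumptions:

```python
from pathlib import Path

WSL_LIB_DIR = "/usr/lib/wsl/lib"
# Assumed form of the export line; the script's actual line is not shown here.
EXPORT_LINE = f'export LD_LIBRARY_PATH="{WSL_LIB_DIR}:$LD_LIBRARY_PATH"'


def ensure_wsl_cuda_path(rc_file: Path) -> bool:
    """Append EXPORT_LINE to rc_file unless it is already present.

    Returns True if the file was modified, False if it was already configured.
    """
    existing = rc_file.read_text(encoding="utf-8") if rc_file.exists() else ""
    if EXPORT_LINE in existing.splitlines():
        return False
    # Start on a fresh line if the file does not end with a newline.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    with rc_file.open("a", encoding="utf-8") as handle:
        handle.write(f"{separator}{EXPORT_LINE}\n")
    return True
```

Calling `ensure_wsl_cuda_path(Path.home() / ".bashrc")` would apply it to the real shell profile. Checking for the line first means repeated runs do not add duplicate entries.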