Updates all project documentation to reflect that chart recognition is now fully enabled with PaddlePaddle 3.2.1+.

Changes:
- README.md: Remove Known Limitations section about chart recognition, update tech stack and prerequisites to include PaddlePaddle 3.2.1+, add WSL CUDA configuration notes
- openspec/project.md: Add comprehensive chart recognition feature descriptions, update system requirements for GPU/CUDA support
- openspec/changes/add-gpu-acceleration-support/tasks.md: Mark task 5.4 as completed with resolution details
- openspec/changes/add-gpu-acceleration-support/proposal.md: Update Known Issues section to show chart recognition is now resolved
- setup_dev_env.sh: Upgrade PaddlePaddle from 3.0.0 to 3.2.1+, add WSL CUDA library path configuration, add chart recognition API verification

All documentation now accurately reflects:
✅ Chart recognition fully enabled
✅ PaddlePaddle 3.2.1+ with fused_rms_norm_ext API
✅ WSL CUDA path auto-configuration
✅ Comprehensive PP-StructureV3 capabilities

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

# Project Context

## Purpose
Tool_OCR is a web-based application for batch image-to-text conversion with multi-language support and rule-based output formatting. It uses a separated frontend and backend, and is designed to process multiple images and PDFs in one batch, extract their text with OCR, and export the results in various formats according to user-defined rules.

**Key Goals:**
- Batch processing of images and PDF files for text extraction via web interface
- Multi-language OCR support (Chinese, English, and other languages)
- Rule-based output formatting and organization
- User-friendly web interface accessible via browser
- Export flexibility (TXT, JSON, Excel, etc.)
- RESTful API for OCR processing

## Tech Stack

### Development Environment
- **OS Platform**: WSL2 Ubuntu 24.04
- **Python Version**: 3.12
- **Environment Manager**: Python venv
- **Virtual Environment Path**: `./venv`
- **Node.js**: 24.x LTS (via nvm)
- **Recommended IDE**: VS Code with Python + React extensions

### Backend Technologies
- **Language**: Python 3.10+
- **Web Framework**: FastAPI (modern, async, auto API docs)
- **OCR Engine**: PaddleOCR 3.0+ with PaddleOCR-VL (deep learning-based, excellent multi-language support)
- **Deep Learning Framework**: PaddlePaddle 3.2.1+ (GPU/CPU support, CUDA 11.8/12.3/12.6+)
- **Structure Analysis**: PP-StructureV3 (layout analysis, table recognition, formula extraction, chart recognition)
- **PDF Processing**: PyPDF2 / pdf2image
- **Image Processing**: Pillow (PIL), OpenCV
- **Data Export**: pandas (Excel), json (JSON)
- **Database**: MySQL (configuration storage, task history)
- **Cache**: Redis (optional, for task queue)
- **Authentication**: JWT

### Frontend Technologies
- **Framework**: React 18+
- **Build Tool**: Vite
- **UI Library**: Tailwind CSS + shadcn/ui
- **State Management**: React Query (for API calls) + Zustand (for global state)
- **HTTP Client**: Axios
- **File Upload**: react-dropzone

### Development Tools
- **Package Manager**: Conda + pip (backend), npm/pnpm (frontend)
- **Deployment**: 1Panel (web-based server management)
- **Process Manager**: systemd / PM2 / Supervisor
- **Web Server**: Nginx (reverse proxy)
- **Testing**: pytest (backend), Vitest (frontend)
- **Code Style**: Black + pylint (Python), ESLint + Prettier (JavaScript/TypeScript)
- **Version Control**: Git

### Key Libraries (Backend)
- fastapi: Web framework
- uvicorn: ASGI server
- paddleocr: OCR processing
- paddlepaddle: Deep learning framework (GPU/CPU)
- paddlex[ocr]: PP-StructureV3 for layout analysis and chart recognition
- pdf2image: PDF to image conversion
- pillow: Image manipulation
- opencv-python: Advanced image processing
- pandas: Data export to Excel
- pyyaml: Configuration management
- python-jose: JWT authentication
- sqlalchemy: Database ORM
- pydantic: Data validation

### Key Libraries (Frontend)
- react: UI framework
- vite: Build tool
- tailwindcss: CSS framework
- shadcn/ui: UI components
- axios: HTTP client
- react-query: Server state management
- zustand: Client state management
- react-dropzone: File upload

## Project Conventions

### Environment Setup (Backend)
```bash
# Run automated setup script (recommended)
./setup_dev_env.sh

# Or manually:
# Create Python virtual environment
python3 -m venv venv

# Activate environment
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```

### Environment Setup (Frontend)
```bash
# Navigate to frontend directory
cd frontend

# Install dependencies
npm install

# Run dev server
npm run dev
```

### Code Style

#### Backend (Python)
- **Formatter**: Black with line length 100
- **Naming Conventions**:
  - Classes: PascalCase (e.g., `OcrProcessor`, `ImageService`)
  - Functions/Methods: snake_case (e.g., `process_image`, `export_results`)
  - Constants: UPPER_SNAKE_CASE (e.g., `MAX_BATCH_SIZE`, `DEFAULT_LANG`)
  - Private members: prefix with underscore (e.g., `_internal_method`)
- **Docstrings**: Google style for all public functions and classes
- **Type Hints**: Use type hints for function signatures (FastAPI requirement)
- **Imports**: Organized by standard library, third-party, local (separated by blank lines)
- **Encoding**: UTF-8 for all Python files

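As a small illustration of the backend conventions above, the sketch below shows how a module might be laid out. `OcrProcessor`, `MAX_BATCH_SIZE`, and `DEFAULT_LANG` are the example names from the list; their values, the `add_image` method, and the `_normalize_path` helper are hypothetical placeholders rather than project code.

```python
"""Illustrative module layout following the backend style conventions."""

from pathlib import Path

MAX_BATCH_SIZE = 50  # Hypothetical value; the real limit is not stated in this document.
DEFAULT_LANG = "ch"  # Hypothetical default language code.


class OcrProcessor:
    """Collects image paths into a batch for OCR processing.

    Attributes:
        lang: Language code to pass to the OCR engine.
    """

    def __init__(self, lang: str = DEFAULT_LANG) -> None:
        self.lang = lang
        self._queue: list[Path] = []

    def add_image(self, image_path: str) -> int:
        """Adds an image to the pending batch.

        Args:
            image_path: Path to the image file.

        Returns:
            The number of images currently queued.

        Raises:
            ValueError: If the batch is already full.
        """
        if len(self._queue) >= MAX_BATCH_SIZE:
            raise ValueError(f"Batch is limited to {MAX_BATCH_SIZE} images")
        self._queue.append(self._normalize_path(image_path))
        return len(self._queue)

    def _normalize_path(self, image_path: str) -> Path:
        """Expands the user directory; private by naming convention."""
        return Path(image_path).expanduser()
```

The example only needs the standard library, so the import section has a single group; third-party and local imports would follow in their own blank-line-separated groups.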
#### Frontend (JavaScript/TypeScript)
- **Formatter**: Prettier
- **Naming Conventions**:
  - Components: PascalCase (e.g., `ImageUpload`, `ResultsTable`)
  - Functions/Variables: camelCase (e.g., `processImage`, `ocrResults`)
  - Constants: UPPER_SNAKE_CASE (e.g., `MAX_FILE_SIZE`, `API_BASE_URL`)
  - CSS Classes: kebab-case (Tailwind convention)
- **File Structure**: One component per file
- **Imports**: Group by external, internal, types

### Architecture Patterns

#### Backend Architecture
- **Layered Architecture**:
  - Router Layer (FastAPI routes)
  - Service Layer (business logic)
  - Data Access Layer (database/file operations)
  - Model Layer (Pydantic models)
- **Async/Await**: Use async operations for I/O bound tasks
- **Dependency Injection**: FastAPI's dependency injection for services
- **Error Handling**: Custom exception handlers with proper HTTP status codes
- **Logging**: Structured logging with log levels
- **Background Tasks**: FastAPI BackgroundTasks for long-running OCR jobs

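The layering above can be sketched without any framework. The example below uses only the standard library to show how a router-level function delegates to a service, which in turn depends on an injected data access object. All class and function names here are hypothetical; in the real application the model would be a Pydantic model and the handler a FastAPI route.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class OcrResult:
    """Model layer: the shape handed back to the caller."""

    filename: str
    text: str


class ResultRepository:
    """Data access layer: an in-memory stand-in for database/file operations."""

    def __init__(self) -> None:
        self._rows: dict[str, OcrResult] = {}

    async def save(self, result: OcrResult) -> None:
        self._rows[result.filename] = result

    async def get(self, filename: str) -> Optional[OcrResult]:
        return self._rows.get(filename)


class OcrService:
    """Service layer: business logic, with the repository injected."""

    def __init__(self, repository: ResultRepository) -> None:
        self._repository = repository

    async def process(self, filename: str) -> OcrResult:
        await asyncio.sleep(0)  # Placeholder for the real, I/O-bound OCR call.
        result = OcrResult(filename=filename, text=f"<text of {filename}>")
        await self._repository.save(result)
        return result


async def handle_upload(filename: str, service: OcrService) -> dict:
    """Router layer: the work a route handler would delegate to the service."""
    result = await service.process(filename)
    return {"filename": result.filename, "text": result.text}


if __name__ == "__main__":
    service = OcrService(ResultRepository())
    print(asyncio.run(handle_upload("scan.png", service)))
```

Passing the repository into the service constructor keeps the service testable with a fake store, which is the same benefit FastAPI's dependency injection is meant to provide.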
#### Frontend Architecture
- **Component-Based**: Reusable React components
- **Atomic Design**: atoms → molecules → organisms → templates → pages
- **API Layer**: Centralized API client with React Query
- **State Management**: Server state (React Query) + Client state (Zustand)
- **Routing**: React Router for SPA navigation
- **Error Boundaries**: Graceful error handling in UI

#### API Design
- **RESTful**: Follow REST conventions
- **Versioning**: API versioned as `/api/v1/...`
- **Documentation**: Auto-generated via FastAPI (Swagger/OpenAPI)
- **Response Format**: Consistent JSON structure
  ```json
  {
    "success": true,
    "data": {},
    "message": "Success",
    "timestamp": "2025-01-01T00:00:00Z"
  }
  ```

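One way to keep every endpoint on the same envelope is a single helper that builds it. The sketch below is a minimal, standard-library version; the function name `build_response`, its parameters, and the `task_id` payload are assumptions for illustration, not part of the project.

```python
import json
from datetime import datetime, timezone
from typing import Any, Optional


def build_response(
    data: Any = None,
    message: str = "Success",
    success: bool = True,
    now: Optional[datetime] = None,
) -> dict:
    """Wraps a payload in the response envelope described above."""
    moment = datetime.now(timezone.utc) if now is None else now
    return {
        "success": success,
        "data": {} if data is None else data,
        "message": message,
        # ISO 8601 in UTC with a trailing "Z", matching the sample timestamp.
        "timestamp": moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


if __name__ == "__main__":
    fixed = datetime(2025, 1, 1, tzinfo=timezone.utc)
    print(json.dumps(build_response({"task_id": 1}, now=fixed), indent=2))
```

The optional `now` argument exists only so tests can pin the timestamp; callers would normally omit it.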
### Testing Strategy

#### Backend Testing
- **Unit Tests**: Test services, utilities, data models
- **Integration Tests**: Test API endpoints end-to-end
- **Test Framework**: pytest with pytest-asyncio
- **Coverage Target**: Minimum 70% code coverage
- **Test Command**: `pytest tests/ -v --cov=app`

#### Frontend Testing
- **Component Tests**: Test React components with Vitest + React Testing Library
- **Integration Tests**: Test user workflows
- **E2E Tests**: Optional with Playwright
- **Test Command**: `npm run test`

### Git Workflow
- **Branching**: Feature branches from main (e.g., `feature/add-pdf-support`)
- **Commits**: Conventional Commits format (e.g., `feat:`, `fix:`, `docs:`)
- **PRs**: Require passing tests before merge
- **Versioning**: Semantic versioning (MAJOR.MINOR.PATCH)

## Domain Context

### OCR Concepts
- **Recognition Accuracy**: Depends on image quality, language, and font type
- **Preprocessing**: Image enhancement (contrast, denoising) can improve OCR accuracy
- **Multi-Language**: PaddleOCR supports Chinese, English, Japanese, Korean, and many others
- **Bounding Boxes**: OCR engines detect text regions before recognition
- **Confidence Scores**: Each recognized text has a confidence score (0-1)

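To make the bounding box and confidence concepts concrete, here is a minimal sketch of how a recognized region might be represented and filtered. The `TextRegion` type and its field names are illustrative assumptions and do not mirror PaddleOCR's actual result objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextRegion:
    """One recognized text region; the field names are illustrative."""

    text: str
    confidence: float  # Score in the range 0-1.
    box: tuple[tuple[int, int], ...]  # Corner points (x, y) of the bounding box.


def filter_by_confidence(regions: list[TextRegion], threshold: float = 0.8) -> list[TextRegion]:
    """Keeps regions whose confidence is at least `threshold`."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return [region for region in regions if region.confidence >= threshold]


regions = [
    TextRegion("Invoice", 0.98, ((10, 10), (120, 10), (120, 40), (10, 40))),
    TextRegion("T0tal", 0.42, ((10, 60), (90, 60), (90, 90), (10, 90))),
]
kept = filter_by_confidence(regions)
print([region.text for region in kept])  # ['Invoice']
```

The 0.8 default is an arbitrary example; a suitable threshold depends on image quality and language.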
### Document Structure Analysis (PP-StructureV3)
- **Layout Analysis**: Automatic detection of document regions (text, images, tables, charts, formulas)
- **Table Recognition**: Extract table structure and content with support for nested formulas and images
- **Formula Recognition**: Convert mathematical formulas to LaTeX format
- **Chart Recognition** (✅ Enabled with PaddlePaddle 3.2.1+):
  - **Chart Type Detection**: Identify bar charts, line charts, pie charts, scatter plots, etc.
  - **Data Extraction**: Extract numerical data points from chart visualizations
  - **Axis & Legend Parsing**: Recognize axis labels, tick values, and legend information
  - **Structured Output**: Convert chart content to JSON or tabular format
  - **Performance**: GPU acceleration recommended for best results (2-10 seconds per chart)
  - **Accuracy**: >85% for simple charts, >70% for complex multi-axis charts
- **Image Extraction**: Preserve and save embedded images from documents

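The "Structured Output" item mentions converting chart content to JSON or tabular form. The sketch below shows one possible flattening step using only the standard library. The `chart` dictionary is a hypothetical shape invented for this example; the real PP-StructureV3 result format may differ and should be checked against its documentation.

```python
import csv
import io

# Hypothetical chart payload; not the actual PP-StructureV3 output format.
chart = {
    "chart_type": "bar",
    "x_label": "Quarter",
    "y_label": "Revenue",
    "series": [
        {"name": "2024", "points": [("Q1", 1.2), ("Q2", 1.5)]},
        {"name": "2025", "points": [("Q1", 1.4), ("Q2", 1.9)]},
    ],
}


def chart_to_rows(chart: dict) -> list[dict]:
    """Flattens every data point into one row per (series, x, y)."""
    return [
        {"series": series["name"], chart["x_label"]: x, chart["y_label"]: y}
        for series in chart["series"]
        for x, y in series["points"]
    ]


def rows_to_csv(rows: list[dict]) -> str:
    """Serializes rows to CSV text with a header line."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(chart_to_rows(chart)))
```

The same rows could be passed to `pandas.DataFrame` for Excel export, since each row is a plain dictionary.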
### Use Cases
- Digitizing scanned documents and images via web upload
- Extracting text from screenshots for archival
- Processing receipts and invoices for data entry
- Converting image-based PDFs to searchable text
- Batch processing multiple files via drag-and-drop interface

### Output Rules
- Users can define custom rules for organizing extracted text
- Examples: group by file name pattern, filter by confidence threshold, format as structured data
- Export formats: plain text files, JSON with metadata, Excel spreadsheets

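As one example of such a rule, grouping by file name pattern could look like the sketch below. The rule list `GROUP_RULES`, the group labels, and the function name are hypothetical; the document does not specify how rules are stored or evaluated.

```python
from collections import defaultdict
from fnmatch import fnmatch

# Hypothetical rule set: (glob pattern, group label). The first match wins.
GROUP_RULES = [
    ("invoice_*", "invoices"),
    ("receipt_*", "receipts"),
]


def group_by_filename(
    filenames: list[str], default_group: str = "other"
) -> dict[str, list[str]]:
    """Assigns each file to the group of the first pattern it matches."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in filenames:
        group = next(
            (label for pattern, label in GROUP_RULES if fnmatch(name, pattern)),
            default_group,
        )
        groups[group].append(name)
    return dict(groups)


if __name__ == "__main__":
    print(group_by_filename(["invoice_01.png", "receipt_a.jpg", "scan.pdf"]))
```

Shell-style patterns are used here because they are easy for non-technical users to write; regular expressions would be a more expressive alternative.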
## Important Constraints

### Technical Constraints
- **Platform**: Windows 10/11 (development), Docker-based deployment
- **Web Application**: Browser-based interface (Chrome, Firefox, Edge)
- **Local Processing**: All OCR processing happens on backend server (no cloud dependencies)
- **Resource Intensive**: OCR is CPU/GPU intensive; consider task queue for batch processing
- **File Size Limits**: Set max upload size (e.g., 20MB per file, 100MB per batch)
- **Language Models**: PaddleOCR models must be downloaded (~100MB+ per language)
- **Conda Environment**: Backend development must be done within Conda virtual environment
- **Port Range**: Web services must use ports 12010-12019

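The file size limits can be enforced before any OCR work starts. The following sketch checks a batch against the example limits from the list (20MB per file, 100MB per batch); the constant and function names are hypothetical, and the limits themselves are the document's examples rather than confirmed settings.

```python
MAX_FILE_SIZE_BYTES = 20 * 1024 * 1024  # 20MB per file (example limit).
MAX_BATCH_SIZE_BYTES = 100 * 1024 * 1024  # 100MB per batch (example limit).


def validate_batch(file_sizes: dict[str, int]) -> list[str]:
    """Returns a list of problems; an empty list means the batch is acceptable."""
    problems = [
        f"{name}: {size} bytes exceeds the per-file limit of {MAX_FILE_SIZE_BYTES} bytes"
        for name, size in file_sizes.items()
        if size > MAX_FILE_SIZE_BYTES
    ]
    total = sum(file_sizes.values())
    if total > MAX_BATCH_SIZE_BYTES:
        problems.append(f"batch: {total} bytes exceeds the limit of {MAX_BATCH_SIZE_BYTES} bytes")
    return problems
```

Returning all problems at once, instead of raising on the first, lets the UI show every oversized file in a single error message.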
### User Experience Constraints
- **Target Users**: Non-technical users who need simple batch OCR via web
- **Browser Compatibility**: Modern browsers (Chrome 90+, Firefox 88+, Edge 90+)
- **Performance**: UI must show progress feedback during OCR processing
- **Error Messages**: Clear, actionable error messages in Traditional Chinese
- **Responsive Design**: UI should work on desktop and tablet (mobile optional)

### Business Constraints
- **Open Source**: Use only open-source libraries (no paid API dependencies)
- **Deployment**: 1Panel-based deployment (no Docker required)
- **Offline Capable**: Must work without internet after initial setup (except model downloads)
- **Authentication**: JWT-based auth (optional LDAP integration for enterprise)

### Security Constraints
- **File Upload**: Validate file types, scan for malware (optional)
- **Authentication**: JWT tokens with expiration
- **CORS**: Configure CORS for frontend-backend communication
- **Input Validation**: Strict validation on all API inputs

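For the file type validation item, checking the leading bytes of an upload is more reliable than trusting its extension. The sketch below recognizes PNG, JPEG, and PDF by their well-known signatures; the set of accepted types and the function name are assumptions, since the document does not list the allowed formats.

```python
from typing import Optional

# Leading byte signatures for the upload types this sketch accepts.
SIGNATURES = {
    "png": b"\x89PNG\r\n\x1a\n",
    "jpeg": b"\xff\xd8\xff",
    "pdf": b"%PDF-",
}


def detect_file_type(header: bytes) -> Optional[str]:
    """Returns the detected type, or None if the header matches no known signature."""
    for file_type, signature in SIGNATURES.items():
        if header.startswith(signature):
            return file_type
    return None
```

Reading only the first few bytes of each upload is enough for this check, so it can run before the file is written to disk.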
## External Dependencies

### Database Configuration
- **MySQL Host**: mysql.theaken.com
- **MySQL Port**: 33306
- **MySQL User**: A060
- **MySQL Password**: WLeSCi0yhtc7
- **MySQL Database**: db_A060
- **MySQL Charset**: utf8mb4

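Since `.env` is listed among the backend configuration files, connection settings like these would typically be read from environment variables at runtime. The sketch below assembles a SQLAlchemy-style URL from such variables. The variable names (`MYSQL_USER` and so on) and the `pymysql` driver in the URL scheme are assumptions; the document does not say which names or driver the project uses.

```python
import os
from typing import Mapping
from urllib.parse import quote_plus


def build_mysql_url(env: Mapping[str, str]) -> str:
    """Builds a SQLAlchemy-style MySQL URL from environment-like settings."""
    user = quote_plus(env["MYSQL_USER"])
    password = quote_plus(env["MYSQL_PASSWORD"])  # Escapes characters such as "@" or "/".
    host = env["MYSQL_HOST"]
    port = env.get("MYSQL_PORT", "3306")
    database = env["MYSQL_DATABASE"]
    return f"mysql+pymysql://{user}:{password}@{host}:{port}/{database}?charset=utf8mb4"


if __name__ == "__main__":
    # Raises KeyError if a required variable is missing, which fails fast at startup.
    print(build_mysql_url(os.environ))
```

Percent-encoding the user and password matters because an unescaped `@` or `/` in a password would otherwise be parsed as part of the URL structure.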
### SMTP Configuration (Optional)
- **SMTP Server**: mail.panjit.com.tw
- **SMTP Port**: 25
- **SMTP TLS**: false
- **SMTP Auth**: false
- **Sender Email**: tool-ocr-system@panjit.com.tw

### LDAP Configuration (Optional)
- **LDAP Server**: panjit.com.tw
- **LDAP Port**: 389

### Conda Environment
- **Environment Name**: `tool_ocr`
- **Python Version**: 3.10
- **Base Path**: `C:\Users\lin46\.conda\envs\tool_ocr`
- **Activation**: Always activate environment before backend development

### OCR Models
- **PaddleOCR Models**: Downloaded automatically on first run or manually installed
- **Model Storage**: Local cache directory or Docker volume
- **Supported Languages**: Chinese (simplified/traditional), English, Japanese, Korean, etc.
- **Model Size**: ~100-200MB per language pack

### System Requirements
- **Python**: 3.10+ (managed by Conda or venv)
- **Node.js**: 18+ (for frontend development and build)
- **RAM**: Minimum 4GB (8GB recommended for batch processing, 16GB+ for GPU usage)
- **Disk Space**: ~2GB for application + models + dependencies
- **OS**: Windows 10/11 (development), WSL2 Ubuntu 24.04 (development), Linux (1Panel deployment server)
- **GPU** (Optional but recommended):
  - NVIDIA GPU with CUDA 11.8, 12.3, or 12.6+ support
  - GPU Memory: Minimum 4GB (8GB+ recommended for chart recognition)
  - WSL2 GPU: NVIDIA CUDA drivers installed for WSL
  - Performance: 3-10x speedup for OCR and chart recognition
- **Web Server**: Nginx (for static files and reverse proxy)
- **Process Manager**: Supervisor / PM2 / systemd (for backend service)

### Port Configuration
- **Backend API**: 12010 (FastAPI via uvicorn)
- **Frontend Dev Server**: 12011 (Vite, development only)
- **Nginx**: 80/443 (production, managed by 1Panel)
- **MySQL**: 33306 (external)
- **Redis**: 6379 (optional, local)

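The backend and frontend dev server ports both fall inside the 12010-12019 range required under Technical Constraints. A startup check along the following lines could catch a misconfigured port early. The names `WEB_PORT_RANGE`, `SERVICE_PORTS`, and `ports_outside_range` are hypothetical, and the sketch deliberately covers only the two application services, not Nginx, MySQL, or Redis.

```python
WEB_PORT_RANGE = range(12010, 12020)  # 12010-12019 inclusive.

# Application services from the list above; external services are left out.
SERVICE_PORTS = {"backend_api": 12010, "frontend_dev": 12011}


def ports_outside_range(ports: dict[str, int]) -> dict[str, int]:
    """Returns the services whose port is not in the allowed web range."""
    return {name: port for name, port in ports.items() if port not in WEB_PORT_RANGE}


if __name__ == "__main__":
    invalid = ports_outside_range(SERVICE_PORTS)
    if invalid:
        raise SystemExit(f"Ports outside 12010-12019: {invalid}")
    print("All service ports are within the allowed range")
```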
### Deployment Architecture (1Panel)
- **Development**: Windows with Conda + local Node.js
- **Production**: Linux server managed by 1Panel
- **Backend Deployment**:
  - Conda environment on production server
  - uvicorn runs FastAPI on port 12010
  - Managed by Supervisor/PM2/systemd for auto-restart
- **Frontend Deployment**:
  - Build static files with `npm run build`
  - Served by Nginx (configured via 1Panel)
  - Nginx reverse proxies `/api` to backend (12010)
- **1Panel Features**:
  - Website management (Nginx configuration)
  - Process management (backend service)
  - SSL certificate management (Let's Encrypt)
  - File management and deployment

### Configuration Files
- **Backend**:
  - `environment.yml`: Conda environment specification
  - `requirements.txt`: Pip dependencies
  - `.env`: Environment variables (database, JWT secret, etc.)
  - `config.yaml`: Application configuration
  - `start.sh`: Backend startup script
- **Frontend**:
  - `package.json`: npm dependencies
  - `.env.production`: Production environment variables (API URL)
  - `vite.config.js`: Vite configuration
  - `build.sh`: Frontend build script
- **Deployment**:
  - `nginx.conf`: Nginx reverse proxy configuration
  - `supervisor.conf` or `pm2.config.js`: Process manager configuration
  - `deploy.sh`: Deployment automation script