# Project Context

## Purpose

Tool_OCR is a web-based application for batch image-to-text conversion with multi-language support and rule-based output formatting. The tool uses a modern frontend-backend separation architecture, designed to process multiple images/PDFs simultaneously, extract text using OCR, and export results in various formats according to user-defined rules.

**Key Goals:**

- Batch processing of images and PDF files for text extraction via web interface
- Multi-language OCR support (Chinese, English, and other languages)
- Rule-based output formatting and organization
- User-friendly web interface accessible via browser
- Export flexibility (TXT, JSON, Excel, etc.)
- RESTful API for OCR processing

## Tech Stack

### Development Environment

- **OS Platform**: WSL2 Ubuntu 24.04
- **Python Version**: 3.12
- **Environment Manager**: Python venv
- **Virtual Environment Path**: `./venv`
- **Node.js**: 24.x LTS (via nvm)
- **IDE Recommended**: VS Code with Python + React extensions

### Backend Technologies

- **Language**: Python 3.10+
- **Web Framework**: FastAPI (modern, async, auto API docs)
- **OCR Engine**: PaddleOCR (deep learning-based, excellent multi-language support)
- **PDF Processing**: PyPDF2 / pdf2image
- **Image Processing**: Pillow (PIL)
- **Data Export**: pandas (Excel), json (JSON)
- **Database**: MySQL (configuration storage, task history)
- **Cache**: Redis (optional, for task queue)
- **Authentication**: JWT

### Frontend Technologies

- **Framework**: React 18+
- **Build Tool**: Vite
- **UI Library**: Tailwind CSS + shadcn/ui
- **State Management**: React Query (for API calls) + Zustand (for global state)
- **HTTP Client**: Axios
- **File Upload**: react-dropzone

### Development Tools

- **Package Manager**: Conda + pip (backend), npm/pnpm (frontend)
- **Deployment**: 1Panel (web-based server management)
- **Process Manager**: systemd / PM2 / Supervisor
- **Web Server**: Nginx (reverse proxy)
- **Testing**: pytest (backend), Vitest (frontend)
- **Code Style**: Black + pylint (Python), ESLint + Prettier (JavaScript/TypeScript)
- **Version Control**: Git

### Key Libraries (Backend)

- fastapi: Web framework
- uvicorn: ASGI server
- paddleocr: OCR processing
- pdf2image: PDF to image conversion
- pillow: Image manipulation
- pandas: Data export to Excel
- pyyaml: Configuration management
- python-jose: JWT authentication
- sqlalchemy: Database ORM
- pydantic: Data validation

### Key Libraries (Frontend)

- react: UI framework
- vite: Build tool
- tailwindcss: CSS framework
- shadcn/ui: UI components
- axios: HTTP client
- react-query: Server state management
- zustand: Client state management
- react-dropzone: File upload

## Project Conventions

### Environment Setup (Backend)

```bash
# Run automated setup script (recommended)
./setup_dev_env.sh

# Or manually:
# Create Python virtual environment
python3 -m venv venv

# Activate environment
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```

### Environment Setup (Frontend)

```bash
# Navigate to frontend directory
cd frontend

# Install dependencies
npm install

# Run dev server
npm run dev
```

### Code Style

#### Backend (Python)

- **Formatter**: Black with line length 100
- **Naming Conventions**:
  - Classes: PascalCase (e.g., `OcrProcessor`, `ImageService`)
  - Functions/Methods: snake_case (e.g., `process_image`, `export_results`)
  - Constants: UPPER_SNAKE_CASE (e.g., `MAX_BATCH_SIZE`, `DEFAULT_LANG`)
  - Private members: prefix with underscore (e.g., `_internal_method`)
- **Docstrings**: Google style for all public functions and classes
- **Type Hints**: Use type hints for function signatures (FastAPI requirement)
- **Imports**: Organized by standard library, third-party, local (separated by blank lines)
- **Encoding**: UTF-8 for all Python files

#### Frontend (JavaScript/TypeScript)

- **Formatter**: Prettier
- **Naming Conventions**:
  - Components: PascalCase (e.g., `ImageUpload`, `ResultsTable`)
  - Functions/Variables: camelCase (e.g., `processImage`, `ocrResults`)
  - Constants: UPPER_SNAKE_CASE (e.g., `MAX_FILE_SIZE`, `API_BASE_URL`)
  - CSS Classes: kebab-case (Tailwind convention)
- **File Structure**: One component per file
- **Imports**: Group by external, internal, types

### Architecture Patterns

#### Backend Architecture

- **Layered Architecture**:
  - Router Layer (FastAPI routes)
  - Service Layer (business logic)
  - Data Access Layer (database/file operations)
  - Model Layer (Pydantic models)
- **Async/Await**: Use async operations for I/O bound tasks
- **Dependency Injection**: FastAPI's dependency injection for services
- **Error Handling**: Custom exception handlers with proper HTTP status codes
- **Logging**: Structured logging with log levels
- **Background Tasks**: FastAPI BackgroundTasks for long-running OCR jobs

#### Frontend Architecture

- **Component-Based**: Reusable React components
- **Atomic Design**: atoms → molecules → organisms → templates → pages
- **API Layer**: Centralized API client with React Query
- **State Management**: Server state (React Query) + Client state (Zustand)
- **Routing**: React Router for SPA navigation
- **Error Boundaries**: Graceful error handling in UI

#### API Design

- **RESTful**: Follow REST conventions
- **Versioning**: API versioned as `/api/v1/...`
- **Documentation**: Auto-generated via FastAPI (Swagger/OpenAPI)
- **Response Format**: Consistent JSON structure

  ```json
  {
    "success": true,
    "data": {},
    "message": "Success",
    "timestamp": "2025-01-01T00:00:00Z"
  }
  ```

### Testing Strategy

#### Backend Testing

- **Unit Tests**: Test services, utilities, data models
- **Integration Tests**: Test API endpoints end-to-end
- **Test Framework**: pytest with pytest-asyncio
- **Coverage Target**: Minimum 70% code coverage
- **Test Command**: `pytest tests/ -v --cov=app`

#### Frontend Testing

- **Component Tests**: Test React components with Vitest + React Testing Library
- **Integration Tests**: Test user workflows
- **E2E Tests**: Optional with Playwright
- **Test Command**: `npm run test`

### Git Workflow

- **Branching**: Feature branches from main (e.g., `feature/add-pdf-support`)
- **Commits**: Conventional Commits format (e.g., `feat:`, `fix:`, `docs:`)
- **PRs**: Require passing tests before merge
- **Versioning**: Semantic versioning (MAJOR.MINOR.PATCH)

## Domain Context

### OCR Concepts

- **Recognition Accuracy**: Depends on image quality, language, and font type
- **Preprocessing**: Image enhancement (contrast, denoising) can improve OCR accuracy
- **Multi-Language**: PaddleOCR supports Chinese, English, Japanese, Korean, and many others
- **Bounding Boxes**: OCR engines detect text regions before recognition
- **Confidence Scores**: Each recognized text has a confidence score (0-1)

### Use Cases

- Digitizing scanned documents and images via web upload
- Extracting text from screenshots for archival
- Processing receipts and invoices for data entry
- Converting image-based PDFs to searchable text
- Batch processing multiple files via drag-and-drop interface

### Output Rules

- Users can define custom rules for organizing extracted text
- Examples: group by file name pattern, filter by confidence threshold, format as structured data
- Export formats: plain text files, JSON with metadata, Excel spreadsheets

## Important Constraints

### Technical Constraints

- **Platform**: Windows 10/11 (development), Docker-based deployment
- **Web Application**: Browser-based interface (Chrome, Firefox, Edge)
- **Local Processing**: All OCR processing happens on backend server (no cloud dependencies)
- **Resource Intensive**: OCR is CPU/GPU intensive; consider task queue for batch processing
- **File Size Limits**: Set max upload size (e.g., 20MB per file, 100MB per batch)
- **Language Models**: PaddleOCR models must be downloaded (~100MB+ per language)
- **Conda Environment**: Backend development must be done within Conda virtual environment
- **Port Range**: Web services must use ports 12010-12019

### User Experience Constraints

- **Target Users**: Non-technical users who need simple batch OCR via web
- **Browser Compatibility**: Modern browsers (Chrome 90+, Firefox 88+, Edge 90+)
- **Performance**: UI must show progress feedback during OCR processing
- **Error Messages**: Clear, actionable error messages in Traditional Chinese
- **Responsive Design**: UI should work on desktop and tablet (mobile optional)

### Business Constraints

- **Open Source**: Use only open-source libraries (no paid API dependencies)
- **Deployment**: 1Panel-based deployment (no Docker required)
- **Offline Capable**: Must work without internet after initial setup (except model downloads)
- **Authentication**: JWT-based auth (optional LDAP integration for enterprise)

### Security Constraints

- **File Upload**: Validate file types, scan for malware (optional)
- **Authentication**: JWT tokens with expiration
- **CORS**: Configure CORS for frontend-backend communication
- **Input Validation**: Strict validation on all API inputs

## External Dependencies

### Database Configuration

- **MySQL Host**: mysql.theaken.com
- **MySQL Port**: 33306
- **MySQL User**: A060
- **MySQL Password**: WLeSCi0yhtc7
- **MySQL Database**: db_A060
- **MySQL Charset**: utf8mb4

### SMTP Configuration (Optional)

- **SMTP Server**: mail.panjit.com.tw
- **SMTP Port**: 25
- **SMTP TLS**: false
- **SMTP Auth**: false
- **Sender Email**: tool-ocr-system@panjit.com.tw

### LDAP Configuration (Optional)

- **LDAP Server**: panjit.com.tw
- **LDAP Port**: 389

### Conda Environment

- **Environment Name**: `tool_ocr`
- **Python Version**: 3.10
- **Base Path**: `C:\Users\lin46\.conda\envs\tool_ocr`
- **Activation**: Always activate environment before backend development

### OCR Models

- **PaddleOCR Models**: Downloaded automatically on first run or manually installed
- **Model Storage**: Local cache directory or Docker volume
- **Supported Languages**: Chinese (simplified/traditional), English, Japanese, Korean, etc.
- **Model Size**: ~100-200MB per language pack

### System Requirements

- **Python**: 3.10+ (managed by Conda)
- **Node.js**: 18+ (for frontend development and build)
- **RAM**: Minimum 4GB (8GB recommended for batch processing)
- **Disk Space**: ~2GB for application + models + dependencies
- **OS**: Windows 10/11 (development), Linux (1Panel deployment server)
- **Web Server**: Nginx (for static files and reverse proxy)
- **Process Manager**: Supervisor / PM2 / systemd (for backend service)

### Port Configuration

- **Backend API**: 12010 (FastAPI via uvicorn)
- **Frontend Dev Server**: 12011 (Vite, development only)
- **Nginx**: 80/443 (production, managed by 1Panel)
- **MySQL**: 33306 (external)
- **Redis**: 6379 (optional, local)

### Deployment Architecture (1Panel)

- **Development**: Windows with Conda + local Node.js
- **Production**: Linux server managed by 1Panel
- **Backend Deployment**:
  - Conda environment on production server
  - uvicorn runs FastAPI on port 12010
  - Managed by Supervisor/PM2/systemd for auto-restart
- **Frontend Deployment**:
  - Build static files with `npm run build`
  - Served by Nginx (configured via 1Panel)
  - Nginx reverse proxies `/api` to backend (12010)
- **1Panel Features**:
  - Website management (Nginx configuration)
  - Process management (backend service)
  - SSL certificate management (Let's Encrypt)
  - File management and deployment

### Configuration Files

- **Backend**:
  - `environment.yml`: Conda environment specification
  - `requirements.txt`: Pip dependencies
  - `.env`: Environment variables (database, JWT secret, etc.)
  - `config.yaml`: Application configuration
  - `start.sh`: Backend startup script
- **Frontend**:
  - `package.json`: npm dependencies
  - `.env.production`: Production environment variables (API URL)
  - `vite.config.js`: Vite configuration
  - `build.sh`: Frontend build script
- **Deployment**:
  - `nginx.conf`: Nginx reverse proxy configuration
  - `supervisor.conf` or `pm2.config.js`: Process manager configuration
  - `deploy.sh`: Deployment automation script
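
The configuration files listed above imply that connection details and ports are read from `.env` at startup instead of being hard-coded. The following is a minimal sketch of how the backend might load and validate such values using only the Python standard library. The variable names (`TOOL_OCR_API_PORT`, `MYSQL_HOST`, `MYSQL_PORT`, `MYSQL_CHARSET`), the `Settings` class, and the `load_settings` helper are assumptions made for illustration and are not taken from the project. The port check mirrors the 12010-12019 range stated under Technical Constraints.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Allowed range taken from the "Port Range" constraint in this document.
ALLOWED_PORTS = range(12010, 12020)


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container; field names are illustrative."""

    api_port: int
    mysql_host: str
    mysql_port: int
    mysql_charset: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables (variable names are assumed)."""
    env = os.environ if env is None else env

    api_port = int(env.get("TOOL_OCR_API_PORT", "12010"))
    if api_port not in ALLOWED_PORTS:
        raise ValueError(
            f"API port {api_port} is outside the allowed range 12010-12019"
        )

    return Settings(
        api_port=api_port,
        mysql_host=env["MYSQL_HOST"],  # required; raises KeyError if missing
        mysql_port=int(env.get("MYSQL_PORT", "3306")),
        mysql_charset=env.get("MYSQL_CHARSET", "utf8mb4"),
    )


if __name__ == "__main__":
    # Demo values only; a real deployment would read them from `.env`.
    demo_env = {"MYSQL_HOST": "localhost", "MYSQL_PORT": "33306"}
    print(load_settings(demo_env))
```

Passing the environment as a mapping keeps the function easy to test with pytest, since a test can supply a plain dictionary instead of mutating `os.environ`.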