# Tool_OCR

**OCR Batch Processing System with Structure Extraction**

A web-based tool that uses PaddleOCR-VL to extract text, images, and document structure from many files in a single batch.

## Features

- 🔍 **Multi-Language OCR**: Supports 109 languages (Chinese, English, Japanese, Korean, etc.)
- 📄 **Document Structure Analysis**: Layout analysis with PP-StructureV3
- 🖼️ **Image Extraction**: Preserves document images alongside the extracted text
- 📑 **Batch Processing**: Processes multiple files concurrently with progress tracking
- 📤 **Multiple Export Formats**: TXT, JSON, Excel, Markdown with images, and searchable PDF
- 🔧 **Flexible Configuration**: Rule-based output formatting
- 🌐 **Translation Ready**: Architecture reserved for a future translation feature

## Tech Stack

### Backend

- **Framework**: FastAPI 0.115.0
- **OCR Engine**: PaddleOCR 3.0+ with PaddleOCR-VL
- **Database**: MySQL via SQLAlchemy
- **PDF Generation**: Pandoc + WeasyPrint
- **Image Processing**: OpenCV, Pillow, pdf2image

### Frontend

- **Framework**: React 18 with Vite
- **Styling**: TailwindCSS + shadcn/ui
- **HTTP Client**: Axios with React Query

## Prerequisites

- **macOS**: Apple Silicon (M1/M2/M3) or Intel
- **Python**: 3.10+
- **Conda**: Miniconda or Anaconda (installed automatically by `setup_conda.sh` if missing)
- **Homebrew**: Used to install system dependencies such as Pandoc
- **MySQL**: External database server (provided)

## Installation

### 1. Automated Setup (Recommended)

```bash
# Move into the project directory (adjust the path to your checkout)
cd /Users/egg/Projects/Tool_OCR

# Run the automated setup script
chmod +x setup_conda.sh
./setup_conda.sh

# If Conda was just installed, reload your shell
source ~/.zshrc  # or: source ~/.bash_profile

# Run the script again to create the environment
./setup_conda.sh
```

### 2. Install Dependencies

```bash
# Activate the Conda environment
conda activate tool_ocr

# Install Python dependencies
pip install -r requirements.txt

# Install system dependencies (Pandoc is used for PDF generation)
brew install pandoc

# Optional: install CJK fonts for PDF generation
# (macOS built-in fonts also work)
brew install --cask font-noto-sans-cjk
```

### 3. Download PaddleOCR Models

```bash
# Create the models directory
mkdir -p models/paddleocr

# Models are downloaded automatically on first run
# (~900MB total, including the PaddleOCR-VL 0.9B model)
```

### 4. Configure Environment

```bash
# Copy the environment template
cp .env.example .env

# Edit .env with your settings
# (database credentials are pre-configured)
nano .env
```

### 5. Initialize Database

```bash
# The database schema is created automatically on first run
# Using: mysql.theaken.com:33306/db_A060
```

## Usage

### Start Backend Server

```bash
# Activate the environment
conda activate tool_ocr

# Start the FastAPI server
cd backend
python -m app.main

# Server runs at: http://localhost:12010
# API docs: http://localhost:12010/docs
```

### Start Frontend (Coming Soon)

```bash
# Install frontend dependencies
cd frontend
npm install

# Start the development server
npm run dev

# Frontend runs at: http://localhost:12011
```

## Project Structure

```
Tool_OCR/
├── backend/
│   ├── app/
│   │   ├── api/v1/        # API endpoints
│   │   ├── core/          # Configuration, database
│   │   ├── models/        # Database models
│   │   ├── services/      # Business logic
│   │   ├── utils/         # Utilities
│   │   └── main.py        # Application entry point
│   └── tests/             # Test suite
├── frontend/
│   └── src/               # React application
├── uploads/
│   ├── temp/              # Temporary uploads
│   ├── processed/         # Processed files
│   └── images/            # Extracted images
├── storage/
│   ├── markdown/          # Markdown outputs
│   ├── json/              # JSON results
│   └── exports/           # Export files
├── models/
│   └── paddleocr/         # PaddleOCR models
├── config/                # Configuration files
├── templates/             # PDF templates
├── logs/                  # Application logs
├── requirements.txt       # Python dependencies
├── setup_conda.sh         # Environment setup script
├── .env.example           # Environment template
└── README.md
```

## API Endpoints (Planned)

- `POST /api/v1/ocr/upload` - Upload files for OCR processing
- `GET /api/v1/ocr/tasks` - List all OCR tasks
- `GET /api/v1/ocr/tasks/{task_id}` - Get task details
- `POST /api/v1/ocr/batch` - Create a batch processing task
- `GET /api/v1/export/{task_id}` - Export results (TXT/JSON/Excel/MD/PDF)
- `POST /api/v1/translate/document` - Translate a document (reserved, returns 501)

## Development

### Run Tests

```bash
cd backend
pytest tests/ -v --cov=app
```

### Code Quality

```bash
# Run from the backend/ directory, where app/ lives
cd backend

# Format code
black app/

# Lint code
pylint app/
```

## OpenSpec Workflow

This project follows OpenSpec for specification-driven development:

```bash
# View current changes
openspec list

# Validate specifications
openspec validate add-ocr-batch-processing

# View implementation tasks
cat openspec/changes/add-ocr-batch-processing/tasks.md
```

## Roadmap

- [x] **Phase 0**: Environment setup and configuration
- [ ] **Phase 1**: Core OCR with structure extraction
- [ ] **Phase 2**: Frontend development
- [ ] **Phase 3**: Testing & optimization
- [ ] **Phase 4**: Deployment
- [ ] **Phase 5**: Translation feature (future)

## License

[To be determined]

## Contributors

- Development environment: macOS Apple Silicon
- Database: MySQL external server
- OCR Engine: PaddleOCR-VL 0.9B with PP-StructureV3

## Support

For issues and questions, refer to:

- OpenSpec documentation: `openspec/AGENTS.md`
- Task breakdown: `openspec/changes/add-ocr-batch-processing/tasks.md`
- Specifications: `openspec/changes/add-ocr-batch-processing/specs/`
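
Before opening an issue, it can help to confirm that the tools listed under Prerequisites are actually installed. The script below is a minimal sketch and is not part of the repository: `check_tool` is a hypothetical helper name, and the script only checks that `python3`, `conda`, `brew`, and `pandoc` are on `PATH` and that Python is 3.10 or newer. It does not verify the MySQL connection or the PaddleOCR models.

```bash
#!/usr/bin/env bash
# Hypothetical environment check (not shipped with Tool_OCR).

# Print "ok: <tool>" if the command is on PATH, otherwise "missing: <tool>".
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
  fi
}

# Tools this README relies on for setup and PDF generation
for tool in python3 conda brew pandoc; do
  check_tool "$tool"
done

# Python 3.10+ is required; the exit code of the one-liner encodes the result
if command -v python3 >/dev/null 2>&1 &&
  python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 10) else 1)'; then
  echo "ok: python3 >= 3.10"
else
  echo "check: python3 >= 3.10 not found"
fi
```

Run it inside the activated `tool_ocr` environment so the Conda-provided Python is the one being checked.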