
# Tool_OCR

**OCR Batch Processing System with Structure Extraction**

A web-based tool for batch-extracting text, images, and document structure from multiple files using PaddleOCR-VL.

## Features

- 🔍 **Multi-Language OCR**: Support for 109 languages (Chinese, English, Japanese, Korean, etc.)
- 📄 **Document Structure Analysis**: Intelligent layout analysis with PP-StructureV3
- 🖼️ **Image Extraction**: Preserve document images alongside text content
- 📑 **Batch Processing**: Process multiple files concurrently with progress tracking
- 📤 **Multiple Export Formats**: TXT, JSON, Excel, Markdown with images, searchable PDF
- 🔧 **Flexible Configuration**: Rule-based output formatting
- 🌐 **Translation Ready**: Translation endpoint reserved for a future release (currently returns 501)

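As a rough illustration of what "concurrent processing with progress tracking" can look like, the sketch below sends each file to a thread pool and reports progress as each one finishes. `process_batch`, `ocr_fn`, and the `on_progress` callback are hypothetical names rather than this project's API, and a stub stands in for the real PaddleOCR call.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict, Iterable


def process_batch(
    paths: Iterable[str],
    ocr_fn: Callable[[str], Any],
    max_workers: int = 4,
    on_progress: Callable[[int, int, str], None] = lambda done, total, path: None,
) -> Dict[str, Any]:
    """Run ocr_fn on every path concurrently and return {path: result}.

    Hypothetical helper: a failing file is recorded as an "ERROR: ..." string
    so one bad input does not abort the whole batch.
    """
    paths = list(paths)
    results: Dict[str, Any] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(ocr_fn, p): p for p in paths}
        # as_completed yields futures in completion order, not submission order
        for done, future in enumerate(as_completed(futures), start=1):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # keep going on per-file errors
                results[path] = f"ERROR: {exc}"
            on_progress(done, len(paths), path)
    return results


if __name__ == "__main__":
    def fake_ocr(path: str) -> str:  # stand-in for the real OCR call
        return f"text from {path}"

    out = process_batch(
        ["a.png", "b.pdf"],
        fake_ocr,
        on_progress=lambda d, t, p: print(f"{d}/{t} finished: {p}"),
    )
    print(out["a.png"])
```

Threads suit this shape of work when the heavy lifting happens in native code or I/O; a process pool would be the usual swap if the per-file work were pure Python.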
## Tech Stack

### Backend
- **Framework**: FastAPI 0.115.0
- **OCR Engine**: PaddleOCR 3.0+ with PaddleOCR-VL
- **Database**: MySQL via SQLAlchemy
- **PDF Generation**: Pandoc + WeasyPrint
- **Image Processing**: OpenCV, Pillow, pdf2image

### Frontend
- **Framework**: React 18 with Vite
- **Styling**: TailwindCSS + shadcn/ui
- **HTTP Client**: Axios with React Query

## Prerequisites

- **macOS**: Apple Silicon (M1/M2/M3) or Intel
- **Python**: 3.10+
- **Conda**: Miniconda or Anaconda (installed automatically by `setup_conda.sh` if missing)
- **Homebrew**: For system dependencies such as Pandoc
- **MySQL**: External database server (connection details are pre-configured in `.env.example`)

## Installation

### 1. Automated Setup (Recommended)

```bash
# Change into your local copy of the repository
cd /path/to/Tool_OCR

# Run automated setup script
chmod +x setup_conda.sh
./setup_conda.sh

# If Conda was just installed, reload your shell
source ~/.zshrc  # or source ~/.bash_profile

# Run the script again to create the environment
./setup_conda.sh
```

### 2. Install Dependencies

```bash
# Activate Conda environment
conda activate tool_ocr

# Install Python dependencies
pip install -r requirements.txt

# Install system dependencies (Pandoc for PDF generation)
brew install pandoc

# Optional: install Noto Sans CJK fonts for PDF generation
# (macOS built-in CJK fonts also work)
brew install --cask font-noto-sans-cjk
```

### 3. Download PaddleOCR Models

```bash
# Create models directory
mkdir -p models/paddleocr

# Models will be automatically downloaded on first run
# (~900MB total, includes PaddleOCR-VL 0.9B model)
```

### 4. Configure Environment

```bash
# Copy environment template
cp .env.example .env

# Edit .env with your settings
# Database credentials are pre-configured
nano .env
```

### 5. Initialize Database

No manual step is needed: the database schema is created automatically on the first run, using `mysql.theaken.com:33306/db_A060`.
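Because the backend talks to MySQL through SQLAlchemy, the connection is usually expressed as a SQLAlchemy URL. The sketch below builds one from environment variables. The variable names (`DB_USER`, `DB_PASSWORD`, `DB_HOST`, `DB_PORT`, `DB_NAME`) and the `pymysql` driver are assumptions, so check `.env.example` for the real keys. The password is percent-encoded because characters such as `@` or `/` would otherwise break the URL.

```python
import os
from typing import Mapping, Optional
from urllib.parse import quote_plus


def build_mysql_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Assemble a SQLAlchemy-style MySQL URL from (hypothetical) env vars."""
    env = os.environ if env is None else env
    user = quote_plus(env["DB_USER"])
    password = quote_plus(env["DB_PASSWORD"])  # escapes @, /, : and friends
    # Defaults mirror the server named in this README; override via env.
    host = env.get("DB_HOST", "mysql.theaken.com")
    port = int(env.get("DB_PORT", "33306"))
    name = env.get("DB_NAME", "db_A060")
    return f"mysql+pymysql://{user}:{password}@{host}:{port}/{name}"


if __name__ == "__main__":
    print(build_mysql_url({"DB_USER": "ocr", "DB_PASSWORD": "p@ss/word"}))
```

With SQLAlchemy, such a URL would be passed to `create_engine`, and `Base.metadata.create_all(engine)` creates any tables that do not exist yet. That is one common way to get "schema created on first run", though this README does not confirm it is how the backend does it.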

## Usage

### Start Backend Server

```bash
# Activate environment
conda activate tool_ocr

# Start FastAPI server
cd backend
python -m app.main

# Server runs at: http://localhost:12010
# API docs: http://localhost:12010/docs
```
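When scripting against the backend (for example in CI), it helps to wait until the server answers before sending requests. Below is a minimal standard-library sketch that assumes the default port above. It uses FastAPI's `/docs` page as a liveness probe, since this README lists no dedicated health endpoint. `wait_for_server` is a hypothetical helper name.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url: str = "http://localhost:12010/docs",
                    timeout: float = 30.0,
                    interval: float = 0.5,
                    request_timeout: float = 2.0) -> bool:
    """Poll url until the server responds or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=request_timeout):
                return True  # any 2xx/3xx response means the server is up
        except urllib.error.HTTPError as exc:
            if exc.code < 500:
                return True  # a 4xx still proves the server is listening
            # 5xx: server reachable but not healthy yet, keep polling
        except OSError:
            pass  # connection refused or timed out: not up yet
        time.sleep(interval)
    return False


if __name__ == "__main__":
    print("backend ready" if wait_for_server() else "backend not reachable")
```

`HTTPError` is caught before `OSError` on purpose: it is a subclass of `URLError`, which is itself an `OSError`, so the reverse order would treat "server answered with 404" as "server down".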

### Start Frontend (Coming Soon)

```bash
# Install frontend dependencies
cd frontend
npm install

# Start development server
npm run dev

# Frontend runs at: http://localhost:12011
```

## Project Structure

```
Tool_OCR/
├── backend/
│   ├── app/
│   │   ├── api/v1/          # API endpoints
│   │   ├── core/            # Configuration, database
│   │   ├── models/          # Database models
│   │   ├── services/        # Business logic
│   │   ├── utils/           # Utilities
│   │   └── main.py          # Application entry point
│   └── tests/               # Test suite
├── frontend/
│   └── src/                 # React application
├── uploads/
│   ├── temp/                # Temporary uploads
│   ├── processed/           # Processed files
│   └── images/              # Extracted images
├── storage/
│   ├── markdown/            # Markdown outputs
│   ├── json/                # JSON results
│   └── exports/             # Export files
├── models/
│   └── paddleocr/           # PaddleOCR models
├── config/                  # Configuration files
├── templates/               # PDF templates
├── logs/                    # Application logs
├── requirements.txt         # Python dependencies
├── setup_conda.sh           # Environment setup script
├── .env.example             # Environment template
└── README.md
```

## API Endpoints (Planned)

- `POST /api/v1/ocr/upload` - Upload files for OCR processing
- `GET /api/v1/ocr/tasks` - List all OCR tasks
- `GET /api/v1/ocr/tasks/{task_id}` - Get task details
- `POST /api/v1/ocr/batch` - Create batch processing task
- `GET /api/v1/export/{task_id}` - Export results (TXT/JSON/Excel/MD/PDF)
- `POST /api/v1/translate/document` - Translate document (reserved, returns 501)
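The endpoints are still planned, so their response schemas are not fixed. The sketch below shows one way a client could poll `GET /api/v1/ocr/tasks/{task_id}` until a task finishes. The `status` field and its `completed`/`failed` values are assumptions, and `wait_for_task` and `http_get_json` are hypothetical names. The HTTP fetch is injectable, so the polling logic can be exercised without a running server.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict

BASE_URL = "http://localhost:12010/api/v1"
TERMINAL = {"completed", "failed"}  # hypothetical terminal status values


def http_get_json(url: str) -> Dict[str, Any]:
    """Default fetcher: GET url and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def wait_for_task(task_id: str,
                  fetch: Callable[[str], Dict[str, Any]] = http_get_json,
                  interval: float = 2.0,
                  max_polls: int = 300) -> Dict[str, Any]:
    """Poll the task-details endpoint until the task reaches a terminal status."""
    url = f"{BASE_URL}/ocr/tasks/{task_id}"
    for _ in range(max_polls):
        task = fetch(url)
        if task.get("status") in TERMINAL:
            return task
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")
```

Bounding the loop by `max_polls` rather than polling forever keeps a stuck task from hanging a script indefinitely.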

## Development

### Run Tests

```bash
cd backend
pytest tests/ -v --cov=app
```

### Code Quality

```bash
cd backend

# Format code
black app/

# Lint code
pylint app/
```

## OpenSpec Workflow

This project follows OpenSpec for specification-driven development:

```bash
# View current changes
openspec list

# Validate specifications
openspec validate add-ocr-batch-processing

# View implementation tasks
cat openspec/changes/add-ocr-batch-processing/tasks.md
```

## Roadmap

- [x] **Phase 0**: Environment setup and configuration
- [ ] **Phase 1**: Core OCR with structure extraction
- [ ] **Phase 2**: Frontend development
- [ ] **Phase 3**: Testing & optimization
- [ ] **Phase 4**: Deployment
- [ ] **Phase 5**: Translation feature (future)

## License

[To be determined]

## Development Environment

- Platform: macOS (Apple Silicon)
- Database: external MySQL server
- OCR engine: PaddleOCR-VL 0.9B with PP-StructureV3

## Support

For issues and questions, refer to:
- OpenSpec documentation: `openspec/AGENTS.md`
- Task breakdown: `openspec/changes/add-ocr-batch-processing/tasks.md`
- Specifications: `openspec/changes/add-ocr-batch-processing/specs/`