Tool_OCR Setup Guide
Complete setup instructions for a macOS environment.
Prerequisites Check
Before starting, verify you have:
- ✅ macOS (Apple Silicon or Intel)
- ✅ Terminal access (zsh or bash)
- ✅ Internet connection for downloads
Step-by-Step Setup
Step 1: Install Conda Environment
Run the automated setup script:
chmod +x setup_conda.sh
./setup_conda.sh
Expected output:
- If Conda not installed: Downloads and installs Miniconda for Apple Silicon
- If Conda already installed: Creates the tool_ocr environment with Python 3.10
If Conda was just installed:
# Reload your shell to activate Conda
source ~/.zshrc # if using zsh (default on macOS)
source ~/.bashrc # if using bash
# Run setup script again to create environment
./setup_conda.sh
Step 2: Activate Environment
conda activate tool_ocr
You should see (tool_ocr) prefix in your terminal prompt.
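If a custom shell theme hides the prompt prefix, the active environment can also be checked from Python. The sketch below assumes that Conda exports the CONDA_DEFAULT_ENV variable on activation and that the environment uses Python 3.10 as described in Step 1. The function name check_environment is illustrative and not part of the project.

```python
import os
import sys


def check_environment(env=None, version=None,
                      expected_env="tool_ocr", expected_python=(3, 10)):
    """Return a list of problems with the active environment (empty if OK)."""
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else tuple(version[:2])
    problems = []

    # Conda sets CONDA_DEFAULT_ENV to the name of the activated environment.
    active = env.get("CONDA_DEFAULT_ENV")
    if active != expected_env:
        problems.append(
            f"active Conda environment is {active!r}, expected {expected_env!r}"
        )
    if version != tuple(expected_python):
        problems.append(
            f"Python is {version[0]}.{version[1]}, "
            f"expected {expected_python[0]}.{expected_python[1]}"
        )
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("Environment OK" if not issues else "\n".join(issues))
```

An empty list means the interpreter you are running belongs to the tool_ocr environment.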
Step 3: Install Python Dependencies
pip install -r requirements.txt
This will install:
- FastAPI and Uvicorn (web framework)
- PaddleOCR and PaddlePaddle (OCR engine)
- Image processing libraries (Pillow, OpenCV, pdf2image)
- PDF generation tools (WeasyPrint, Markdown)
- Database tools (SQLAlchemy, PyMySQL, Alembic)
- Authentication libraries (python-jose, passlib)
- Testing tools (pytest, pytest-asyncio)
Installation time: ~5-10 minutes depending on your internet speed
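After the install finishes, a quick way to confirm nothing was skipped is to look up each package without importing it. This is a minimal sketch: the module list is inferred from the package names above (it assumes requirements.txt matches that list), and missing_modules is a hypothetical helper.

```python
import importlib.util

# Import names for the packages listed above. Some differ from the pip name:
# Pillow -> PIL, OpenCV -> cv2, PaddlePaddle -> paddle, python-jose -> jose.
EXPECTED_MODULES = [
    "fastapi", "uvicorn", "paddleocr", "paddle", "PIL", "cv2", "pdf2image",
    "weasyprint", "markdown", "sqlalchemy", "pymysql", "alembic",
    "jose", "passlib", "pytest", "pytest_asyncio",
]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    # find_spec locates a top-level module without executing it, so heavy
    # packages such as paddle are not actually loaded here.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(EXPECTED_MODULES)
    print("All dependencies found" if not missing else f"Missing: {', '.join(missing)}")
```

Because the lookup does not import anything, it will not catch packages that are installed but fail at import time (for example WeasyPrint without its system libraries, covered in Step 4).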
Step 4: Install System Dependencies
# Install libmagic (required for python-magic file type detection)
brew install libmagic
# Install WeasyPrint dependencies (required for PDF generation)
brew install pango gdk-pixbuf libffi
# Install Pandoc (optional - for enhanced PDF generation)
brew install pandoc
# Install Chinese fonts for PDF output (optional - macOS has built-in Chinese fonts)
brew install --cask font-noto-sans-cjk
# Note: If above fails, skip it - macOS built-in fonts (PingFang SC, Heiti TC) work fine
If Homebrew not installed:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Step 5: Configure Environment Variables
# Copy template
cp .env.example .env
# Edit with your preferred editor
nano .env
# or
code .env
Important settings to verify in .env:
# Database (pre-configured, should work as-is)
MYSQL_HOST=mysql.theaken.com
MYSQL_PORT=33306
MYSQL_USER=A060
MYSQL_PASSWORD=WLeSCi0yhtc7
MYSQL_DATABASE=db_A060
# Application ports
BACKEND_PORT=12010
FRONTEND_PORT=12011
# Security (CHANGE THIS!)
SECRET_KEY=your-secret-key-here-please-change-this-to-random-string
Generate a secure SECRET_KEY:
python -c "import secrets; print(secrets.token_urlsafe(32))"
Copy the output and paste it as your SECRET_KEY value.
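To catch a forgotten placeholder before starting the server, the .env file can be checked with a few lines of Python. The sketch below is illustrative only: parse_env and secret_key_problem are hypothetical helpers, the parser handles only simple KEY=VALUE lines, and the 32-character minimum is an assumed threshold rather than a project requirement.

```python
PLACEHOLDER = "your-secret-key-here-please-change-this-to-random-string"


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def secret_key_problem(values, min_length=32):
    """Return a message explaining why SECRET_KEY looks weak, or None if it is fine."""
    key = values.get("SECRET_KEY", "")
    if not key:
        return "SECRET_KEY is missing"
    if key == PLACEHOLDER:
        return "SECRET_KEY is still the template placeholder"
    if len(key) < min_length:
        return f"SECRET_KEY is shorter than {min_length} characters"
    return None
```

For example, secret_key_problem(parse_env(open(".env").read())) returns None once a generated key has been pasted in.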
Step 6: Set Environment Variable for WeasyPrint
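Add to your shell config (~/.zshrc or ~/.bash_profile). The path below is Homebrew's library directory on Apple Silicon; on Intel Macs Homebrew installs under /usr/local, so use /usr/local/lib instead: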
export DYLD_LIBRARY_PATH="/opt/homebrew/lib:$DYLD_LIBRARY_PATH"
Then reload:
source ~/.zshrc # or source ~/.bash_profile
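To confirm the variable is visible to the Python process that will load WeasyPrint, it can be inspected from inside the tool_ocr environment. A minimal sketch follows; library_path_includes is a hypothetical helper, and the directory to look for is whichever Homebrew library path you exported.

```python
import os


def library_path_includes(directory, value=None):
    """Return True if `directory` is one of the entries in DYLD_LIBRARY_PATH."""
    if value is None:
        value = os.environ.get("DYLD_LIBRARY_PATH", "")
    # Entries are colon-separated; ignore empty entries and trailing slashes.
    entries = [entry.rstrip("/") for entry in value.split(":") if entry]
    return directory.rstrip("/") in entries


if __name__ == "__main__":
    print(library_path_includes("/opt/homebrew/lib"))
```

Run it with the Conda interpreter rather than a system binary, since macOS may remove DYLD_* variables from the environment of protected system executables.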
Step 7: Run Service Layer Tests
Verify all services are working:
cd backend
python test_services.py
Expected output:
✓ PASS - database
✓ PASS - preprocessor
✓ PASS - pdf_generator
✓ PASS - file_manager
Total: 4-5/5 tests passed
Note: OCR engine test may fail on first run as PaddleOCR downloads models (~900MB). This is normal.
Step 8: Create Directory Structure
The directories should already exist, but verify:
ls -la
You should see:
- backend/ - FastAPI application
- frontend/ - React application (will be populated later)
- uploads/ - File upload storage
- storage/ - Processed results
- models/ - PaddleOCR models (empty until first run)
- logs/ - Application logs
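The same check can be scripted so that a missing directory is reported by name. This sketch assumes it is run from the project root; missing_dirs is a hypothetical helper, and the directory names are taken from the listing above.

```python
from pathlib import Path

EXPECTED_DIRS = ("backend", "frontend", "uploads", "storage", "models", "logs")


def missing_dirs(root=".", names=EXPECTED_DIRS):
    """Return the expected directory names that do not exist under `root`."""
    root = Path(root)
    return [name for name in names if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    print("All directories present" if not missing else f"Missing: {', '.join(missing)}")
```

Any name it reports can be created with mkdir before continuing.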
Step 9: Start Backend Server
cd backend
python -m app.main
Expected output:
INFO: Started server process
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:12010
Test the server: Open a browser and visit:
- http://localhost:12010 - API root
- http://localhost:12010/docs - Interactive API documentation
- http://localhost:12010/health - Health check endpoint
Step 10: Download PaddleOCR Models
On first OCR request, PaddleOCR will automatically download models (~900MB).
To pre-download models manually:
python -c "
from paddleocr import PaddleOCR
ocr = PaddleOCR(use_angle_cls=True, lang='ch', use_gpu=False)
print('Models downloaded successfully')
"
This will download:
- Detection model: ch_PP-OCRv4_det
- Recognition model: ch_PP-OCRv4_rec
- Angle classifier: ch_ppocr_mobile_v2.0_cls
Models are stored in: ./models/paddleocr/
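One way to tell whether the download has completed is to measure the models directory. The sketch below is an illustration with a hypothetical helper name (directory_size_mb); it assumes the models land in ./models/paddleocr/ as stated above and simply sums file sizes.

```python
from pathlib import Path


def directory_size_mb(path):
    """Return the total size of all files under `path` in MiB (0.0 if it is missing)."""
    root = Path(path)
    if not root.is_dir():
        return 0.0
    total_bytes = sum(f.stat().st_size for f in root.rglob("*") if f.is_file())
    return total_bytes / (1024 * 1024)


if __name__ == "__main__":
    print(f"{directory_size_mb('./models/paddleocr'):.1f} MiB downloaded")
```

A value far below the roughly 900MB mentioned above suggests the download is still in progress or was interrupted.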
Troubleshooting
Issue: "conda: command not found"
Solution:
# Reload shell configuration
source ~/.zshrc # or source ~/.bashrc
# If still not working, manually add Conda to PATH
export PATH="$HOME/miniconda3/bin:$PATH"
Issue: PaddlePaddle installation fails
Solution:
# For Apple Silicon Macs, ensure you're using ARM version
pip uninstall paddlepaddle
pip install paddlepaddle --no-cache-dir
Issue: WeasyPrint fails to install
Solution:
# Install required system libraries
brew install cairo pango gdk-pixbuf libffi
pip install --upgrade weasyprint
Issue: Database connection fails
Solution:
# Test database connection
python -c "
import pymysql
conn = pymysql.connect(
host='mysql.theaken.com',
port=33306,
user='A060',
password='WLeSCi0yhtc7',
database='db_A060'
)
print('Database connection OK')
conn.close()
"
If this fails, verify:
- Internet connection is active
- Firewall is not blocking port 33306
- Database credentials in .env are correct
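When verifying the credentials, it can help to see the connection URL that the MYSQL_* settings produce, since special characters in a password must be percent-encoded there. The sketch below assumes the backend connects through SQLAlchemy with the PyMySQL driver (both appear in the dependency list); build_database_url is a hypothetical helper, and how the application actually assembles its URL is not shown in this guide.

```python
from urllib.parse import quote_plus


def build_database_url(settings):
    """Build a SQLAlchemy-style MySQL URL from a mapping of MYSQL_* settings."""
    # Percent-encode user and password so characters like @ or / are safe.
    user = quote_plus(settings["MYSQL_USER"])
    password = quote_plus(settings["MYSQL_PASSWORD"])
    host = settings["MYSQL_HOST"]
    port = int(settings.get("MYSQL_PORT", 3306))
    database = settings["MYSQL_DATABASE"]
    return f"mysql+pymysql://{user}:{password}@{host}:{port}/{database}"
```

A KeyError from this function points directly at the setting that is missing from .env.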
Issue: Port 12010 already in use
Solution:
# Find what's using the port
lsof -i :12010
# Kill the process or change port in .env
# For example, set BACKEND_PORT=12012 (any free port other than FRONTEND_PORT, which defaults to 12011)
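If lsof is not at hand, a port can also be probed from Python by attempting a TCP connection. This is a rough sketch with hypothetical helper names (port_in_use, first_free_port); it only detects listeners reachable on 127.0.0.1 and does not know which ports the project reserves, so check the result against FRONTEND_PORT yourself.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


def first_free_port(start, count=20):
    """Return the first port in [start, start + count) with no listener, or None."""
    for port in range(start, start + count):
        if not port_in_use(port):
            return port
    return None


if __name__ == "__main__":
    print("12010 in use:", port_in_use(12010))
```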
Next Steps
After successful setup:
- ✅ Environment is ready
- ✅ Backend server can start
- ✅ Database connection configured
Ready to develop:
- Implement database models (backend/app/models/)
- Create API endpoints (backend/app/api/v1/)
- Build OCR service (backend/app/services/ocr_service.py)
- Develop frontend UI (frontend/src/)
Start with Phase 1 tasks: Refer to openspec/changes/add-ocr-batch-processing/tasks.md for detailed implementation tasks.
Development Workflow
# Activate environment
conda activate tool_ocr
# Start backend in development mode (auto-reload)
cd backend
python -m app.main
# Or, from a fresh shell, as a single command
bash -c "source ~/.zshrc && conda activate tool_ocr && export DYLD_LIBRARY_PATH=/opt/homebrew/lib:$DYLD_LIBRARY_PATH && python -m app.main"
# In another terminal, start frontend
cd frontend
npm run dev
# Run tests
cd backend
pytest tests/ -v
# Check code style
black app/
pylint app/
Background Services
Automatic Cleanup Scheduler
The application automatically runs a cleanup scheduler that:
- Runs every: 1 hour (configurable via BackgroundTaskManager.cleanup_interval)
- Deletes files older than: 24 hours (configurable via BackgroundTaskManager.file_retention_hours)
- Cleans up:
  - Physical files and directories
  - Database records (results, files, batches)
  - Expired batches in COMPLETED, FAILED, or PARTIAL status
The cleanup scheduler starts automatically when the backend application starts and stops gracefully on shutdown.
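The retention rule can be illustrated with a small standalone function. This is not the scheduler's implementation: it assumes expiry is judged by file modification time, whereas the real service may rely on database timestamps and batch status, and find_expired_files is a hypothetical name.

```python
import time
from pathlib import Path


def find_expired_files(directory, retention_hours=24, now=None):
    """Return files under `directory` last modified more than `retention_hours` ago."""
    now = time.time() if now is None else now
    cutoff = now - retention_hours * 3600  # anything older than this is expired
    root = Path(directory)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )


if __name__ == "__main__":
    for path in find_expired_files("uploads"):
        print("would delete:", path)
```

The function only lists candidates, which makes it safe to run against uploads/ or storage/ to preview what a 24-hour retention window would remove.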
Monitor cleanup activity:
# Watch cleanup logs in real-time
tail -f /tmp/tool_ocr_startup.log | grep cleanup
# Or check application logs
tail -f backend/logs/app.log | grep cleanup
Retry Logic
OCR processing includes automatic retry logic:
- Maximum retries: 3 attempts (configurable)
- Retry delay: 5 seconds between attempts (configurable)
- Tracks: retry_count field in database
- Error handling: Detailed error messages with retry attempt information
Configuration (in backend/app/services/background_tasks.py):
task_manager = BackgroundTaskManager(
max_retries=3, # Number of retry attempts
retry_delay=5, # Delay between retries (seconds)
cleanup_interval=3600, # Cleanup runs every hour
file_retention_hours=24 # Keep files for 24 hours
)
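How max_retries and retry_delay interact can be sketched as a plain retry loop. This is an illustration under stated assumptions, not the code in background_tasks.py: it treats max_retries as the total number of attempts, and run_with_retries and its sleep parameter are hypothetical.

```python
import time


def run_with_retries(task, max_retries=3, retry_delay=5, sleep=time.sleep):
    """Call task() up to max_retries times, pausing retry_delay seconds after a failure.

    Returns (result, retry_count), where retry_count is the number of failed
    attempts that preceded the successful one.
    """
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            return task(), attempt - 1
        except Exception as exc:  # a real service would catch narrower error types
            last_error = exc
            if attempt < max_retries:
                sleep(retry_delay)  # no pause after the final attempt
    raise RuntimeError(f"failed after {max_retries} attempts: {last_error}") from last_error
```

With the defaults shown above, a task that keeps failing is attempted three times with two 5-second pauses, so the failure surfaces after roughly 10 seconds plus the processing time.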
Background Task Status
Check if background services are running:
# Check health endpoint
curl http://localhost:12010/health
# Check application startup logs for cleanup scheduler
grep "cleanup scheduler" /tmp/tool_ocr_startup.log
# Expected output: "Started cleanup scheduler for expired files"
# Expected output: "Starting cleanup scheduler (interval: 3600s, retention: 24h)"
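The health check can also be run from a script, for example to wait for the backend before starting other work. The sketch below assumes the /health endpoint answers with HTTP 200 when the service is up, which this guide does not state explicitly; is_healthy is a hypothetical helper.

```python
import urllib.request


def is_healthy(url="http://localhost:12010/health", timeout=3.0):
    """Return True if a GET to `url` answers with HTTP 200, False on connection errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # covers URLError, HTTPError, refused connections, timeouts
        return False
```

Calling is_healthy() with no arguments targets the default BACKEND_PORT; pass a different URL if you changed the port in .env.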
Deactivate Environment
When done working:
conda deactivate
Environment Management
# List Conda environments
conda env list
# Remove environment (if needed)
conda env remove -n tool_ocr
# Export environment
conda env export > environment.yml
# Create from exported environment
conda env create -f environment.yml