feat: migrate to WSL Ubuntu native development environment
Migrate from the Docker/macOS+Conda deployment to a native WSL2 Ubuntu development environment.

Main changes:
- Remove all Docker-related configuration files (Dockerfile, docker-compose.yml, .dockerignore, etc.)
- Remove macOS/Conda setup scripts (SETUP.md, setup_conda.sh)
- Add an automated WSL Ubuntu environment setup script (setup_dev_env.sh)
- Add backend/frontend quick-start scripts (start_backend.sh, start_frontend.sh)
- Unify development port configuration (backend: 8000, frontend: 5173)
- Improve database connection stability (connection pooling, timeout settings, retry mechanism)
- Update project documentation to reflect the current WSL development environment

Technical improvements:
- Database connection pooling with health checks and auto-reconnection
- Retry logic for long-running OCR tasks to prevent DB timeouts
- Extended JWT token expiration to 24 hours
- Support for Office documents (pptx, docx) via LibreOffice headless
- Comprehensive system dependency installation in a single script

Environment:
- OS: WSL2 Ubuntu 24.04
- Python: 3.12 (venv)
- Node.js: 24.x LTS (nvm)
- Backend Port: 8000
- Frontend Port: 5173

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
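The retry mechanism mentioned in the commit message is not shown in this diff. As a rough illustration only, retrying a database write after a long OCR task might be sketched as below; `TransientDBError`, `retry_on_disconnect`, and the backoff values are hypothetical names and numbers, not taken from the project.

```python
import time
from functools import wraps


class TransientDBError(Exception):
    """Hypothetical stand-in for a driver-level 'connection lost' error."""


def retry_on_disconnect(max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Retry a function with exponential backoff when the DB connection drops."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except TransientDBError:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the error
                    # Wait base_delay, then 2x, 4x, ... before the next try
                    sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator
```

In a real backend the decorated function would typically open a fresh session on each attempt, so that a stale pooled connection is not reused.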
284
README.md
@@ -11,6 +11,7 @@ A web-based solution to extract text, images, and document structure from multip
- 🖼️ **Image Extraction**: Preserve document images alongside text content
- 📑 **Batch Processing**: Process multiple files concurrently with progress tracking
- 📤 **Multiple Export Formats**: TXT, JSON, Excel, Markdown with images, searchable PDF
- 📋 **Office Documents**: DOC, DOCX, PPT, PPTX support via LibreOffice conversion
- 🔧 **Flexible Configuration**: Rule-based output formatting
- 🌐 **Translation Ready**: Reserved architecture for future translation features

@@ -22,173 +23,176 @@ A web-based solution to extract text, images, and document structure from multip
- **Database**: MySQL via SQLAlchemy
- **PDF Generation**: Pandoc + WeasyPrint
- **Image Processing**: OpenCV, Pillow, pdf2image
- **Office Conversion**: LibreOffice (headless mode)

### Frontend
- **Framework**: React 18 with Vite
- **Styling**: TailwindCSS + shadcn/ui
- **HTTP Client**: Axios with React Query
- **Framework**: React 19 with TypeScript
- **Build Tool**: Vite 7
- **Styling**: Tailwind CSS v4 + shadcn/ui
- **State Management**: React Query + Zustand
- **HTTP Client**: Axios

## Prerequisites

- **macOS**: Apple Silicon (M1/M2/M3) or Intel
- **Python**: 3.10+
- **Conda**: Miniconda or Anaconda (will be installed automatically)
- **Homebrew**: For system dependencies
- **OS**: WSL2 Ubuntu 24.04
- **Python**: 3.12+
- **Node.js**: 24.x LTS
- **MySQL**: External database server (provided)

## Installation
## Quick Start

### 1. Automated Setup (Recommended)

```bash
# Clone the repository
cd /Users/egg/Projects/Tool_OCR

# Run automated setup script
chmod +x setup_conda.sh
./setup_conda.sh

# If Conda was just installed, reload your shell
source ~/.zshrc # or source ~/.bash_profile

# Run the script again to create environment
./setup_conda.sh
./setup_dev_env.sh
```

### 2. Install Dependencies
This script automatically installs:
- Python development tools (pip, venv, build-essential)
- System dependencies (pandoc, LibreOffice, fonts, etc.)
- Node.js (via nvm)
- Python packages
- Frontend dependencies

### 2. Initialize Database

```bash
# Activate Conda environment
conda activate tool_ocr

# Install Python dependencies
pip install -r requirements.txt

# Install system dependencies (Pandoc for PDF generation)
brew install pandoc

# Install Chinese fonts for PDF generation (optional)
brew install --cask font-noto-sans-cjk
# Note: macOS built-in fonts work fine, this is optional
```

### 3. Download PaddleOCR Models

```bash
# Create models directory
mkdir -p models/paddleocr

# Models will be automatically downloaded on first run
# (~900MB total, includes PaddleOCR-VL 0.9B model)
```

### 4. Configure Environment

```bash
# Copy environment template
cp .env.example .env

# Edit .env with your settings
# Database credentials are pre-configured
nano .env
```

### 5. Initialize Database

```bash
# Database schema will be created automatically on first run
# Using: mysql.theaken.com:33306/db_A060
```

## Usage

### Start Backend Server

```bash
# Activate environment
conda activate tool_ocr

# Start FastAPI server
source venv/bin/activate
cd backend
python -m app.main

# Server runs at: http://localhost:12010
# API docs: http://localhost:12010/docs
alembic upgrade head
python create_test_user.py
cd ..
```

### Start Frontend (Coming Soon)
Default test user:
- Username: `admin`
- Password: `admin123`

### 3. Start Development Servers

**Backend (Terminal 1):**
```bash
# Install frontend dependencies
cd frontend
npm install

# Start development server
npm run dev

# Frontend runs at: http://localhost:12011
./start_backend.sh
```

**Frontend (Terminal 2):**
```bash
./start_frontend.sh
```

### 4. Access Application

- **Frontend**: http://localhost:5173
- **API Docs**: http://localhost:8000/docs
- **Health Check**: http://localhost:8000/health

## Project Structure

```
Tool_OCR/
├── backend/
├── backend/ # FastAPI backend
│ ├── app/
│ │ ├── api/v1/ # API endpoints
│ │ ├── core/ # Configuration, database
│ │ ├── models/ # Database models
│ │ ├── services/ # Business logic
│ │ ├── utils/ # Utilities
│ │ └── main.py # Application entry point
│ └── tests/ # Test suite
├── frontend/
│ └── src/ # React application
├── uploads/
│ ├── temp/ # Temporary uploads
│ ├── processed/ # Processed files
│ └── images/ # Extracted images
├── storage/
│ ├── markdown/ # Markdown outputs
│ ├── json/ # JSON results
│ └── exports/ # Export files
├── models/
│ └── paddleocr/ # PaddleOCR models
├── config/ # Configuration files
├── templates/ # PDF templates
├── logs/ # Application logs
├── requirements.txt # Python dependencies
├── setup_conda.sh # Environment setup script
├── .env.example # Environment template
└── README.md
│ │ ├── api/v1/ # API endpoints
│ │ ├── core/ # Configuration, database
│ │ ├── models/ # Database models
│ │ ├── services/ # Business logic
│ │ └── main.py # Application entry point
│ ├── alembic/ # Database migrations
│ └── tests/ # Test suite
├── frontend/ # React frontend
│ ├── src/
│ │ ├── components/ # UI components
│ │ ├── pages/ # Page components
│ │ ├── services/ # API services
│ │ └── stores/ # State management
│ └── public/ # Static assets
├── .env.local # Local development config
├── setup_dev_env.sh # Environment setup script
├── start_backend.sh # Backend startup script
└── start_frontend.sh # Frontend startup script
```

## API Endpoints (Planned)
## Configuration

- `POST /api/v1/ocr/upload` - Upload files for OCR processing
- `GET /api/v1/ocr/tasks` - List all OCR tasks
- `GET /api/v1/ocr/tasks/{task_id}` - Get task details
- `POST /api/v1/ocr/batch` - Create batch processing task
- `GET /api/v1/export/{task_id}` - Export results (TXT/JSON/Excel/MD/PDF)
- `POST /api/v1/translate/document` - Translate document (reserved, returns 501)
Main config file: `.env.local`

```bash
# Database
MYSQL_HOST=mysql.theaken.com
MYSQL_PORT=33306

# Application ports
BACKEND_PORT=8000
FRONTEND_PORT=5173

# Token expiration (minutes)
ACCESS_TOKEN_EXPIRE_MINUTES=1440 # 24 hours

# Supported file formats
ALLOWED_EXTENSIONS=png,jpg,jpeg,pdf,bmp,tiff,doc,docx,ppt,pptx

# OCR settings
OCR_LANGUAGES=ch,en,japan,korean
MAX_OCR_WORKERS=4
```

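To make the units in the configuration concrete, here is a minimal sketch of how these string settings could be turned into typed values. The key names come from `.env.local` above; `parse_settings` and `is_allowed` are hypothetical helpers for illustration, not the backend's actual loader.

```python
from datetime import timedelta
from pathlib import Path


def parse_settings(env):
    """Turn raw string settings (as found in .env.local) into typed values."""
    return {
        "token_lifetime": timedelta(minutes=int(env["ACCESS_TOKEN_EXPIRE_MINUTES"])),
        "allowed_extensions": {
            ext.strip().lower() for ext in env["ALLOWED_EXTENSIONS"].split(",")
        },
        "ocr_languages": [lang.strip() for lang in env["OCR_LANGUAGES"].split(",")],
    }


def is_allowed(filename, allowed_extensions):
    """Check a filename's extension against the allow-list, case-insensitively."""
    return Path(filename).suffix.lstrip(".").lower() in allowed_extensions


settings = parse_settings({
    "ACCESS_TOKEN_EXPIRE_MINUTES": "1440",
    "ALLOWED_EXTENSIONS": "png,jpg,jpeg,pdf,bmp,tiff,doc,docx,ppt,pptx",
    "OCR_LANGUAGES": "ch,en,japan,korean",
})
```

With these values, 1440 minutes works out to a 24-hour token lifetime, and an upload named `Report.DOCX` passes the extension check while `notes.txt` does not.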
## API Endpoints

### Authentication
- `POST /api/v1/auth/login` - User login

### File Management
- `POST /api/v1/upload` - Upload files
- `POST /api/v1/ocr/process` - Start OCR processing
- `GET /api/v1/batch/{id}/status` - Get batch status

### Results & Export
- `GET /api/v1/ocr/result/{id}` - Get OCR result
- `GET /api/v1/export/pdf/{id}` - Export as PDF

Full API documentation: http://localhost:8000/docs

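As a rough sketch of calling the login endpoint from a script, the snippet below builds the request without sending it. The request body shape is an assumption (a JSON object with `username` and `password`); check the generated API docs for the actual schema. `build_login_request` is a hypothetical helper.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # backend port from this README


def build_login_request(username, password):
    """Build (but do not send) a POST to the login endpoint.

    Assumption: the endpoint accepts a JSON body with these two fields.
    """
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return Request(
        f"{BASE_URL}/api/v1/auth/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_login_request("admin", "admin123")
```

Once the backend is running, the request can be sent with `urllib.request.urlopen(req)`.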
## Supported File Formats

- **Images**: PNG, JPG, JPEG, BMP, TIFF
- **Documents**: PDF
- **Office**: DOC, DOCX, PPT, PPTX

Office files are automatically converted to PDF before OCR processing.

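The Office-to-PDF step can be pictured as a headless LibreOffice invocation. The sketch below only builds the command line; the helper names are hypothetical, and the binary is assumed to be on `PATH` as `soffice` (some installs expose it as `libreoffice` instead).

```python
from pathlib import Path

OFFICE_EXTENSIONS = {".doc", ".docx", ".ppt", ".pptx"}


def libreoffice_pdf_command(source, out_dir):
    """Return the argv for a headless LibreOffice conversion to PDF."""
    source = Path(source)
    if source.suffix.lower() not in OFFICE_EXTENSIONS:
        raise ValueError(f"not an Office document: {source.name}")
    return [
        "soffice", "--headless",
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(source),
    ]


def expected_pdf_path(source, out_dir):
    """LibreOffice writes <stem>.pdf into the output directory."""
    return Path(out_dir) / (Path(source).stem + ".pdf")
```

The resulting list can be passed to `subprocess.run(cmd, check=True, timeout=...)`; a timeout is worth setting because conversions of large presentations can hang.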
## Development

### Run Tests
### Backend

```bash
source venv/bin/activate
cd backend
pytest tests/ -v --cov=app

# Run tests
pytest

# Database migration
alembic revision --autogenerate -m "description"
alembic upgrade head

# Code formatting
black app/
```

### Code Quality
### Frontend

```bash
# Format code
black app/
cd frontend

# Development server
npm run dev

# Build for production
npm run build

# Lint code
pylint app/
npm run lint
```

## OpenSpec Workflow
@@ -208,26 +212,26 @@ cat openspec/changes/add-ocr-batch-processing/tasks.md

## Roadmap

- [x] **Phase 0**: Environment setup and configuration
- [ ] **Phase 1**: Core OCR with structure extraction
- [ ] **Phase 2**: Frontend development
- [x] **Phase 0**: Environment setup
- [x] **Phase 1**: Core OCR backend (~98% complete)
- [x] **Phase 2**: Frontend development (~92% complete)
- [ ] **Phase 3**: Testing & optimization
- [ ] **Phase 4**: Deployment
- [ ] **Phase 4**: Deployment automation
- [ ] **Phase 5**: Translation feature (future)

## Documentation

- Development specs: [openspec/project.md](openspec/project.md)
- Implementation status: [openspec/changes/add-ocr-batch-processing/STATUS.md](openspec/changes/add-ocr-batch-processing/STATUS.md)
- Agent instructions: [openspec/AGENTS.md](openspec/AGENTS.md)

## License

[To be determined]
Internal project use

## Contributors
## Notes

- Development environment: macOS Apple Silicon
- Database: MySQL external server
- OCR Engine: PaddleOCR-VL 0.9B with PP-StructureV3

## Support

For issues and questions, refer to:
- OpenSpec documentation: `openspec/AGENTS.md`
- Task breakdown: `openspec/changes/add-ocr-batch-processing/tasks.md`
- Specifications: `openspec/changes/add-ocr-batch-processing/specs/`
- First OCR run will download PaddleOCR models (~900MB)
- Token expiration is set to 24 hours by default
- Office conversion requires LibreOffice (installed via setup script)
- Development environment: WSL2 Ubuntu 24.04 with Python venv
