feat: migrate to WSL Ubuntu native development environment

Migrate from the Docker/macOS+Conda deployment to a native WSL2 Ubuntu development environment.

Main changes:
- Remove all Docker-related configuration files (Dockerfile, docker-compose.yml, .dockerignore, etc.)
- Remove the macOS/Conda setup files (SETUP.md, setup_conda.sh)
- Add an automated WSL Ubuntu environment setup script (setup_dev_env.sh)
- Add quick-start scripts for the backend and frontend (start_backend.sh, start_frontend.sh)
- Unify the development port configuration (backend: 8000, frontend: 5173)
- Improve database connection stability (connection pooling, timeout settings, retry mechanism)
- Update the project documentation to reflect the current WSL development environment

Technical improvements:
- Database connection pooling with health checks and auto-reconnection
- Retry logic for long-running OCR tasks to prevent DB timeouts
- Extended JWT token expiration to 24 hours
- Support for Office documents (pptx, docx) via LibreOffice headless
- Comprehensive system dependency installation in a single script

Environment:
- OS: WSL2 Ubuntu 24.04
- Python: 3.12 (venv)
- Node.js: 24.x LTS (nvm)
- Backend Port: 8000
- Frontend Port: 5173

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
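
The retry logic mentioned in the commit message is not shown on this page. As a rough illustration only, a retry with exponential backoff around a database write could look like the sketch below. The names `retry`, `TransientDBError`, and `save_ocr_result` are hypothetical and are not taken from the repository.

```python
import time
from functools import wraps


def retry(times=3, base_delay=0.5, exceptions=(Exception,), sleep=time.sleep):
    """Retry the wrapped call for up to `times` attempts with exponential backoff."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, times + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == times:
                        raise  # out of attempts: surface the last error
                    # Wait base_delay, then 2x, 4x, ... before the next attempt.
                    sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator


class TransientDBError(Exception):
    """Hypothetical stand-in for a dropped-connection error."""


calls = {"count": 0}


@retry(times=3, base_delay=0.01, exceptions=(TransientDBError,))
def save_ocr_result(task_id):
    # Fails twice, then succeeds, imitating a connection that recovers.
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientDBError("connection lost")
    return f"saved task {task_id}"


print(save_ocr_result(42))
```

Restricting `exceptions` to connection-level errors keeps genuine bugs, such as a constraint violation, from being retried silently.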
2025-11-13 21:00:42 +08:00
parent 0f81d5e70b
commit d7e64737b7
25 changed files with 511 additions and 1774 deletions
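
The README diff below documents two `.env.local` keys, `ACCESS_TOKEN_EXPIRE_MINUTES=1440` and `ALLOWED_EXTENSIONS`. The sketch below shows one way such values might be read and applied. It assumes a simple `KEY=value` format with `#` comments. The helpers `parse_env` and `is_allowed` are hypothetical and do not reflect how the backend actually loads its settings.

```python
from datetime import timedelta
from pathlib import PurePath

# Two of the keys shown in the README's Configuration section.
ENV_TEXT = """
ACCESS_TOKEN_EXPIRE_MINUTES=1440  # 24 hours
ALLOWED_EXTENSIONS=png,jpg,jpeg,pdf,bmp,tiff,doc,docx,ppt,pptx
"""


def parse_env(text):
    """Parse KEY=value lines, dropping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # naive: breaks values containing '#'
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


settings = parse_env(ENV_TEXT)
token_lifetime = timedelta(minutes=int(settings["ACCESS_TOKEN_EXPIRE_MINUTES"]))
allowed = set(settings["ALLOWED_EXTENSIONS"].split(","))


def is_allowed(filename):
    """Check the file extension case-insensitively against the allow-list."""
    return PurePath(filename).suffix.lower().lstrip(".") in allowed


print(token_lifetime == timedelta(hours=24))  # True
print(is_allowed("slides.PPTX"), is_allowed("notes.txt"))  # True False
```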
--- a/README.md
+++ b/README.md
@@ -11,6 +11,7 @@ A web-based solution to extract text, images, and document structure from multip
 - 🖼️ **Image Extraction**: Preserve document images alongside text content
 - 📑 **Batch Processing**: Process multiple files concurrently with progress tracking
 - 📤 **Multiple Export Formats**: TXT, JSON, Excel, Markdown with images, searchable PDF
+- 📋 **Office Documents**: DOC, DOCX, PPT, PPTX support via LibreOffice conversion
 - 🔧 **Flexible Configuration**: Rule-based output formatting
 - 🌐 **Translation Ready**: Reserved architecture for future translation features

@@ -22,173 +23,176 @@ A web-based solution to extract text, images, and document structure from multip
 - **Database**: MySQL via SQLAlchemy
 - **PDF Generation**: Pandoc + WeasyPrint
 - **Image Processing**: OpenCV, Pillow, pdf2image
+- **Office Conversion**: LibreOffice (headless mode)

 ### Frontend
-- **Framework**: React 18 with Vite
-- **Styling**: TailwindCSS + shadcn/ui
-- **HTTP Client**: Axios with React Query
+- **Framework**: React 19 with TypeScript
+- **Build Tool**: Vite 7
+- **Styling**: Tailwind CSS v4 + shadcn/ui
+- **State Management**: React Query + Zustand
+- **HTTP Client**: Axios

 ## Prerequisites

-- **macOS**: Apple Silicon (M1/M2/M3) or Intel
-- **Python**: 3.10+
-- **Conda**: Miniconda or Anaconda (will be installed automatically)
-- **Homebrew**: For system dependencies
+- **OS**: WSL2 Ubuntu 24.04
+- **Python**: 3.12+
+- **Node.js**: 24.x LTS
 - **MySQL**: External database server (provided)

-## Installation
+## Quick Start

 ### 1. Automated Setup (Recommended)

 ```bash
-# Clone the repository
-cd /Users/egg/Projects/Tool_OCR
-
 # Run automated setup script
-chmod +x setup_conda.sh
-./setup_conda.sh
-
-# If Conda was just installed, reload your shell
-source ~/.zshrc  # or source ~/.bash_profile
-
-# Run the script again to create environment
-./setup_conda.sh
+./setup_dev_env.sh
 ```

-### 2. Install Dependencies
+This script automatically installs:
+- Python development tools (pip, venv, build-essential)
+- System dependencies (pandoc, LibreOffice, fonts, etc.)
+- Node.js (via nvm)
+- Python packages
+- Frontend dependencies
+
+### 2. Initialize Database

 ```bash
-# Activate Conda environment
-conda activate tool_ocr
-
-# Install Python dependencies
-pip install -r requirements.txt
-
-# Install system dependencies (Pandoc for PDF generation)
-brew install pandoc
-
-# Install Chinese fonts for PDF generation (optional)
-brew install --cask font-noto-sans-cjk
-# Note: macOS built-in fonts work fine, this is optional
-```
-
-### 3. Download PaddleOCR Models
-
-```bash
-# Create models directory
-mkdir -p models/paddleocr
-
-# Models will be automatically downloaded on first run
-# (~900MB total, includes PaddleOCR-VL 0.9B model)
-```
-
-### 4. Configure Environment
-
-```bash
-# Copy environment template
-cp .env.example .env
-
-# Edit .env with your settings
-# Database credentials are pre-configured
-nano .env
-```
-
-### 5. Initialize Database
-
-```bash
-# Database schema will be created automatically on first run
-# Using: mysql.theaken.com:33306/db_A060
-```
-
-## Usage
-
-### Start Backend Server
-
-```bash
-# Activate environment
-conda activate tool_ocr
-
-# Start FastAPI server
+source venv/bin/activate
 cd backend
-python -m app.main
-
-# Server runs at: http://localhost:12010
-# API docs: http://localhost:12010/docs
+alembic upgrade head
+python create_test_user.py
+cd ..
 ```

-### Start Frontend (Coming Soon)
+Default test user:
+- Username: `admin`
+- Password: `admin123`

+### 3. Start Development Servers
+
+**Backend (Terminal 1):**
 ```bash
-# Install frontend dependencies
-cd frontend
-npm install
-
-# Start development server
-npm run dev
-
-# Frontend runs at: http://localhost:12011
+./start_backend.sh
 ```

+**Frontend (Terminal 2):**
+```bash
+./start_frontend.sh
+```
+
+### 4. Access Application
+
+- **Frontend**: http://localhost:5173
+- **API Docs**: http://localhost:8000/docs
+- **Health Check**: http://localhost:8000/health
+
 ## Project Structure

 ```
 Tool_OCR/
-├── backend/
+├── backend/                 # FastAPI backend
 │   ├── app/
-│   │   ├── api/v1/          # API endpoints
-│   │   ├── core/            # Configuration, database
-│   │   ├── models/          # Database models
-│   │   ├── services/        # Business logic
-│   │   ├── utils/           # Utilities
-│   │   └── main.py          # Application entry point
-│   └── tests/               # Test suite
-├── frontend/
-│   └── src/                 # React application
-├── uploads/
-│   ├── temp/                # Temporary uploads
-│   ├── processed/           # Processed files
-│   └── images/              # Extracted images
-├── storage/
-│   ├── markdown/            # Markdown outputs
-│   ├── json/                # JSON results
-│   └── exports/             # Export files
-├── models/
-│   └── paddleocr/           # PaddleOCR models
-├── config/                  # Configuration files
-├── templates/               # PDF templates
-├── logs/                    # Application logs
-├── requirements.txt         # Python dependencies
-├── setup_conda.sh           # Environment setup script
-├── .env.example             # Environment template
-└── README.md
+│   │   ├── api/v1/         # API endpoints
+│   │   ├── core/           # Configuration, database
+│   │   ├── models/         # Database models
+│   │   ├── services/       # Business logic
+│   │   └── main.py         # Application entry point
+│   ├── alembic/            # Database migrations
+│   └── tests/              # Test suite
+├── frontend/               # React frontend
+│   ├── src/
+│   │   ├── components/     # UI components
+│   │   ├── pages/          # Page components
+│   │   ├── services/       # API services
+│   │   └── stores/         # State management
+│   └── public/             # Static assets
+├── .env.local              # Local development config
+├── setup_dev_env.sh        # Environment setup script
+├── start_backend.sh        # Backend startup script
+└── start_frontend.sh       # Frontend startup script
 ```

-## API Endpoints (Planned)
+## Configuration

-- `POST /api/v1/ocr/upload` - Upload files for OCR processing
-- `GET /api/v1/ocr/tasks` - List all OCR tasks
-- `GET /api/v1/ocr/tasks/{task_id}` - Get task details
-- `POST /api/v1/ocr/batch` - Create batch processing task
-- `GET /api/v1/export/{task_id}` - Export results (TXT/JSON/Excel/MD/PDF)
-- `POST /api/v1/translate/document` - Translate document (reserved, returns 501)
+Main config file: `.env.local`
+
+```bash
+# Database
+MYSQL_HOST=mysql.theaken.com
+MYSQL_PORT=33306
+
+# Application ports
+BACKEND_PORT=8000
+FRONTEND_PORT=5173
+
+# Token expiration (minutes)
+ACCESS_TOKEN_EXPIRE_MINUTES=1440  # 24 hours
+
+# Supported file formats
+ALLOWED_EXTENSIONS=png,jpg,jpeg,pdf,bmp,tiff,doc,docx,ppt,pptx
+
+# OCR settings
+OCR_LANGUAGES=ch,en,japan,korean
+MAX_OCR_WORKERS=4
+```
+
+## API Endpoints
+
+### Authentication
+- `POST /api/v1/auth/login` - User login
+
+### File Management
+- `POST /api/v1/upload` - Upload files
+- `POST /api/v1/ocr/process` - Start OCR processing
+- `GET /api/v1/batch/{id}/status` - Get batch status
+
+### Results & Export
+- `GET /api/v1/ocr/result/{id}` - Get OCR result
+- `GET /api/v1/export/pdf/{id}` - Export as PDF
+
+Full API documentation: http://localhost:8000/docs
+
+## Supported File Formats
+
+- **Images**: PNG, JPG, JPEG, BMP, TIFF
+- **Documents**: PDF
+- **Office**: DOC, DOCX, PPT, PPTX
+
+Office files are automatically converted to PDF before OCR processing.

 ## Development

-### Run Tests
+### Backend

 ```bash
+source venv/bin/activate
 cd backend
-pytest tests/ -v --cov=app
+
+# Run tests
+pytest
+
+# Database migration
+alembic revision --autogenerate -m "description"
+alembic upgrade head
+
+# Code formatting
+black app/
 ```

-### Code Quality
+### Frontend

 ```bash
-# Format code
-black app/
+cd frontend
+
+# Development server
+npm run dev
+
+# Build for production
+npm run build

 # Lint code
-pylint app/
+npm run lint
 ```

 ## OpenSpec Workflow
@@ -208,26 +212,26 @@ cat openspec/changes/add-ocr-batch-processing/tasks.md

 ## Roadmap

-- [x] **Phase 0**: Environment setup and configuration
-- [ ] **Phase 1**: Core OCR with structure extraction
-- [ ] **Phase 2**: Frontend development
+- [x] **Phase 0**: Environment setup
+- [x] **Phase 1**: Core OCR backend (~98% complete)
+- [x] **Phase 2**: Frontend development (~92% complete)
 - [ ] **Phase 3**: Testing & optimization
-- [ ] **Phase 4**: Deployment
+- [ ] **Phase 4**: Deployment automation
 - [ ] **Phase 5**: Translation feature (future)

+## Documentation
+
+- Development specs: [openspec/project.md](openspec/project.md)
+- Implementation status: [openspec/changes/add-ocr-batch-processing/STATUS.md](openspec/changes/add-ocr-batch-processing/STATUS.md)
+- Agent instructions: [openspec/AGENTS.md](openspec/AGENTS.md)
+
 ## License

-[To be determined]
+Internal project use

-## Contributors
+## Notes

-- Development environment: macOS Apple Silicon
-- Database: MySQL external server
-- OCR Engine: PaddleOCR-VL 0.9B with PP-StructureV3
-
-## Support
-
-For issues and questions, refer to:
-- OpenSpec documentation: `openspec/AGENTS.md`
-- Task breakdown: `openspec/changes/add-ocr-batch-processing/tasks.md`
-- Specifications: `openspec/changes/add-ocr-batch-processing/specs/`
+- First OCR run will download PaddleOCR models (~900MB)
+- Token expiration is set to 24 hours by default
+- Office conversion requires LibreOffice (installed via setup script)
+- Development environment: WSL2 Ubuntu 24.04 with Python venv
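
The README states that Office files are converted to PDF with LibreOffice in headless mode before OCR. The conversion code itself is not part of this diff, so the following is only a minimal sketch of how such a step could be driven from Python. The function names `build_convert_command` and `convert_to_pdf`, the `converted` output directory, and the 120-second timeout are assumptions. The `--headless`, `--convert-to`, and `--outdir` options are standard LibreOffice command-line flags.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(source, out_dir, binary="soffice"):
    """Build the LibreOffice command line for a headless PDF conversion."""
    return [binary, "--headless", "--convert-to", "pdf",
            "--outdir", str(out_dir), str(source)]


def convert_to_pdf(source, out_dir, timeout=120):
    """Convert an Office file to PDF and return the expected output path."""
    source, out_dir = Path(source), Path(out_dir)
    binary = shutil.which("soffice") or shutil.which("libreoffice")
    if binary is None:
        raise RuntimeError("LibreOffice not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(build_convert_command(source, out_dir, binary),
                   check=True, timeout=timeout)
    # LibreOffice writes <stem>.pdf into the output directory.
    return out_dir / (source.stem + ".pdf")


# Only the command is printed here; convert_to_pdf needs LibreOffice installed.
print(build_convert_command("slides.pptx", "converted"))
```

Building the command in a separate function makes it possible to check the arguments without LibreOffice being installed.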