first

2025-11-12 22:53:17 +08:00
commit da700721fa
130 changed files with 23393 additions and 0 deletions
--- a/openspec/project.md
+++ b/openspec/project.md
@@ -0,0 +1,313 @@
+# Project Context
+
+## Purpose
+Tool_OCR is a web-based application for batch image-to-text conversion with multi-language support and rule-based output formatting. It separates the frontend from the backend, processes multiple images and PDFs in a single batch, extracts text with OCR, and exports the results in several formats according to user-defined rules.
+
+**Key Goals:**
+- Batch processing of images and PDF files for text extraction via web interface
+- Multi-language OCR support (Chinese, English, and other languages)
+- Rule-based output formatting and organization
+- User-friendly web interface accessible via browser
+- Export flexibility (TXT, JSON, Excel, etc.)
+- RESTful API for OCR processing
+
+## Tech Stack
+
+### Development Environment
+- **OS Platform**: Windows 10/11
+- **Python Version**: 3.10 (via Conda)
+- **Environment Manager**: Conda
+- **Virtual Environment Path**: `C:\Users\lin46\.conda\envs\tool_ocr`
+- **Recommended IDE**: VS Code with Python + React extensions
+
+### Backend Technologies
+- **Language**: Python 3.10+
+- **Web Framework**: FastAPI (modern, async, auto API docs)
+- **OCR Engine**: PaddleOCR (deep learning-based, excellent multi-language support)
+- **PDF Processing**: PyPDF2 / pdf2image
+- **Image Processing**: Pillow (PIL)
+- **Data Export**: pandas (Excel), json (JSON)
+- **Database**: MySQL (configuration storage, task history)
+- **Cache**: Redis (optional, for task queue)
+- **Authentication**: JWT
+
+### Frontend Technologies
+- **Framework**: React 18+
+- **Build Tool**: Vite
+- **UI Library**: Tailwind CSS + shadcn/ui
+- **State Management**: React Query (for API calls) + Zustand (for global state)
+- **HTTP Client**: Axios
+- **File Upload**: react-dropzone
+
+### Development Tools
+- **Package Manager**: Conda + pip (backend), npm/pnpm (frontend)
+- **Deployment**: 1Panel (web-based server management)
+- **Process Manager**: systemd / PM2 / Supervisor
+- **Web Server**: Nginx (reverse proxy)
+- **Testing**: pytest (backend), Vitest (frontend)
+- **Code Style**: Black + pylint (Python), ESLint + Prettier (JavaScript/TypeScript)
+- **Version Control**: Git
+
+### Key Libraries (Backend)
+- fastapi: Web framework
+- uvicorn: ASGI server
+- paddleocr: OCR processing
+- pdf2image: PDF to image conversion
+- pillow: Image manipulation
+- pandas: Data export to Excel
+- pyyaml: Configuration management
+- python-jose: JWT authentication
+- sqlalchemy: Database ORM
+- pydantic: Data validation
+
+### Key Libraries (Frontend)
+- react: UI framework
+- vite: Build tool
+- tailwindcss: CSS framework
+- shadcn/ui: UI components
+- axios: HTTP client
+- react-query: Server state management
+- zustand: Client state management
+- react-dropzone: File upload
+
+## Project Conventions
+
+### Environment Setup (Backend)
+```bash
+# Create new conda environment
+conda create -n tool_ocr python=3.10 -y
+
+# Activate environment
+conda activate tool_ocr
+
+# Install dependencies
+pip install -r requirements.txt
+```
+
+### Environment Setup (Frontend)
+```bash
+# Navigate to frontend directory
+cd frontend
+
+# Install dependencies
+npm install
+
+# Run dev server
+npm run dev
+```
+
+### Code Style
+
+#### Backend (Python)
+- **Formatter**: Black with line length 100
+- **Naming Conventions**:
+  - Classes: PascalCase (e.g., `OcrProcessor`, `ImageService`)
+  - Functions/Methods: snake_case (e.g., `process_image`, `export_results`)
+  - Constants: UPPER_SNAKE_CASE (e.g., `MAX_BATCH_SIZE`, `DEFAULT_LANG`)
+  - Private members: prefix with underscore (e.g., `_internal_method`)
+- **Docstrings**: Google style for all public functions and classes
+- **Type Hints**: Use type hints for function signatures (FastAPI relies on them for request validation and API docs)
+- **Imports**: Organized by standard library, third-party, local (separated by blank lines)
+- **Encoding**: UTF-8 for all Python files
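The backend conventions above can be illustrated with a short module. This is a minimal sketch, not code from the project: `OcrRequest`, `add_request`, and the values of `DEFAULT_LANG` and `MAX_BATCH_SIZE` are hypothetical and chosen only to show the naming, docstring, and type-hint rules.

```python
"""Illustrative module following the conventions listed above."""

from dataclasses import dataclass
from pathlib import Path

# Constants use UPPER_SNAKE_CASE. Both values are assumptions for illustration.
DEFAULT_LANG = "ch"
MAX_BATCH_SIZE = 50


@dataclass
class OcrRequest:
    """A single file queued for recognition.

    Attributes:
        path: Location of the image or PDF on disk.
        lang: Language code passed to the OCR engine.
    """

    path: Path
    lang: str = DEFAULT_LANG


class OcrProcessor:
    """Collects requests into a batch (hypothetical helper class)."""

    def __init__(self) -> None:
        # Private members carry a leading underscore.
        self._queue: list[OcrRequest] = []

    def add_request(self, request: OcrRequest) -> int:
        """Adds a request to the batch.

        Args:
            request: The file to enqueue.

        Returns:
            The number of requests currently queued.

        Raises:
            ValueError: If the batch already holds MAX_BATCH_SIZE requests.
        """
        if len(self._queue) >= MAX_BATCH_SIZE:
            raise ValueError(f"batch is limited to {MAX_BATCH_SIZE} files")
        self._queue.append(request)
        return len(self._queue)
```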
+
+#### Frontend (JavaScript/TypeScript)
+- **Formatter**: Prettier
+- **Naming Conventions**:
+  - Components: PascalCase (e.g., `ImageUpload`, `ResultsTable`)
+  - Functions/Variables: camelCase (e.g., `processImage`, `ocrResults`)
+  - Constants: UPPER_SNAKE_CASE (e.g., `MAX_FILE_SIZE`, `API_BASE_URL`)
+  - CSS Classes: kebab-case (Tailwind convention)
+- **File Structure**: One component per file
+- **Imports**: Group by external, internal, types
+
+### Architecture Patterns
+
+#### Backend Architecture
+- **Layered Architecture**:
+  - Router Layer (FastAPI routes)
+  - Service Layer (business logic)
+  - Data Access Layer (database/file operations)
+  - Model Layer (Pydantic models)
+- **Async/Await**: Use async operations for I/O-bound tasks
+- **Dependency Injection**: FastAPI's dependency injection for services
+- **Error Handling**: Custom exception handlers with proper HTTP status codes
+- **Logging**: Structured logging with log levels
+- **Background Tasks**: FastAPI BackgroundTasks for long-running OCR jobs
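One way to picture the four layers and constructor-based dependency injection is the sketch below. It uses only the standard library so it runs without FastAPI or Pydantic installed: the dataclass stands in for a Pydantic model, the in-memory repository for the MySQL-backed data access layer, and the plain coroutine for a FastAPI route. All class and function names are hypothetical.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


# Model layer: a dataclass standing in for a Pydantic model.
@dataclass
class TaskRecord:
    task_id: int
    filename: str
    status: str = "pending"


# Data access layer: in-memory store standing in for the database.
class TaskRepository:
    def __init__(self) -> None:
        self._tasks: dict[int, TaskRecord] = {}

    async def add(self, filename: str) -> TaskRecord:
        record = TaskRecord(task_id=len(self._tasks) + 1, filename=filename)
        self._tasks[record.task_id] = record
        return record

    async def get(self, task_id: int) -> Optional[TaskRecord]:
        return self._tasks.get(task_id)


# Service layer: business logic; the repository is injected via the constructor.
class TaskService:
    def __init__(self, repository: TaskRepository) -> None:
        self._repository = repository

    async def submit(self, filename: str) -> TaskRecord:
        if not filename:
            raise ValueError("filename must not be empty")
        return await self._repository.add(filename)


# Router layer: in the real application this would be a FastAPI route function.
async def create_task_endpoint(filename: str, service: TaskService) -> dict:
    record = await service.submit(filename)
    return {"task_id": record.task_id, "status": record.status}


if __name__ == "__main__":
    service = TaskService(TaskRepository())
    print(asyncio.run(create_task_endpoint("scan.png", service)))
```

Keeping the router free of business logic means the service can be unit-tested without an HTTP client, which fits the testing split described later in this file.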
+
+#### Frontend Architecture
+- **Component-Based**: Reusable React components
+- **Atomic Design**: atoms → molecules → organisms → templates → pages
+- **API Layer**: Centralized API client with React Query
+- **State Management**: Server state (React Query) + Client state (Zustand)
+- **Routing**: React Router for SPA navigation
+- **Error Boundaries**: Graceful error handling in UI
+
+#### API Design
+- **RESTful**: Follow REST conventions
+- **Versioning**: API versioned as `/api/v1/...`
+- **Documentation**: Auto-generated via FastAPI (Swagger/OpenAPI)
+- **Response Format**: Consistent JSON structure
+  ```json
+  {
+    "success": true,
+    "data": {},
+    "message": "Success",
+    "timestamp": "2025-01-01T00:00:00Z"
+  }
+  ```
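A small helper can keep every endpoint on the same envelope. The following is a sketch under assumptions: `make_response` and its `now` parameter (added so the timestamp can be fixed in tests) are hypothetical names, not part of the project.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def make_response(
    data: Any = None,
    message: str = "Success",
    success: bool = True,
    now: Optional[datetime] = None,
) -> dict:
    """Builds the response envelope with a UTC ISO 8601 timestamp."""
    moment = now or datetime.now(timezone.utc)
    timestamp = moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "success": success,
        "data": data if data is not None else {},
        "message": message,
        "timestamp": timestamp,
    }
```

For example, `make_response({"task_id": 1})` returns the four keys shown in the JSON above, with `data` set to the supplied dictionary.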
+
+### Testing Strategy
+
+#### Backend Testing
+- **Unit Tests**: Test services, utilities, data models
+- **Integration Tests**: Test API endpoints end-to-end
+- **Test Framework**: pytest with pytest-asyncio
+- **Coverage Target**: Minimum 70% code coverage
+- **Test Command**: `pytest tests/ -v --cov=app`
+
+#### Frontend Testing
+- **Component Tests**: Test React components with Vitest + React Testing Library
+- **Integration Tests**: Test user workflows
+- **E2E Tests**: Optional with Playwright
+- **Test Command**: `npm run test`
+
+### Git Workflow
+- **Branching**: Feature branches from main (e.g., `feature/add-pdf-support`)
+- **Commits**: Conventional Commits format (e.g., `feat:`, `fix:`, `docs:`)
+- **PRs**: Require passing tests before merge
+- **Versioning**: Semantic versioning (MAJOR.MINOR.PATCH)
+
+## Domain Context
+
+### OCR Concepts
+- **Recognition Accuracy**: Depends on image quality, language, and font type
+- **Preprocessing**: Image enhancement (contrast, denoising) can improve OCR accuracy
+- **Multi-Language**: PaddleOCR supports Chinese, English, Japanese, Korean, and many others
+- **Bounding Boxes**: OCR engines detect text regions before recognition
+- **Confidence Scores**: Each recognized text region carries a confidence score between 0 and 1
+
+### Use Cases
+- Digitizing scanned documents and images via web upload
+- Extracting text from screenshots for archival
+- Processing receipts and invoices for data entry
+- Converting image-based PDFs to searchable text
+- Batch processing multiple files via drag-and-drop interface
+
+### Output Rules
+- Users can define custom rules for organizing extracted text
+- Examples: group by file name pattern, filter by confidence threshold, format as structured data
+- Export formats: plain text files, JSON with metadata, Excel spreadsheets
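The rule examples above (confidence threshold, grouping by file name pattern, JSON export with metadata) could be combined roughly as follows. This is a sketch, not the project's rule engine: `OcrLine`, `apply_rules`, and `export_json` are hypothetical names, and the 0.8 default threshold is an arbitrary illustration.

```python
import json
from collections import defaultdict
from dataclasses import asdict, dataclass
from typing import Callable, Optional


@dataclass
class OcrLine:
    filename: str
    text: str
    confidence: float  # between 0.0 and 1.0
    box: list  # corner points of the detected text region


def apply_rules(
    lines: list[OcrLine],
    min_confidence: float = 0.8,
    group_key: Optional[Callable[[str], str]] = None,
) -> dict[str, list[dict]]:
    """Drops low-confidence lines, then groups the rest by a key derived from the file name."""
    key_of = group_key or (lambda name: name)
    grouped: dict[str, list[dict]] = defaultdict(list)
    for line in lines:
        if line.confidence >= min_confidence:
            grouped[key_of(line.filename)].append(asdict(line))
    return dict(grouped)


def export_json(grouped: dict[str, list[dict]]) -> str:
    # ensure_ascii=False keeps Chinese text readable in the exported file.
    return json.dumps(grouped, ensure_ascii=False, indent=2)
```

Passing `group_key=lambda name: name.split("_")[0]` would, for instance, group `inv_001.png` and `inv_002.png` under `inv`.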
+
+## Important Constraints
+
+### Technical Constraints
+- **Platform**: Windows 10/11 (development), Linux server managed by 1Panel (deployment)
+- **Web Application**: Browser-based interface (Chrome, Firefox, Edge)
+- **Local Processing**: All OCR processing happens on backend server (no cloud dependencies)
+- **Resource Intensive**: OCR is CPU/GPU intensive; consider task queue for batch processing
+- **File Size Limits**: Set max upload size (e.g., 20MB per file, 100MB per batch)
+- **Language Models**: PaddleOCR models must be downloaded (~100MB+ per language)
+- **Conda Environment**: Backend development must be done within Conda virtual environment
+- **Port Range**: Web services must use ports 12010-12019
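The upload limits could be enforced before any OCR work starts. The sketch below assumes the example figures from the list (20MB per file, 100MB per batch); `find_size_violations` is a hypothetical helper, and user-facing messages would need to be translated to match the error-message requirement elsewhere in this file.

```python
MB = 1024 * 1024
MAX_FILE_BYTES = 20 * MB    # example per-file limit quoted above
MAX_BATCH_BYTES = 100 * MB  # example per-batch limit quoted above


def find_size_violations(sizes: dict[str, int]) -> list[str]:
    """Returns one message per violated limit; an empty list means the batch is fine.

    Args:
        sizes: Mapping of file name to size in bytes.
    """
    problems = [
        f"{name}: {size / MB:.1f} MB exceeds the {MAX_FILE_BYTES // MB} MB file limit"
        for name, size in sizes.items()
        if size > MAX_FILE_BYTES
    ]
    total = sum(sizes.values())
    if total > MAX_BATCH_BYTES:
        problems.append(
            f"batch: {total / MB:.1f} MB exceeds the {MAX_BATCH_BYTES // MB} MB batch limit"
        )
    return problems
```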
+
+### User Experience Constraints
+- **Target Users**: Non-technical users who need simple batch OCR via web
+- **Browser Compatibility**: Modern browsers (Chrome 90+, Firefox 88+, Edge 90+)
+- **Performance**: UI must show progress feedback during OCR processing
+- **Error Messages**: Clear, actionable error messages in Traditional Chinese
+- **Responsive Design**: UI should work on desktop and tablet (mobile optional)
+
+### Business Constraints
+- **Open Source**: Use only open-source libraries (no paid API dependencies)
+- **Deployment**: 1Panel-based deployment (no Docker required)
+- **Offline Capable**: Must work without internet after initial setup (except model downloads)
+- **Authentication**: JWT-based auth (optional LDAP integration for enterprise)
+
+### Security Constraints
+- **File Upload**: Validate file types, scan for malware (optional)
+- **Authentication**: JWT tokens with expiration
+- **CORS**: Configure CORS for frontend-backend communication
+- **Input Validation**: Strict validation on all API inputs
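File type validation is more robust when it checks the leading bytes of the upload as well as the extension. The following is a minimal sketch: the accepted formats (PNG, JPEG, PDF) are an assumption based on the "images and PDF files" goal, and `is_allowed_upload` is a hypothetical name.

```python
from pathlib import Path

# Leading bytes ("magic numbers") of the formats this sketch accepts.
SIGNATURES = {
    ".png": (b"\x89PNG\r\n\x1a\n",),
    ".jpg": (b"\xff\xd8\xff",),
    ".jpeg": (b"\xff\xd8\xff",),
    ".pdf": (b"%PDF-",),
}


def is_allowed_upload(filename: str, head: bytes) -> bool:
    """Accepts a file only if its extension is allowed and its first bytes match.

    Args:
        filename: Name supplied by the client.
        head: The first few bytes of the uploaded content.
    """
    signatures = SIGNATURES.get(Path(filename).suffix.lower())
    return signatures is not None and head.startswith(signatures)
```

Checking both guards against a renamed file, such as a PDF uploaded with a `.png` extension, reaching the OCR pipeline.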
+
+## External Dependencies
+
+### Database Configuration
+- **MySQL Host**: mysql.theaken.com
+- **MySQL Port**: 33306
+- **MySQL User**: A060
+- **MySQL Password**: stored in `.env` (credentials should not be recorded in this document)
+- **MySQL Database**: db_A060
+- **MySQL Charset**: utf8mb4
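Connection settings like these are usually read from environment variables and assembled into a SQLAlchemy URL at startup. The sketch below makes several assumptions: the `MYSQL_*` variable names are hypothetical, and the `pymysql` driver is one possible choice that this file does not specify.

```python
import os
from collections.abc import Mapping
from urllib.parse import quote_plus


def build_mysql_url(env: Mapping[str, str] = os.environ) -> str:
    """Assembles a SQLAlchemy-style MySQL URL from environment variables."""
    # quote_plus escapes characters such as "@" or "/" in credentials.
    user = quote_plus(env["MYSQL_USER"])
    password = quote_plus(env["MYSQL_PASSWORD"])
    host = env["MYSQL_HOST"]
    port = env.get("MYSQL_PORT", "3306")
    database = env["MYSQL_DATABASE"]
    charset = env.get("MYSQL_CHARSET", "utf8mb4")
    return f"mysql+pymysql://{user}:{password}@{host}:{port}/{database}?charset={charset}"
```

The resulting string can be passed to SQLAlchemy's `create_engine`; a missing required variable raises `KeyError` at startup rather than failing later on first query.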
+
+### SMTP Configuration (Optional)
+- **SMTP Server**: mail.panjit.com.tw
+- **SMTP Port**: 25
+- **SMTP TLS**: false
+- **SMTP Auth**: false
+- **Sender Email**: tool-ocr-system@panjit.com.tw
+
+### LDAP Configuration (Optional)
+- **LDAP Server**: panjit.com.tw
+- **LDAP Port**: 389
+
+### Conda Environment
+- **Environment Name**: `tool_ocr`
+- **Python Version**: 3.10
+- **Base Path**: `C:\Users\lin46\.conda\envs\tool_ocr`
+- **Activation**: Always activate environment before backend development
+
+### OCR Models
+- **PaddleOCR Models**: Downloaded automatically on first run or manually installed
+- **Model Storage**: Local cache directory or Docker volume
+- **Supported Languages**: Chinese (simplified/traditional), English, Japanese, Korean, etc.
+- **Model Size**: ~100-200MB per language pack
+
+### System Requirements
+- **Python**: 3.10+ (managed by Conda)
+- **Node.js**: 18+ (for frontend development and build)
+- **RAM**: Minimum 4GB (8GB recommended for batch processing)
+- **Disk Space**: ~2GB for application + models + dependencies
+- **OS**: Windows 10/11 (development), Linux (1Panel deployment server)
+- **Web Server**: Nginx (for static files and reverse proxy)
+- **Process Manager**: Supervisor / PM2 / systemd (for backend service)
+
+### Port Configuration
+- **Backend API**: 12010 (FastAPI via uvicorn)
+- **Frontend Dev Server**: 12011 (Vite, development only)
+- **Nginx**: 80/443 (production, managed by 1Panel)
+- **MySQL**: 33306 (external)
+- **Redis**: 6379 (optional, local)
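A startup check can confirm that the web-facing services stay inside the 12010-12019 range required by the technical constraints. This is a sketch with hypothetical names; it treats only the backend API and the Vite dev server as web services, since MySQL, Redis, and Nginx use their own ports.

```python
WEB_PORT_RANGE = range(12010, 12020)  # 12010-12019 inclusive

# Web-facing services from the list above.
WEB_SERVICE_PORTS = {"backend_api": 12010, "frontend_dev": 12011}


def ports_outside_range(
    ports: dict[str, int], allowed: range = WEB_PORT_RANGE
) -> dict[str, int]:
    """Returns the services whose port falls outside the allowed range."""
    return {name: port for name, port in ports.items() if port not in allowed}
```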
+
+### Deployment Architecture (1Panel)
+- **Development**: Windows with Conda + local Node.js
+- **Production**: Linux server managed by 1Panel
+- **Backend Deployment**:
+  - Conda environment on production server
+  - uvicorn runs FastAPI on port 12010
+  - Managed by Supervisor/PM2/systemd for auto-restart
+- **Frontend Deployment**:
+  - Build static files with `npm run build`
+  - Served by Nginx (configured via 1Panel)
+  - Nginx reverse proxies `/api` to backend (12010)
+- **1Panel Features**:
+  - Website management (Nginx configuration)
+  - Process management (backend service)
+  - SSL certificate management (Let's Encrypt)
+  - File management and deployment
+
+### Configuration Files
+- **Backend**:
+  - `environment.yml`: Conda environment specification
+  - `requirements.txt`: Pip dependencies
+  - `.env`: Environment variables (database, JWT secret, etc.)
+  - `config.yaml`: Application configuration
+  - `start.sh`: Backend startup script
+- **Frontend**:
+  - `package.json`: npm dependencies
+  - `.env.production`: Production environment variables (API URL)
+  - `vite.config.js`: Vite configuration
+  - `build.sh`: Frontend build script
+- **Deployment**:
+  - `nginx.conf`: Nginx reverse proxy configuration
+  - `supervisor.conf` or `pm2.config.js`: Process manager configuration
+  - `deploy.sh`: Deployment automation script