This commit implements comprehensive external Azure AD authentication with complete task management, file download, and admin monitoring systems.

## Core Features Implemented (80% Complete)

### 1. Token Auto-Refresh Mechanism ✅
- Backend: POST /api/v2/auth/refresh endpoint
- Frontend: Auto-refresh 5 minutes before expiration
- Auto-retry on 401 errors with seamless token refresh

### 2. File Download System ✅
- Three format support: JSON / Markdown / PDF
- Endpoints: GET /api/v2/tasks/{id}/download/{format}
- File access control with ownership validation
- Frontend download buttons in TaskHistoryPage

### 3. Complete Task Management ✅
Backend Endpoints:
- POST /api/v2/tasks/{id}/start - Start task
- POST /api/v2/tasks/{id}/cancel - Cancel task
- POST /api/v2/tasks/{id}/retry - Retry failed task
- GET /api/v2/tasks - List with filters (status, filename, date range)
- GET /api/v2/tasks/stats - User statistics

Frontend Features:
- Status-based action buttons (Start/Cancel/Retry)
- Advanced search and filtering (status, filename, date range)
- Pagination and sorting
- Task statistics dashboard (5 stat cards)

### 4. Admin Monitoring System ✅ (Backend)
Admin APIs:
- GET /api/v2/admin/stats - System statistics
- GET /api/v2/admin/users - User list with stats
- GET /api/v2/admin/users/top - User leaderboard
- GET /api/v2/admin/audit-logs - Audit log query system
- GET /api/v2/admin/audit-logs/user/{id}/summary

Admin Features:
- Email-based admin check (ymirliu@panjit.com.tw)
- Comprehensive system metrics (users, tasks, sessions, activity)
- Audit logging service for security tracking

### 5. User Isolation & Security ✅
- Row-level security on all task queries
- File access control with ownership validation
- Strict user_id filtering on all operations
- Session validation and expiry checking
- Admin privilege verification

## New Files Created

Backend:
- backend/app/models/user_v2.py - User model for external auth
- backend/app/models/task.py - Task model with user isolation
- backend/app/models/session.py - Session management
- backend/app/models/audit_log.py - Audit log model
- backend/app/services/external_auth_service.py - External API client
- backend/app/services/task_service.py - Task CRUD with isolation
- backend/app/services/file_access_service.py - File access control
- backend/app/services/admin_service.py - Admin operations
- backend/app/services/audit_service.py - Audit logging
- backend/app/routers/auth_v2.py - V2 auth endpoints
- backend/app/routers/tasks.py - Task management endpoints
- backend/app/routers/admin.py - Admin endpoints
- backend/alembic/versions/5e75a59fb763_*.py - DB migration

Frontend:
- frontend/src/services/apiV2.ts - Complete V2 API client
- frontend/src/types/apiV2.ts - V2 type definitions
- frontend/src/pages/TaskHistoryPage.tsx - Task history UI

Modified Files:
- backend/app/core/deps.py - Added get_current_admin_user_v2
- backend/app/main.py - Registered admin router
- frontend/src/pages/LoginPage.tsx - V2 login integration
- frontend/src/components/Layout.tsx - User display and logout
- frontend/src/App.tsx - Added /tasks route

## Documentation
- openspec/changes/.../PROGRESS_UPDATE.md - Detailed progress report

## Pending Items (20%)
1. Database migration execution for audit_logs table
2. Frontend admin dashboard page
3. Frontend audit log viewer

## Testing Status
- Manual testing: ✅ Authentication flow verified
- Unit tests: ⏳ Pending
- Integration tests: ⏳ Pending

## Security Enhancements
- ✅ User isolation (row-level security)
- ✅ File access control
- ✅ Token expiry validation
- ✅ Admin privilege verification
- ✅ Audit logging infrastructure
- ⏳ Token encryption (noted, low priority)
- ⏳ Rate limiting (noted, low priority)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Tool_OCR
OCR Batch Processing System with Structure Extraction
A web-based solution to extract text, images, and document structure from multiple files efficiently using PaddleOCR-VL.
Features
- 🔍 Multi-Language OCR: Support for 109 languages (Chinese, English, Japanese, Korean, etc.)
- 📄 Document Structure Analysis: Intelligent layout analysis with PP-StructureV3
- 🖼️ Image Extraction: Preserve document images alongside text content
- 📑 Batch Processing: Process multiple files concurrently with progress tracking
- 📤 Multiple Export Formats: TXT, JSON, Excel, Markdown with images, searchable PDF
- 📋 Office Documents: DOC, DOCX, PPT, PPTX support via LibreOffice conversion
- 🚀 GPU Acceleration: Automatic CUDA GPU detection with graceful CPU fallback
- 🔧 Flexible Configuration: Rule-based output formatting
- 🌐 Translation Ready: Reserved architecture for future translation features
Tech Stack
Backend
- Framework: FastAPI 0.115.0
- OCR Engine: PaddleOCR 3.0+ with PaddleOCR-VL
- Database: MySQL via SQLAlchemy
- PDF Generation: Pandoc + WeasyPrint
- Image Processing: OpenCV, Pillow, pdf2image
- Office Conversion: LibreOffice (headless mode)
Frontend
- Framework: React 19 with TypeScript
- Build Tool: Vite 7
- Styling: Tailwind CSS v4 + shadcn/ui
- State Management: React Query + Zustand
- HTTP Client: Axios
Prerequisites
- OS: WSL2 Ubuntu 24.04
- Python: 3.12+
- Node.js: 24.x LTS
- MySQL: External database server (provided)
- GPU (Optional): NVIDIA GPU with CUDA 11.2+ for hardware acceleration
Quick Start
1. Automated Setup (Recommended)
# Run automated setup script
./setup_dev_env.sh
This script automatically:
- Detects NVIDIA GPU and CUDA version (if available)
- Installs Python development tools (pip, venv, build-essential)
- Installs system dependencies (pandoc, LibreOffice, fonts, etc.)
- Installs Node.js (via nvm)
- Installs PaddlePaddle GPU version (if GPU detected) or CPU version
- Installs other Python packages
- Installs frontend dependencies
- Verifies GPU functionality (if GPU detected)
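The GPU detection step listed above can be approximated in Python. This is a minimal sketch, not the contents of setup_dev_env.sh: it assumes nvidia-smi is on PATH whenever an NVIDIA driver is installed and that its banner contains a "CUDA Version: X.Y" field (the highest CUDA version the driver supports). The function names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_cuda_version(smi_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the 'CUDA Version: X.Y' field of nvidia-smi output."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def detect_cuda_version() -> Optional[Tuple[int, int]]:
    """Return the driver-reported CUDA version, or None when no usable GPU is found."""
    if shutil.which("nvidia-smi") is None:
        return None  # NVIDIA driver tools are not installed
    try:
        result = subprocess.run(
            ["nvidia-smi"], capture_output=True, text=True, timeout=10, check=True
        )
    except (subprocess.SubprocessError, OSError):
        return None  # driver present but not working, or the call timed out
    return parse_cuda_version(result.stdout)


def choose_paddle_build(version: Optional[Tuple[int, int]],
                        minimum: Tuple[int, int] = (11, 2)) -> str:
    """Pick 'gpu' when CUDA meets the 11.2+ prerequisite, otherwise 'cpu'."""
    return "gpu" if version is not None and version >= minimum else "cpu"
```

Calling choose_paddle_build(detect_cuda_version()) yields "cpu" on machines without a working driver, which mirrors the CPU fallback the script describes.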
2. Initialize Database
source venv/bin/activate
cd backend
alembic upgrade head
python create_test_user.py
cd ..
Default test user:
- Username: admin
- Password: admin123
3. Start Development Servers
Backend (Terminal 1):
./start_backend.sh
Frontend (Terminal 2):
./start_frontend.sh
4. Access Application
- Frontend: http://localhost:5173
- API Docs: http://localhost:8000/docs
- Health Check: http://localhost:8000/health
Project Structure
Tool_OCR/
├── backend/ # FastAPI backend
│ ├── app/
│ │ ├── api/v1/ # API endpoints
│ │ ├── core/ # Configuration, database
│ │ ├── models/ # Database models
│ │ ├── services/ # Business logic
│ │ └── main.py # Application entry point
│ ├── alembic/ # Database migrations
│ └── tests/ # Test suite
├── frontend/ # React frontend
│ ├── src/
│ │ ├── components/ # UI components
│ │ ├── pages/ # Page components
│ │ ├── services/ # API services
│ │ └── stores/ # State management
│ └── public/ # Static assets
├── .env.local # Local development config
├── setup_dev_env.sh # Environment setup script
├── start_backend.sh # Backend startup script
└── start_frontend.sh # Frontend startup script
Configuration
Main config file: .env.local
# Database
MYSQL_HOST=mysql.theaken.com
MYSQL_PORT=33306
# Application ports
BACKEND_PORT=8000
FRONTEND_PORT=5173
# Token expiration (minutes)
ACCESS_TOKEN_EXPIRE_MINUTES=1440 # 24 hours
# Supported file formats
ALLOWED_EXTENSIONS=png,jpg,jpeg,pdf,bmp,tiff,doc,docx,ppt,pptx
# OCR settings
OCR_LANGUAGES=ch,en,japan,korean
MAX_OCR_WORKERS=4
# GPU acceleration (optional)
FORCE_CPU_MODE=false # Set to true to disable GPU even if available
GPU_MEMORY_FRACTION=0.8 # Fraction of GPU memory to use (0.0-1.0)
GPU_DEVICE_ID=0 # GPU device ID to use (0 for primary GPU)
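To illustrate how these values might be consumed, here is a minimal standard-library sketch that parses KEY=VALUE lines with trailing comments and validates the GPU settings. The backend's actual settings loader is not shown in this README; parse_env, GpuSettings, and load_gpu_settings are hypothetical names, and the defaults simply repeat the values listed above.

```python
from dataclasses import dataclass
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and comment lines and trimming ' # ...' suffixes."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Whitespace followed by '#' is treated as the start of an inline comment.
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


@dataclass(frozen=True)
class GpuSettings:
    force_cpu_mode: bool
    memory_fraction: float
    device_id: int


def load_gpu_settings(env: Dict[str, str]) -> GpuSettings:
    """Convert the three GPU_* / FORCE_CPU_MODE strings into typed, validated values."""
    fraction = float(env.get("GPU_MEMORY_FRACTION", "0.8"))
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("GPU_MEMORY_FRACTION must be within 0.0-1.0")
    return GpuSettings(
        force_cpu_mode=env.get("FORCE_CPU_MODE", "false").lower() == "true",
        memory_fraction=fraction,
        device_id=int(env.get("GPU_DEVICE_ID", "0")),
    )
```

Validating the fraction at load time surfaces a bad value at startup instead of during the first OCR job.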
GPU Acceleration
The system automatically detects and utilizes NVIDIA GPU hardware when available:
- Auto-detection: Setup script detects GPU and installs appropriate PaddlePaddle version
- Graceful fallback: If GPU is unavailable or fails, system automatically uses CPU mode
- Performance: GPU acceleration provides 3-10x speedup for OCR processing
- Configuration: Control GPU usage via .env.local environment variables
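The graceful fallback described above could look roughly like the following. This is a sketch under stated assumptions, not the project's implementation: select_device is a hypothetical helper, and its two parameters stand in for FORCE_CPU_MODE and GPU_DEVICE_ID. The Paddle calls used (paddle.device.is_compiled_with_cuda, paddle.device.cuda.device_count, paddle.set_device) are public PaddlePaddle APIs.

```python
def select_device(force_cpu: bool = False, device_id: int = 0) -> str:
    """Return a Paddle device string, falling back to 'cpu' whenever the GPU is unusable."""
    if force_cpu:
        return "cpu"
    try:
        # Imported lazily so the sketch also runs where PaddlePaddle is not installed.
        import paddle

        if (paddle.device.is_compiled_with_cuda()
                and paddle.device.cuda.device_count() > device_id):
            device = f"gpu:{device_id}"
            paddle.set_device(device)
            return device
    except Exception:
        # Any import, driver, or runtime failure is treated as "no GPU available".
        pass
    return "cpu"
```

Catching every exception here is deliberate: the goal is that OCR still runs on CPU when GPU initialization fails for any reason.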
Check GPU status at: http://localhost:8000/health
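The same check can be scripted. The sketch below assumes the /health endpoint returns a JSON body; its field names are not documented here, so the helper returns the decoded payload without interpreting it. fetch_health and its opener parameter are hypothetical and exist only to keep the example testable.

```python
import json
from typing import Any, Callable, Dict
from urllib.request import urlopen


def fetch_health(base_url: str = "http://localhost:8000",
                 opener: Callable[..., Any] = urlopen) -> Dict[str, Any]:
    """GET <base_url>/health and return the decoded JSON body."""
    with opener(f"{base_url.rstrip('/')}/health", timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


# Example, with the backend running:
#   print(json.dumps(fetch_health(), indent=2))
```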
Known Limitations
Chart Recognition (PP-StructureV3)
Because of an API incompatibility between PaddleOCR 3.x and PaddlePaddle 3.0.0 stable, the chart recognition feature is currently disabled:
- ✅ Works: Layout analysis detects and extracts charts/figures as image files
- ✅ Works: Tables, formulas, and text recognition function normally
- ❌ Disabled: Deep chart content understanding (chart type, data extraction, axis/legend parsing)
- ❌ Disabled: Converting chart content to structured data
Technical Details:
- The PaddleOCR-VL chart recognition model requires the paddle.incubate.nn.functional.fused_rms_norm_ext API
- PaddlePaddle 3.0.0 stable only provides the base fused_rms_norm function
- This limitation will be resolved when PaddlePaddle releases an update with the extended API
Workaround: Charts are saved as images and can be viewed manually. For chart data extraction, consider using specialized chart recognition tools separately.
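To check whether an installed PaddlePaddle build already exposes the extended function, a probe along these lines may help. It is a minimal sketch: has_api and chart_recognition_supported are hypothetical helpers, and the probe only tests that the attribute exists, not that chart recognition works end to end.

```python
import importlib


def has_api(module_name: str, attribute: str) -> bool:
    """Return True when module_name imports cleanly and exposes attribute."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False  # module (or PaddlePaddle itself) is not installed
    return hasattr(module, attribute)


def chart_recognition_supported() -> bool:
    """Probe for the extended fused RMS-norm function named in the technical details."""
    return has_api("paddle.incubate.nn.functional", "fused_rms_norm_ext")
```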
API Endpoints
Authentication
- POST /api/v1/auth/login - User login
File Management
- POST /api/v1/upload - Upload files
- POST /api/v1/ocr/process - Start OCR processing
- GET /api/v1/batch/{id}/status - Get batch status
Results & Export
- GET /api/v1/ocr/result/{id} - Get OCR result
- GET /api/v1/export/pdf/{id} - Export as PDF
Full API documentation: http://localhost:8000/docs
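A client typically uploads files, starts processing, and then polls the batch status. The sketch below builds URLs from the paths listed above and polls until a caller-supplied predicate reports completion. Request and response schemas are not documented in this README, so fetch and is_finished are left to the caller; url_for and wait_for_batch are hypothetical helper names.

```python
import time
from typing import Any, Callable, Dict

API_PREFIX = "/api/v1"

# Path templates copied from the endpoint list in this README.
ROUTES: Dict[str, str] = {
    "login": "/auth/login",
    "upload": "/upload",
    "process": "/ocr/process",
    "batch_status": "/batch/{id}/status",
    "result": "/ocr/result/{id}",
    "export_pdf": "/export/pdf/{id}",
}


def url_for(name: str, base_url: str = "http://localhost:8000", **params: Any) -> str:
    """Build an absolute URL for one of the documented v1 endpoints."""
    return base_url.rstrip("/") + API_PREFIX + ROUTES[name].format(**params)


def wait_for_batch(batch_id: int,
                   fetch: Callable[[str], Dict[str, Any]],
                   is_finished: Callable[[Dict[str, Any]], bool],
                   interval: float = 2.0,
                   max_polls: int = 30) -> Dict[str, Any]:
    """Poll the batch status endpoint until is_finished(payload) returns True."""
    url = url_for("batch_status", id=batch_id)
    for _ in range(max_polls):
        payload = fetch(url)
        if is_finished(payload):
            return payload
        time.sleep(interval)
    raise TimeoutError(f"batch {batch_id} did not finish after {max_polls} polls")
```

Passing fetch in as a function keeps authentication and HTTP details (headers, token handling) out of the polling logic.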
Supported File Formats
- Images: PNG, JPG, JPEG, BMP, TIFF
- Documents: PDF
- Office: DOC, DOCX, PPT, PPTX
Office files are automatically converted to PDF before OCR processing.
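The conversion step can be sketched with LibreOffice's headless command line. This is an illustration under assumptions, not the project's service code: it assumes the soffice binary is on PATH (some installations expose it as libreoffice), and the helper names are hypothetical. The --headless, --convert-to, and --outdir options are standard LibreOffice flags.

```python
import subprocess
from pathlib import Path
from typing import List

# Office formats listed in this README.
OFFICE_EXTENSIONS = {".doc", ".docx", ".ppt", ".pptx"}


def needs_conversion(path: Path) -> bool:
    """True for Office files, which must become PDF before OCR."""
    return path.suffix.lower() in OFFICE_EXTENSIONS


def build_convert_command(source: Path, out_dir: Path) -> List[str]:
    """Assemble the headless LibreOffice command that writes <stem>.pdf into out_dir."""
    return ["soffice", "--headless", "--convert-to", "pdf",
            "--outdir", str(out_dir), str(source)]


def convert_to_pdf(source: Path, out_dir: Path, timeout: int = 120) -> Path:
    """Run the conversion and return the expected path of the generated PDF."""
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_convert_command(source, out_dir), check=True, timeout=timeout)
    return out_dir / (source.stem + ".pdf")
```

Passing the command as a list rather than a shell string avoids quoting problems with file names that contain spaces.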
Development
Backend
source venv/bin/activate
cd backend
# Run tests
pytest
# Database migration
alembic revision --autogenerate -m "description"
alembic upgrade head
# Code formatting
black app/
Frontend
cd frontend
# Development server
npm run dev
# Build for production
npm run build
# Lint code
npm run lint
OpenSpec Workflow
This project follows OpenSpec for specification-driven development:
# View current changes
openspec list
# Validate specifications
openspec validate add-ocr-batch-processing
# View implementation tasks
cat openspec/changes/add-ocr-batch-processing/tasks.md
Roadmap
- Phase 0: Environment setup
- Phase 1: Core OCR backend (~98% complete)
- Phase 2: Frontend development (~92% complete)
- Phase 3: Testing & optimization
- Phase 4: Deployment automation
- Phase 5: Translation feature (future)
Documentation
- Development specs: openspec/project.md
- Implementation status: openspec/changes/add-ocr-batch-processing/STATUS.md
- Agent instructions: openspec/AGENTS.md
License
Internal project use
Notes
- First OCR run will download PaddleOCR models (~900MB)
- Token expiration is set to 24 hours by default
- Office conversion requires LibreOffice (installed via setup script)
- Development environment: WSL2 Ubuntu 24.04 with Python venv
- GPU acceleration: Automatically detected and enabled if NVIDIA GPU with CUDA 11.2+ is available
- WSL GPU support: Ensure NVIDIA CUDA drivers are installed in WSL for GPU acceleration
- GPU status can be checked via the /health API endpoint