egg ad2b832fb6 feat: complete external auth V2 migration with advanced features
This commit implements comprehensive external Azure AD authentication
with complete task management, file download, and admin monitoring systems.

## Core Features Implemented (80% Complete)

### 1. Token Auto-Refresh Mechanism 
- Backend: POST /api/v2/auth/refresh endpoint
- Frontend: Auto-refresh 5 minutes before expiration
- Auto-retry on 401 errors with seamless token refresh

### 2. File Download System 
- Three format support: JSON / Markdown / PDF
- Endpoints: GET /api/v2/tasks/{id}/download/{format}
- File access control with ownership validation
- Frontend download buttons in TaskHistoryPage

### 3. Complete Task Management 
Backend Endpoints:
- POST /api/v2/tasks/{id}/start - Start task
- POST /api/v2/tasks/{id}/cancel - Cancel task
- POST /api/v2/tasks/{id}/retry - Retry failed task
- GET /api/v2/tasks - List with filters (status, filename, date range)
- GET /api/v2/tasks/stats - User statistics

Frontend Features:
- Status-based action buttons (Start/Cancel/Retry)
- Advanced search and filtering (status, filename, date range)
- Pagination and sorting
- Task statistics dashboard (5 stat cards)

### 4. Admin Monitoring System  (Backend)
Admin APIs:
- GET /api/v2/admin/stats - System statistics
- GET /api/v2/admin/users - User list with stats
- GET /api/v2/admin/users/top - User leaderboard
- GET /api/v2/admin/audit-logs - Audit log query system
- GET /api/v2/admin/audit-logs/user/{id}/summary

Admin Features:
- Email-based admin check (ymirliu@panjit.com.tw)
- Comprehensive system metrics (users, tasks, sessions, activity)
- Audit logging service for security tracking

### 5. User Isolation & Security 
- Row-level security on all task queries
- File access control with ownership validation
- Strict user_id filtering on all operations
- Session validation and expiry checking
- Admin privilege verification

## New Files Created

Backend:
- backend/app/models/user_v2.py - User model for external auth
- backend/app/models/task.py - Task model with user isolation
- backend/app/models/session.py - Session management
- backend/app/models/audit_log.py - Audit log model
- backend/app/services/external_auth_service.py - External API client
- backend/app/services/task_service.py - Task CRUD with isolation
- backend/app/services/file_access_service.py - File access control
- backend/app/services/admin_service.py - Admin operations
- backend/app/services/audit_service.py - Audit logging
- backend/app/routers/auth_v2.py - V2 auth endpoints
- backend/app/routers/tasks.py - Task management endpoints
- backend/app/routers/admin.py - Admin endpoints
- backend/alembic/versions/5e75a59fb763_*.py - DB migration

Frontend:
- frontend/src/services/apiV2.ts - Complete V2 API client
- frontend/src/types/apiV2.ts - V2 type definitions
- frontend/src/pages/TaskHistoryPage.tsx - Task history UI

Modified Files:
- backend/app/core/deps.py - Added get_current_admin_user_v2
- backend/app/main.py - Registered admin router
- frontend/src/pages/LoginPage.tsx - V2 login integration
- frontend/src/components/Layout.tsx - User display and logout
- frontend/src/App.tsx - Added /tasks route

## Documentation
- openspec/changes/.../PROGRESS_UPDATE.md - Detailed progress report

## Pending Items (20%)
1. Database migration execution for audit_logs table
2. Frontend admin dashboard page
3. Frontend audit log viewer

## Testing Status
- Manual testing:  Authentication flow verified
- Unit tests:  Pending
- Integration tests:  Pending

## Security Enhancements
-  User isolation (row-level security)
-  File access control
-  Token expiry validation
-  Admin privilege verification
-  Audit logging infrastructure
-  Token encryption (noted, low priority)
-  Rate limiting (noted, low priority)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-14 17:19:43 +08:00
2025-11-12 22:53:17 +08:00
2025-11-12 22:53:17 +08:00
2025-11-12 22:53:17 +08:00
2nd
2025-11-12 22:54:56 +08:00
2025-11-12 22:53:17 +08:00
2025-11-12 22:53:17 +08:00

Tool_OCR

OCR Batch Processing System with Structure Extraction

A web-based solution to extract text, images, and document structure from multiple files efficiently using PaddleOCR-VL.

Features

  • 🔍 Multi-Language OCR: Support for 109 languages (Chinese, English, Japanese, Korean, etc.)
  • 📄 Document Structure Analysis: Intelligent layout analysis with PP-StructureV3
  • 🖼️ Image Extraction: Preserve document images alongside text content
  • 📑 Batch Processing: Process multiple files concurrently with progress tracking
  • 📤 Multiple Export Formats: TXT, JSON, Excel, Markdown with images, searchable PDF
  • 📋 Office Documents: DOC, DOCX, PPT, PPTX support via LibreOffice conversion
  • 🚀 GPU Acceleration: Automatic CUDA GPU detection with graceful CPU fallback
  • 🔧 Flexible Configuration: Rule-based output formatting
  • 🌐 Translation Ready: Reserved architecture for future translation features

Tech Stack

Backend

  • Framework: FastAPI 0.115.0
  • OCR Engine: PaddleOCR 3.0+ with PaddleOCR-VL
  • Database: MySQL via SQLAlchemy
  • PDF Generation: Pandoc + WeasyPrint
  • Image Processing: OpenCV, Pillow, pdf2image
  • Office Conversion: LibreOffice (headless mode)

Frontend

  • Framework: React 19 with TypeScript
  • Build Tool: Vite 7
  • Styling: Tailwind CSS v4 + shadcn/ui
  • State Management: React Query + Zustand
  • HTTP Client: Axios

Prerequisites

  • OS: WSL2 Ubuntu 24.04
  • Python: 3.12+
  • Node.js: 24.x LTS
  • MySQL: External database server (provided)
  • GPU (Optional): NVIDIA GPU with CUDA 11.2+ for hardware acceleration

Quick Start

# Run automated setup script
./setup_dev_env.sh

This script automatically:

  • Detects NVIDIA GPU and CUDA version (if available)
  • Installs Python development tools (pip, venv, build-essential)
  • Installs system dependencies (pandoc, LibreOffice, fonts, etc.)
  • Installs Node.js (via nvm)
  • Installs PaddlePaddle GPU version (if GPU detected) or CPU version
  • Installs other Python packages
  • Installs frontend dependencies
  • Verifies GPU functionality (if GPU detected)

2. Initialize Database

source venv/bin/activate
cd backend
alembic upgrade head
python create_test_user.py
cd ..

Default test user:

  • Username: admin
  • Password: admin123

3. Start Development Servers

Backend (Terminal 1):

./start_backend.sh

Frontend (Terminal 2):

./start_frontend.sh

4. Access Application

Project Structure

Tool_OCR/
├── backend/                 # FastAPI backend
│   ├── app/
│   │   ├── api/v1/         # API endpoints
│   │   ├── core/           # Configuration, database
│   │   ├── models/         # Database models
│   │   ├── services/       # Business logic
│   │   └── main.py         # Application entry point
│   ├── alembic/            # Database migrations
│   └── tests/              # Test suite
├── frontend/               # React frontend
│   ├── src/
│   │   ├── components/     # UI components
│   │   ├── pages/          # Page components
│   │   ├── services/       # API services
│   │   └── stores/         # State management
│   └── public/             # Static assets
├── .env.local              # Local development config
├── setup_dev_env.sh        # Environment setup script
├── start_backend.sh        # Backend startup script
└── start_frontend.sh       # Frontend startup script

Configuration

Main config file: .env.local

# Database
MYSQL_HOST=mysql.theaken.com
MYSQL_PORT=33306

# Application ports
BACKEND_PORT=8000
FRONTEND_PORT=5173

# Token expiration (minutes)
ACCESS_TOKEN_EXPIRE_MINUTES=1440  # 24 hours

# Supported file formats
ALLOWED_EXTENSIONS=png,jpg,jpeg,pdf,bmp,tiff,doc,docx,ppt,pptx

# OCR settings
OCR_LANGUAGES=ch,en,japan,korean
MAX_OCR_WORKERS=4

# GPU acceleration (optional)
FORCE_CPU_MODE=false         # Set to true to disable GPU even if available
GPU_MEMORY_FRACTION=0.8      # Fraction of GPU memory to use (0.0-1.0)
GPU_DEVICE_ID=0              # GPU device ID to use (0 for primary GPU)

GPU Acceleration

The system automatically detects and utilizes NVIDIA GPU hardware when available:

  • Auto-detection: Setup script detects GPU and installs appropriate PaddlePaddle version
  • Graceful fallback: If GPU is unavailable or fails, system automatically uses CPU mode
  • Performance: GPU acceleration provides 3-10x speedup for OCR processing
  • Configuration: Control GPU usage via .env.local environment variables

Check GPU status at: http://localhost:8000/health

Known Limitations

Chart Recognition (PP-StructureV3)

Due to API incompatibility between PaddleOCR 3.x and PaddlePaddle 3.0.0 stable, the chart recognition feature is currently disabled:

  • Works: Layout analysis detects and extracts charts/figures as image files
  • Works: Tables, formulas, and text recognition function normally
  • Disabled: Deep chart content understanding (chart type, data extraction, axis/legend parsing)
  • Disabled: Converting chart content to structured data

Technical Details:

  • The PaddleOCR-VL chart recognition model requires paddle.incubate.nn.functional.fused_rms_norm_ext API
  • PaddlePaddle 3.0.0 stable only provides the base fused_rms_norm function
  • This limitation will be resolved when PaddlePaddle releases an update with the extended API

Workaround: Charts are saved as images and can be viewed manually. For chart data extraction, consider using specialized chart recognition tools separately.

API Endpoints

Authentication

  • POST /api/v1/auth/login - User login

File Management

  • POST /api/v1/upload - Upload files
  • POST /api/v1/ocr/process - Start OCR processing
  • GET /api/v1/batch/{id}/status - Get batch status

Results & Export

  • GET /api/v1/ocr/result/{id} - Get OCR result
  • GET /api/v1/export/pdf/{id} - Export as PDF

Full API documentation: http://localhost:8000/docs

Supported File Formats

  • Images: PNG, JPG, JPEG, BMP, TIFF
  • Documents: PDF
  • Office: DOC, DOCX, PPT, PPTX

Office files are automatically converted to PDF before OCR processing.

Development

Backend

source venv/bin/activate
cd backend

# Run tests
pytest

# Database migration
alembic revision --autogenerate -m "description"
alembic upgrade head

# Code formatting
black app/

Frontend

cd frontend

# Development server
npm run dev

# Build for production
npm run build

# Lint code
npm run lint

OpenSpec Workflow

This project follows OpenSpec for specification-driven development:

# View current changes
openspec list

# Validate specifications
openspec validate add-ocr-batch-processing

# View implementation tasks
cat openspec/changes/add-ocr-batch-processing/tasks.md

Roadmap

  • Phase 0: Environment setup
  • Phase 1: Core OCR backend (~98% complete)
  • Phase 2: Frontend development (~92% complete)
  • Phase 3: Testing & optimization
  • Phase 4: Deployment automation
  • Phase 5: Translation feature (future)

Documentation

License

Internal project use

Notes

  • First OCR run will download PaddleOCR models (~900MB)
  • Token expiration is set to 24 hours by default
  • Office conversion requires LibreOffice (installed via setup script)
  • Development environment: WSL2 Ubuntu 24.04 with Python venv
  • GPU acceleration: Automatically detected and enabled if NVIDIA GPU with CUDA 11.2+ is available
  • WSL GPU support: Ensure NVIDIA CUDA drivers are installed in WSL for GPU acceleration
  • GPU status can be checked via /health API endpoint
Description
No description provided
Readme 20 MiB
Languages
Python 84.1%
TypeScript 14.1%
Shell 1.4%
CSS 0.3%