OCR/backend/app/models/task.py
egg ad2b832fb6 feat: complete external auth V2 migration with advanced features
This commit implements comprehensive external Azure AD authentication
with complete task management, file download, and admin monitoring systems.

## Core Features Implemented (80% Complete)

### 1. Token Auto-Refresh Mechanism 
- Backend: POST /api/v2/auth/refresh endpoint
- Frontend: Auto-refresh 5 minutes before expiration
- Auto-retry on 401 errors with seamless token refresh
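
The refresh timing described above can be sketched in a few lines. This is a minimal Python illustration only, not the project's frontend code (which lives in TypeScript). The names `should_refresh` and `REFRESH_MARGIN` are hypothetical; only the 5-minute margin comes from the commit message.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the margin mirrors the "5 minutes before expiration" rule above.
REFRESH_MARGIN = timedelta(minutes=5)


def should_refresh(expires_at: datetime, now: datetime) -> bool:
    """Return True once the token is within REFRESH_MARGIN of expiry, or past it."""
    return now >= expires_at - REFRESH_MARGIN


# Example: a token that expires at 12:00 UTC.
expires_at = datetime(2025, 11, 14, 12, 0, tzinfo=timezone.utc)
early = should_refresh(expires_at, expires_at - timedelta(minutes=10))
late = should_refresh(expires_at, expires_at - timedelta(minutes=4))
```

A client would typically evaluate a check like this on a timer and call `POST /api/v2/auth/refresh` when it returns true, keeping the 401 retry path as a fallback.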

### 2. File Download System 
- Three supported formats: JSON, Markdown, PDF
- Endpoints: GET /api/v2/tasks/{id}/download/{format}
- File access control with ownership validation
- Frontend download buttons in TaskHistoryPage
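
One way the ownership check and format lookup could fit together is sketched below. The column names (`result_json_path`, `result_markdown_path`, `result_pdf_path`) come from the Task model in this file; `TaskRecord`, `FORMAT_TO_COLUMN`, and `resolve_download_path` are hypothetical names used for illustration, and the actual `file_access_service.py` may be structured differently.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: the {format} path segment maps one-to-one onto a result column.
FORMAT_TO_COLUMN = {
    "json": "result_json_path",
    "markdown": "result_markdown_path",
    "pdf": "result_pdf_path",
}


@dataclass
class TaskRecord:
    """Hypothetical stand-in for the SQLAlchemy Task model."""
    user_id: int
    result_json_path: Optional[str] = None
    result_markdown_path: Optional[str] = None
    result_pdf_path: Optional[str] = None


def resolve_download_path(task: TaskRecord, requester_id: int, fmt: str) -> str:
    """Return the stored result path, checking ownership before anything else."""
    if task.user_id != requester_id:
        raise PermissionError("task belongs to another user")
    column = FORMAT_TO_COLUMN.get(fmt)
    if column is None:
        raise ValueError(f"unsupported format: {fmt}")
    path = getattr(task, column)
    if path is None:
        raise FileNotFoundError(f"no {fmt} result for this task")
    return path
```

Checking ownership first means a non-owner cannot learn whether a given result file exists.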

### 3. Complete Task Management 
Backend Endpoints:
- POST /api/v2/tasks/{id}/start - Start task
- POST /api/v2/tasks/{id}/cancel - Cancel task
- POST /api/v2/tasks/{id}/retry - Retry failed task
- GET /api/v2/tasks - List with filters (status, filename, date range)
- GET /api/v2/tasks/stats - User statistics

Frontend Features:
- Status-based action buttons (Start/Cancel/Retry)
- Advanced search and filtering (status, filename, date range)
- Pagination and sorting
- Task statistics dashboard (5 stat cards)
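
The status-based buttons suggest a lookup from task status to permitted actions. The sketch below reuses the four `TaskStatus` values from the model; the `ALLOWED_ACTIONS` table is an assumption, since the commit message only confirms that retry applies to failed tasks.

```python
import enum


class TaskStatus(str, enum.Enum):
    """Same values as the TaskStatus enum in the Task model."""
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumption: which actions each status permits. Only "retry on failed"
# is stated in the commit message; the rest is illustrative.
ALLOWED_ACTIONS = {
    TaskStatus.PENDING: ("start", "cancel"),
    TaskStatus.PROCESSING: ("cancel",),
    TaskStatus.COMPLETED: (),
    TaskStatus.FAILED: ("retry",),
}


def allowed_actions(status: TaskStatus) -> tuple:
    """Return the actions a UI could offer for a task in the given status."""
    return ALLOWED_ACTIONS[status]
```

Keeping the table in one place lets the backend reject an invalid transition with the same rules the frontend uses to decide which buttons to render.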

### 4. Admin Monitoring System (Backend)
Admin APIs:
- GET /api/v2/admin/stats - System statistics
- GET /api/v2/admin/users - User list with stats
- GET /api/v2/admin/users/top - User leaderboard
- GET /api/v2/admin/audit-logs - Audit log query system
- GET /api/v2/admin/audit-logs/user/{id}/summary

Admin Features:
- Email-based admin check (ymirliu@panjit.com.tw)
- Comprehensive system metrics (users, tasks, sessions, activity)
- Audit logging service for security tracking
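
An email-based admin check can be as small as the sketch below. The function name `is_admin` and the case-insensitive comparison are assumptions; the real check lives in `get_current_admin_user_v2` and may differ. The address is passed in as a parameter so that it can come from configuration.

```python
from typing import Optional


def is_admin(user_email: Optional[str], admin_email: str) -> bool:
    """Return True when the user's email matches the configured admin email.

    Assumption: the comparison ignores case and surrounding whitespace.
    """
    if not user_email:
        return False
    return user_email.strip().lower() == admin_email.strip().lower()
```

A single-address check is simple to audit but does not scale to multiple administrators; a role column or an allow-list would be the usual next step.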

### 5. User Isolation & Security 
- Row-level security on all task queries
- File access control with ownership validation
- Strict user_id filtering on all operations
- Session validation and expiry checking
- Admin privilege verification
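
Strict `user_id` filtering means every query carries the caller's id in its `WHERE` clause, so a valid `task_id` owned by someone else behaves as if it did not exist. The sketch below shows the idea with the standard-library `sqlite3` module and a cut-down `tool_ocr_tasks` table; the helper names are hypothetical, and the project itself uses SQLAlchemy in `task_service.py`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tool_ocr_tasks ("
    "id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, "
    "task_id TEXT UNIQUE NOT NULL, status TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO tool_ocr_tasks (user_id, task_id, status) VALUES (?, ?, ?)",
    [(1, "a", "pending"), (1, "b", "failed"), (2, "c", "pending")],
)


def get_task_for_user(conn, user_id, task_id):
    """Look up a task by task_id, restricted to the caller's own rows."""
    return conn.execute(
        "SELECT task_id, status FROM tool_ocr_tasks "
        "WHERE task_id = ? AND user_id = ?",
        (task_id, user_id),
    ).fetchone()


def list_tasks_for_user(conn, user_id):
    """List the task_ids owned by the given user, oldest first."""
    rows = conn.execute(
        "SELECT task_id FROM tool_ocr_tasks WHERE user_id = ? ORDER BY id",
        (user_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

Here user 1 asking for task `"c"` gets `None`, exactly as if the task were missing.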

## New Files Created

Backend:
- backend/app/models/user_v2.py - User model for external auth
- backend/app/models/task.py - Task model with user isolation
- backend/app/models/session.py - Session management
- backend/app/models/audit_log.py - Audit log model
- backend/app/services/external_auth_service.py - External API client
- backend/app/services/task_service.py - Task CRUD with isolation
- backend/app/services/file_access_service.py - File access control
- backend/app/services/admin_service.py - Admin operations
- backend/app/services/audit_service.py - Audit logging
- backend/app/routers/auth_v2.py - V2 auth endpoints
- backend/app/routers/tasks.py - Task management endpoints
- backend/app/routers/admin.py - Admin endpoints
- backend/alembic/versions/5e75a59fb763_*.py - DB migration

Frontend:
- frontend/src/services/apiV2.ts - Complete V2 API client
- frontend/src/types/apiV2.ts - V2 type definitions
- frontend/src/pages/TaskHistoryPage.tsx - Task history UI

Modified Files:
- backend/app/core/deps.py - Added get_current_admin_user_v2
- backend/app/main.py - Registered admin router
- frontend/src/pages/LoginPage.tsx - V2 login integration
- frontend/src/components/Layout.tsx - User display and logout
- frontend/src/App.tsx - Added /tasks route

## Documentation
- openspec/changes/.../PROGRESS_UPDATE.md - Detailed progress report

## Pending Items (20%)
1. Database migration execution for audit_logs table
2. Frontend admin dashboard page
3. Frontend audit log viewer

## Testing Status
- Manual testing: Authentication flow verified
- Unit tests: Pending
- Integration tests: Pending

## Security Enhancements
- User isolation (row-level security)
- File access control
- Token expiry validation
- Admin privilege verification
- Audit logging infrastructure
- Token encryption (noted, low priority)
- Rate limiting (noted, low priority)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-14 17:19:43 +08:00

127 lines
5.0 KiB
Python

"""
Tool_OCR - Task Model
OCR task management with user isolation
"""
from sqlalchemy import Column, Integer, String, DateTime, Boolean, Text, ForeignKey, Enum as SQLEnum
from sqlalchemy.orm import relationship
from datetime import datetime
import enum

from app.core.database import Base


class TaskStatus(str, enum.Enum):
    """Task status enumeration"""
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


class Task(Base):
    """
    OCR Task model with user association

    Each task belongs to a specific user and stores
    processing status and result file paths.
    """
    __tablename__ = "tool_ocr_tasks"

    id = Column(Integer, primary_key=True, index=True, autoincrement=True)
    user_id = Column(Integer, ForeignKey("tool_ocr_users.id", ondelete="CASCADE"),
                     nullable=False, index=True,
                     comment="Foreign key to users table")
    task_id = Column(String(255), unique=True, nullable=False, index=True,
                     comment="Unique task identifier (UUID)")
    filename = Column(String(255), nullable=True, index=True)
    file_type = Column(String(50), nullable=True)
    status = Column(SQLEnum(TaskStatus), default=TaskStatus.PENDING, nullable=False,
                    index=True)
    result_json_path = Column(String(500), nullable=True,
                              comment="Path to JSON result file")
    result_markdown_path = Column(String(500), nullable=True,
                                  comment="Path to Markdown result file")
    result_pdf_path = Column(String(500), nullable=True,
                             comment="Path to searchable PDF file")
    error_message = Column(Text, nullable=True,
                           comment="Error details if task failed")
    processing_time_ms = Column(Integer, nullable=True,
                                comment="Processing time in milliseconds")
    # Timestamps are stored as naive UTC datetimes.
    created_at = Column(DateTime, default=datetime.utcnow, nullable=False, index=True)
    updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow,
                        nullable=False)
    completed_at = Column(DateTime, nullable=True)
    file_deleted = Column(Boolean, default=False, nullable=False,
                          comment="Track if files were auto-deleted")

    # Relationships
    user = relationship("User", back_populates="tasks")
    files = relationship("TaskFile", back_populates="task", cascade="all, delete-orphan")

    def __repr__(self):
        # Column defaults are applied at INSERT time, so status can still be
        # None on an instance that has not been flushed yet.
        status = self.status.value if self.status else None
        return f"<Task(id={self.id}, task_id='{self.task_id}', status='{status}')>"

    def to_dict(self):
        """Convert task to dictionary"""
        return {
            "id": self.id,
            "task_id": self.task_id,
            "filename": self.filename,
            "file_type": self.file_type,
            "status": self.status.value if self.status else None,
            "result_json_path": self.result_json_path,
            "result_markdown_path": self.result_markdown_path,
            "result_pdf_path": self.result_pdf_path,
            "error_message": self.error_message,
            "processing_time_ms": self.processing_time_ms,
            "created_at": self.created_at.isoformat() if self.created_at else None,
            "updated_at": self.updated_at.isoformat() if self.updated_at else None,
            "completed_at": self.completed_at.isoformat() if self.completed_at else None,
            "file_deleted": self.file_deleted
        }


class TaskFile(Base):
    """
    Task file model

    Stores information about files associated with a task.
    """
    __tablename__ = "tool_ocr_task_files"

    id = Column(Integer, primary_key=True, index=True, autoincrement=True)
    task_id = Column(Integer, ForeignKey("tool_ocr_tasks.id", ondelete="CASCADE"),
                     nullable=False, index=True,
                     comment="Foreign key to tasks table")
    original_name = Column(String(255), nullable=True)
    stored_path = Column(String(500), nullable=True,
                         comment="Actual file path on server")
    file_size = Column(Integer, nullable=True,
                       comment="File size in bytes")
    mime_type = Column(String(100), nullable=True)
    file_hash = Column(String(64), nullable=True, index=True,
                       comment="SHA256 hash for deduplication")
    created_at = Column(DateTime, default=datetime.utcnow, nullable=False)

    # Relationships
    task = relationship("Task", back_populates="files")

    def __repr__(self):
        return f"<TaskFile(id={self.id}, task_id={self.task_id}, original_name='{self.original_name}')>"

    def to_dict(self):
        """Convert task file to dictionary"""
        return {
            "id": self.id,
            "task_id": self.task_id,
            "original_name": self.original_name,
            "stored_path": self.stored_path,
            "file_size": self.file_size,
            "mime_type": self.mime_type,
            "file_hash": self.file_hash,
            "created_at": self.created_at.isoformat() if self.created_at else None
        }