feat: complete external auth V2 migration with advanced features

This commit implements comprehensive external Azure AD authentication
with complete task management, file download, and admin monitoring systems.

## Core Features Implemented (80% Complete)

### 1. Token Auto-Refresh Mechanism 
- Backend: POST /api/v2/auth/refresh endpoint
- Frontend: Auto-refresh 5 minutes before expiration
- Auto-retry on 401 errors with seamless token refresh

### 2. File Download System 
- Three format support: JSON / Markdown / PDF
- Endpoints: GET /api/v2/tasks/{id}/download/{format}
- File access control with ownership validation
- Frontend download buttons in TaskHistoryPage

### 3. Complete Task Management 
Backend Endpoints:
- POST /api/v2/tasks/{id}/start - Start task
- POST /api/v2/tasks/{id}/cancel - Cancel task
- POST /api/v2/tasks/{id}/retry - Retry failed task
- GET /api/v2/tasks - List with filters (status, filename, date range)
- GET /api/v2/tasks/stats - User statistics

Frontend Features:
- Status-based action buttons (Start/Cancel/Retry)
- Advanced search and filtering (status, filename, date range)
- Pagination and sorting
- Task statistics dashboard (5 stat cards)

### 4. Admin Monitoring System  (Backend)
Admin APIs:
- GET /api/v2/admin/stats - System statistics
- GET /api/v2/admin/users - User list with stats
- GET /api/v2/admin/users/top - User leaderboard
- GET /api/v2/admin/audit-logs - Audit log query system
- GET /api/v2/admin/audit-logs/user/{id}/summary

Admin Features:
- Email-based admin check (ymirliu@panjit.com.tw)
- Comprehensive system metrics (users, tasks, sessions, activity)
- Audit logging service for security tracking

### 5. User Isolation & Security 
- Row-level security on all task queries
- File access control with ownership validation
- Strict user_id filtering on all operations
- Session validation and expiry checking
- Admin privilege verification

## New Files Created

Backend:
- backend/app/models/user_v2.py - User model for external auth
- backend/app/models/task.py - Task model with user isolation
- backend/app/models/session.py - Session management
- backend/app/models/audit_log.py - Audit log model
- backend/app/services/external_auth_service.py - External API client
- backend/app/services/task_service.py - Task CRUD with isolation
- backend/app/services/file_access_service.py - File access control
- backend/app/services/admin_service.py - Admin operations
- backend/app/services/audit_service.py - Audit logging
- backend/app/routers/auth_v2.py - V2 auth endpoints
- backend/app/routers/tasks.py - Task management endpoints
- backend/app/routers/admin.py - Admin endpoints
- backend/alembic/versions/5e75a59fb763_*.py - DB migration

Frontend:
- frontend/src/services/apiV2.ts - Complete V2 API client
- frontend/src/types/apiV2.ts - V2 type definitions
- frontend/src/pages/TaskHistoryPage.tsx - Task history UI

Modified Files:
- backend/app/core/deps.py - Added get_current_admin_user_v2
- backend/app/main.py - Registered admin router
- frontend/src/pages/LoginPage.tsx - V2 login integration
- frontend/src/components/Layout.tsx - User display and logout
- frontend/src/App.tsx - Added /tasks route

## Documentation
- openspec/changes/.../PROGRESS_UPDATE.md - Detailed progress report

## Pending Items (20%)
1. Database migration execution for audit_logs table
2. Frontend admin dashboard page
3. Frontend audit log viewer

## Testing Status
- Manual testing:  Authentication flow verified
- Unit tests:  Pending
- Integration tests:  Pending

## Security Enhancements
-  User isolation (row-level security)
-  File access control
-  Token expiry validation
-  Admin privilege verification
-  Audit logging infrastructure
-  Token encryption (noted, low priority)
-  Rate limiting (noted, low priority)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
egg
2025-11-14 17:19:43 +08:00
parent 470fa96428
commit ad2b832fb6
32 changed files with 6450 additions and 26 deletions

View File

@@ -0,0 +1,563 @@
"""
Tool_OCR - Task Management Router
Handles OCR task operations with user isolation
"""
import logging
from typing import Optional
from fastapi import APIRouter, Depends, HTTPException, status, Query
from fastapi.responses import FileResponse
from sqlalchemy.orm import Session
from app.core.deps import get_db, get_current_user_v2
from app.models.user_v2 import User
from app.models.task import TaskStatus
from app.schemas.task import (
TaskCreate,
TaskUpdate,
TaskResponse,
TaskDetailResponse,
TaskListResponse,
TaskStatsResponse,
TaskStatusEnum,
)
from app.services.task_service import task_service
from app.services.file_access_service import file_access_service
logger = logging.getLogger(__name__)
router = APIRouter(prefix="/api/v2/tasks", tags=["Tasks"])
@router.post("/", response_model=TaskResponse, status_code=status.HTTP_201_CREATED)
async def create_task(
task_data: TaskCreate,
db: Session = Depends(get_db),
current_user: User = Depends(get_current_user_v2)
):
"""
Create a new OCR task
- **filename**: Original filename (optional)
- **file_type**: File MIME type (optional)
"""
try:
task = task_service.create_task(
db=db,
user_id=current_user.id,
filename=task_data.filename,
file_type=task_data.file_type
)
logger.info(f"Created task {task.task_id} for user {current_user.email}")
return task
except Exception as e:
logger.exception(f"Failed to create task for user {current_user.id}")
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail=f"Failed to create task: {str(e)}"
)
@router.get("/", response_model=TaskListResponse)
async def list_tasks(
status_filter: Optional[TaskStatusEnum] = Query(None, alias="status"),
filename_search: Optional[str] = Query(None, alias="filename"),
date_from: Optional[str] = Query(None, alias="date_from"),
date_to: Optional[str] = Query(None, alias="date_to"),
page: int = Query(1, ge=1),
page_size: int = Query(50, ge=1, le=100),
order_by: str = Query("created_at"),
order_desc: bool = Query(True),
db: Session = Depends(get_db),
current_user: User = Depends(get_current_user_v2)
):
"""
List user's tasks with pagination and filtering
- **status**: Filter by task status (optional)
- **filename**: Search by filename (partial match, optional)
- **date_from**: Filter tasks from this date (YYYY-MM-DD, optional)
- **date_to**: Filter tasks until this date (YYYY-MM-DD, optional)
- **page**: Page number (starts from 1)
- **page_size**: Number of tasks per page (max 100)
- **order_by**: Sort field (created_at, updated_at, completed_at)
- **order_desc**: Sort descending (default: true)
"""
try:
# Convert enum to model enum if provided
status_enum = TaskStatus[status_filter.value.upper()] if status_filter else None
# Parse date strings
from datetime import datetime
date_from_dt = datetime.fromisoformat(date_from) if date_from else None
date_to_dt = datetime.fromisoformat(date_to) if date_to else None
# Calculate offset
skip = (page - 1) * page_size
# Get tasks
tasks, total = task_service.get_user_tasks(
db=db,
user_id=current_user.id,
status=status_enum,
filename_search=filename_search,
date_from=date_from_dt,
date_to=date_to_dt,
skip=skip,
limit=page_size,
order_by=order_by,
order_desc=order_desc
)
# Calculate pagination
has_more = (skip + len(tasks)) < total
return {
"tasks": tasks,
"total": total,
"page": page,
"page_size": page_size,
"has_more": has_more
}
except Exception as e:
logger.exception(f"Failed to list tasks for user {current_user.id}")
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail=f"Failed to list tasks: {str(e)}"
)
@router.get("/stats", response_model=TaskStatsResponse)
async def get_task_stats(
db: Session = Depends(get_db),
current_user: User = Depends(get_current_user_v2)
):
"""
Get task statistics for current user
Returns counts by status
"""
try:
stats = task_service.get_user_stats(db=db, user_id=current_user.id)
return stats
except Exception as e:
logger.exception(f"Failed to get stats for user {current_user.id}")
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail=f"Failed to get statistics: {str(e)}"
)
@router.get("/{task_id}", response_model=TaskDetailResponse)
async def get_task(
task_id: str,
db: Session = Depends(get_db),
current_user: User = Depends(get_current_user_v2)
):
"""
Get task details by ID
- **task_id**: Task UUID
"""
task = task_service.get_task_by_id(
db=db,
task_id=task_id,
user_id=current_user.id
)
if not task:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Task not found"
)
return task
@router.patch("/{task_id}", response_model=TaskResponse)
async def update_task(
task_id: str,
task_update: TaskUpdate,
db: Session = Depends(get_db),
current_user: User = Depends(get_current_user_v2)
):
"""
Update task status and results
- **task_id**: Task UUID
- **status**: New task status (optional)
- **error_message**: Error message if failed (optional)
- **processing_time_ms**: Processing time in milliseconds (optional)
- **result_json_path**: Path to JSON result (optional)
- **result_markdown_path**: Path to Markdown result (optional)
- **result_pdf_path**: Path to searchable PDF (optional)
"""
try:
# Update status if provided
if task_update.status:
status_enum = TaskStatus[task_update.status.value.upper()]
task = task_service.update_task_status(
db=db,
task_id=task_id,
user_id=current_user.id,
status=status_enum,
error_message=task_update.error_message,
processing_time_ms=task_update.processing_time_ms
)
if not task:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Task not found"
)
# Update result paths if provided
if any([
task_update.result_json_path,
task_update.result_markdown_path,
task_update.result_pdf_path
]):
task = task_service.update_task_results(
db=db,
task_id=task_id,
user_id=current_user.id,
result_json_path=task_update.result_json_path,
result_markdown_path=task_update.result_markdown_path,
result_pdf_path=task_update.result_pdf_path
)
if not task:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Task not found"
)
return task
except HTTPException:
raise
except Exception as e:
logger.exception(f"Failed to update task {task_id}")
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail=f"Failed to update task: {str(e)}"
)
@router.delete("/{task_id}", status_code=status.HTTP_204_NO_CONTENT)
async def delete_task(
task_id: str,
db: Session = Depends(get_db),
current_user: User = Depends(get_current_user_v2)
):
"""
Delete a task
- **task_id**: Task UUID
"""
success = task_service.delete_task(
db=db,
task_id=task_id,
user_id=current_user.id
)
if not success:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Task not found"
)
logger.info(f"Deleted task {task_id} for user {current_user.email}")
return None
@router.get("/{task_id}/download/json", summary="Download JSON result")
async def download_json(
task_id: str,
db: Session = Depends(get_db),
current_user: User = Depends(get_current_user_v2)
):
"""
Download task result as JSON file
- **task_id**: Task UUID
"""
# Get task
task = task_service.get_task_by_id(
db=db,
task_id=task_id,
user_id=current_user.id
)
if not task:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Task not found"
)
# Validate file access
is_valid, error_msg = file_access_service.validate_file_access(
db=db,
user_id=current_user.id,
task_id=task_id,
file_path=task.result_json_path
)
if not is_valid:
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail=error_msg
)
# Return file
filename = f"{task.filename or task_id}_result.json"
return FileResponse(
path=task.result_json_path,
filename=filename,
media_type="application/json"
)
@router.get("/{task_id}/download/markdown", summary="Download Markdown result")
async def download_markdown(
task_id: str,
db: Session = Depends(get_db),
current_user: User = Depends(get_current_user_v2)
):
"""
Download task result as Markdown file
- **task_id**: Task UUID
"""
# Get task
task = task_service.get_task_by_id(
db=db,
task_id=task_id,
user_id=current_user.id
)
if not task:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Task not found"
)
# Validate file access
is_valid, error_msg = file_access_service.validate_file_access(
db=db,
user_id=current_user.id,
task_id=task_id,
file_path=task.result_markdown_path
)
if not is_valid:
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail=error_msg
)
# Return file
filename = f"{task.filename or task_id}_result.md"
return FileResponse(
path=task.result_markdown_path,
filename=filename,
media_type="text/markdown"
)
@router.get("/{task_id}/download/pdf", summary="Download PDF result")
async def download_pdf(
task_id: str,
db: Session = Depends(get_db),
current_user: User = Depends(get_current_user_v2)
):
"""
Download task result as searchable PDF file
- **task_id**: Task UUID
"""
# Get task
task = task_service.get_task_by_id(
db=db,
task_id=task_id,
user_id=current_user.id
)
if not task:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Task not found"
)
# Validate file access
is_valid, error_msg = file_access_service.validate_file_access(
db=db,
user_id=current_user.id,
task_id=task_id,
file_path=task.result_pdf_path
)
if not is_valid:
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail=error_msg
)
# Return file
filename = f"{task.filename or task_id}_result.pdf"
return FileResponse(
path=task.result_pdf_path,
filename=filename,
media_type="application/pdf"
)
@router.post("/{task_id}/start", response_model=TaskResponse, summary="Start task processing")
async def start_task(
task_id: str,
db: Session = Depends(get_db),
current_user: User = Depends(get_current_user_v2)
):
"""
Start processing a pending task
- **task_id**: Task UUID
"""
try:
task = task_service.update_task_status(
db=db,
task_id=task_id,
user_id=current_user.id,
status=TaskStatus.PROCESSING
)
if not task:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Task not found"
)
logger.info(f"Started task {task_id} for user {current_user.email}")
return task
except HTTPException:
raise
except Exception as e:
logger.exception(f"Failed to start task {task_id}")
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail=f"Failed to start task: {str(e)}"
)
@router.post("/{task_id}/cancel", response_model=TaskResponse, summary="Cancel task")
async def cancel_task(
task_id: str,
db: Session = Depends(get_db),
current_user: User = Depends(get_current_user_v2)
):
"""
Cancel a pending or processing task
- **task_id**: Task UUID
"""
try:
# Get current task
task = task_service.get_task_by_id(
db=db,
task_id=task_id,
user_id=current_user.id
)
if not task:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Task not found"
)
# Only allow canceling pending or processing tasks
if task.status not in [TaskStatus.PENDING, TaskStatus.PROCESSING]:
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail=f"Cannot cancel task in '{task.status.value}' status"
)
# Update to failed status with cancellation message
task = task_service.update_task_status(
db=db,
task_id=task_id,
user_id=current_user.id,
status=TaskStatus.FAILED,
error_message="Task cancelled by user"
)
logger.info(f"Cancelled task {task_id} for user {current_user.email}")
return task
except HTTPException:
raise
except Exception as e:
logger.exception(f"Failed to cancel task {task_id}")
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail=f"Failed to cancel task: {str(e)}"
)
@router.post("/{task_id}/retry", response_model=TaskResponse, summary="Retry failed task")
async def retry_task(
task_id: str,
db: Session = Depends(get_db),
current_user: User = Depends(get_current_user_v2)
):
"""
Retry a failed task
- **task_id**: Task UUID
"""
try:
# Get current task
task = task_service.get_task_by_id(
db=db,
task_id=task_id,
user_id=current_user.id
)
if not task:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Task not found"
)
# Only allow retrying failed tasks
if task.status != TaskStatus.FAILED:
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail=f"Cannot retry task in '{task.status.value}' status"
)
# Reset task to pending status
task = task_service.update_task_status(
db=db,
task_id=task_id,
user_id=current_user.id,
status=TaskStatus.PENDING,
error_message=None
)
logger.info(f"Retrying task {task_id} for user {current_user.email}")
return task
except HTTPException:
raise
except Exception as e:
logger.exception(f"Failed to retry task {task_id}")
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail=f"Failed to retry task: {str(e)}"
)