# Change: Migrate to External API Authentication

## Why

The current local database authentication system has several limitations:

- User credentials are managed locally, requiring manual user creation and password management
- No centralized authentication with enterprise identity systems
- Cannot leverage existing enterprise authentication infrastructure (e.g., Microsoft Azure AD)
- No single sign-on (SSO) capability
- Increased maintenance overhead for user management

By migrating to the external API authentication service at https://pj-auth-api.vercel.app, the system will:

- Integrate with enterprise Microsoft Azure AD authentication
- Enable single sign-on (SSO) for users
- Eliminate local password management
- Leverage existing enterprise user management and security policies
- Reduce maintenance overhead
- Provide consistent authentication across multiple applications

## What Changes

### Authentication Flow

- **Current**: Local database authentication using username/password stored in MySQL
- **New**: External API authentication via POST to `https://pj-auth-api.vercel.app/api/auth/login`
- **Token Management**: Use JWT tokens from the external API instead of locally generated tokens
- **User Display**: Use the `name` field from the API response for user display instead of the local username

### API Integration

**Endpoint**: `POST https://pj-auth-api.vercel.app/api/auth/login`

**Request Format**:
```json
{
  "username": "user@domain.com",
  "password": "user_password"
}
```

**Success Response (200)**:
```json
{
  "success": true,
  "message": "認證成功",
  "data": {
    "access_token": "eyJ0eXAiOiJKV1Q...",
    "id_token": "eyJ0eXAiOiJKV1Q...",
    "expires_in": 4999,
    "token_type": "Bearer",
    "userInfo": {
      "id": "42cf0b98-f598-47dd-ae2a-f33803f87d41",
      "name": "ymirliu 劉念萱",
      "email": "ymirliu@panjit.com.tw",
      "jobTitle": null,
      "officeLocation": "高雄",
      "businessPhones": ["1580"]
    },
    "issuedAt": "2025-11-14T07:09:15.203Z",
    "expiresAt": "2025-11-14T08:32:34.203Z"
  },
  "timestamp": "2025-11-14T07:09:15.203Z"
}
```

The API returns Chinese strings: `message` reads "Authentication successful" and `officeLocation` is "Kaohsiung".

**Failure Response (401)**:
```json
{
  "success": false,
  "error": "用戶名或密碼錯誤",
  "code": "INVALID_CREDENTIALS",
  "timestamp": "2025-11-14T07:10:02.585Z"
}
```

Here `error` reads "Incorrect username or password".

### Database Schema Changes

**Complete Redesign (No backward compatibility needed)**:

**Table Prefix**: `tool_ocr_` (for clear separation from other systems in the same database)

1. **tool_ocr_users table (redesigned)**:
   ```sql
   CREATE TABLE tool_ocr_users (
       id INT PRIMARY KEY AUTO_INCREMENT,
       email VARCHAR(255) UNIQUE NOT NULL,   -- Primary identifier from Azure AD
       display_name VARCHAR(255),            -- Display name from API response
       created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
       last_login TIMESTAMP NULL,            -- Explicitly nullable, like completed_at below
       is_active BOOLEAN DEFAULT TRUE
   );
   ```
   Note: No Azure AD ID needs to be stored; the email address is sufficient as a unique identifier.

2. **tool_ocr_tasks table (new - for task history)**:
   ```sql
   CREATE TABLE tool_ocr_tasks (
       id INT PRIMARY KEY AUTO_INCREMENT,
       user_id INT NOT NULL,                 -- Foreign key to users table
       task_id VARCHAR(255) UNIQUE,          -- Unique task identifier
       filename VARCHAR(255),
       file_type VARCHAR(50),
       status ENUM('pending', 'processing', 'completed', 'failed'),
       result_json_path VARCHAR(500),
       result_markdown_path VARCHAR(500),
       error_message TEXT,
       created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
       updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
       completed_at TIMESTAMP NULL,
       file_deleted BOOLEAN DEFAULT FALSE,   -- Track if files were auto-deleted
       FOREIGN KEY (user_id) REFERENCES tool_ocr_users(id),
       INDEX idx_user_status (user_id, status),
       INDEX idx_created (created_at)
   );
   ```

3. **tool_ocr_task_files table (for multiple files per task)**:
   ```sql
   CREATE TABLE tool_ocr_task_files (
       id INT PRIMARY KEY AUTO_INCREMENT,
       task_id INT NOT NULL,                 -- References tool_ocr_tasks.id, not the string task_id
       original_name VARCHAR(255),
       stored_path VARCHAR(500),
       file_size BIGINT,
       mime_type VARCHAR(100),
       FOREIGN KEY (task_id) REFERENCES tool_ocr_tasks(id) ON DELETE CASCADE
   );
   ```

4.
   **tool_ocr_sessions table (for token management)**:
   ```sql
   CREATE TABLE tool_ocr_sessions (
       id INT PRIMARY KEY AUTO_INCREMENT,
       user_id INT NOT NULL,
       access_token TEXT,
       id_token TEXT,
       expires_at TIMESTAMP NULL,            -- Token expiry (see expiresAt in the API response)
       created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
       FOREIGN KEY (user_id) REFERENCES tool_ocr_users(id) ON DELETE CASCADE,
       INDEX idx_user (user_id),
       INDEX idx_expires (expires_at)
   );
   ```

### Session Management

- Store external API tokens in session/cache instead of a local JWT
- Implement a token refresh mechanism based on the `expires_in` field
- Use the `expiresAt` timestamp for token expiration validation

## New Features: User Task Isolation and History

### Task Isolation

- **Principle**: Each user can only see and access their own tasks
- **Implementation**: All task queries are filtered by `user_id` at the API level
- **Security**: Enforce user context validation in all task-related endpoints

### Task History Features

1. **Task Status Tracking**:
   - View pending tasks (waiting to process)
   - View processing tasks (currently running)
   - View completed tasks (with results available)
   - View failed tasks (with error messages)

2. **Historical Query Capabilities**:
   - Search tasks by filename
   - Filter by date range
   - Filter by status
   - Sort by creation/completion time
   - Pagination for large result sets

3. **Task Management**:
   - Download original files (if not auto-deleted)
   - Download results (JSON, Markdown, PDF exports)
   - Re-process failed tasks
   - Delete old tasks manually

### Frontend UI Changes

1. **New Components**:
   - Task History page/tab
   - Task filters and search bar
   - Task status badges
   - Batch action controls

2. **Task List View**:
   ```
   | Filename | Status        | Created          | Completed        | Actions              |
   |----------|---------------|------------------|------------------|----------------------|
   | doc1.pdf | ✅ Completed  | 2025-11-14 10:00 | 2025-11-14 10:05 | [Download] [View]    |
   | doc2.pdf | 🔄 Processing | 2025-11-14 10:10 | -                | [Cancel]             |
   | doc3.pdf | ❌ Failed     | 2025-11-14 09:00 | -                | [Retry] [View Error] |
   ```

3.
   **User Information Display**:
   - Show user display name in header
   - Show last login time
   - Show task statistics (total, completed, failed)

## Impact

### Affected Capabilities

- `authentication`: Complete replacement of authentication mechanism
- `user-management`: Simplified to read-only user information from external API
- `session-management`: Modified to handle external tokens
- `task-management`: NEW - User-specific task isolation and history
- `file-access-control`: NEW - User-based file access restrictions

### Affected Code

- **Backend Authentication**:
  - `backend/app/api/v1/endpoints/auth.py`: Replace login logic with external API call
  - `backend/app/core/security.py`: Modify token validation to use external tokens
  - `backend/app/core/auth.py`: Update authentication dependencies
  - `backend/app/services/auth_service.py`: New service for external API integration
- **Database Models**:
  - `backend/app/models/user.py`: Complete redesign with new schema
  - `backend/app/models/task.py`: NEW - Task model with user association
  - `backend/app/models/task_file.py`: NEW - Task file model
  - `backend/alembic/versions/`: Complete database recreation
- **Task Management APIs** (NEW):
  - `backend/app/api/v1/endpoints/tasks.py`: Task CRUD operations with user isolation
  - `backend/app/api/v1/endpoints/task_history.py`: Historical query endpoints
  - `backend/app/services/task_service.py`: Task business logic
  - `backend/app/services/file_access_service.py`: User-based file access control
- **Frontend**:
  - `frontend/src/services/authService.ts`: Update to handle new token format
  - `frontend/src/stores/authStore.ts`: Modify to store/display user info from API
  - `frontend/src/components/Header.tsx`: Display `name` field and user menu
  - `frontend/src/pages/TaskHistory.tsx`: NEW - Task history page
  - `frontend/src/components/TaskList.tsx`: NEW - Task list component with filters
  - `frontend/src/components/TaskFilters.tsx`: NEW - Search and filter UI
  - `frontend/src/stores/taskStore.ts`: NEW - Task state management
  - `frontend/src/services/taskService.ts`: NEW - Task API client

### Dependencies

- Use `httpx` or `aiohttp` (already present) for async HTTP requests to the external API
- No new package dependencies required

### Configuration

- New environment variables:
  - `EXTERNAL_AUTH_API_URL` = "https://pj-auth-api.vercel.app"
  - `EXTERNAL_AUTH_ENDPOINT` = "/api/auth/login"
  - `EXTERNAL_AUTH_TIMEOUT` = 30 (seconds)
  - `TOKEN_REFRESH_BUFFER` = 300 (seconds; refresh tokens 5 minutes before expiry)
  - `TASK_RETENTION_DAYS` = 30 (auto-delete old tasks)
  - `MAX_TASKS_PER_USER` = 1000 (limit per user)
  - `ENABLE_TASK_HISTORY` = true (enable history feature)
  - `DATABASE_TABLE_PREFIX` = "tool_ocr_" (table naming prefix)

### Security Considerations

- HTTPS required for all authentication requests
- Token storage must be secure (HttpOnly cookies or secure session storage)
- Implement rate limiting for authentication attempts
- Log all authentication events for audit trail
- Validate SSL certificates for external API calls
- Handle network failures gracefully with appropriate error messages
- **User Isolation**: Enforce user context in all database queries
- **File Access Control**: Validate user ownership before file access
- **API Security**: Add `user_id` validation in all task-related endpoints

### Migration Plan (Simplified - No Rollback Needed)

1. **Phase 1**: Back up existing database (for reference only)
2. **Phase 2**: Drop old tables and create new schema
3. **Phase 3**: Deploy new authentication and task management system
4. **Phase 4**: Test with initial users
5. **Phase 5**: Full deployment

Note: Since this is a test system with no production data to preserve, we can perform a clean migration without rollback concerns.

## Risks and Mitigations

### Risks

1. **External API Unavailability**: Authentication service downtime blocks all logins
   - *Mitigation*: Implement fallback to local auth, cache tokens, implement retry logic

2.
   **Token Expiration Handling**: Users may be logged out unexpectedly
   - *Mitigation*: Implement automatic token refresh before expiration

3. **Network Latency**: Slower authentication due to external API calls
   - *Mitigation*: Implement proper timeout handling, async requests, response caching

4. **Data Consistency**: User information mismatch between local DB and external system
   - *Mitigation*: Regular sync jobs, use external system as single source of truth

5. **Breaking Change**: Existing sessions will be invalidated
   - *Mitigation*: Provide migration window, clear communication to users

## Success Criteria

- All users can authenticate via external API
- Authentication response time < 2 seconds (95th percentile)
- Zero data loss during migration
- Automatic token refresh works without user intervention
- Proper error messages for all failure scenarios
- Audit logs capture all authentication events
- Rollback procedure tested and documented
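
The token handling described under Session Management, together with the `TOKEN_REFRESH_BUFFER` setting, could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the actual backend code: `AuthSession`, `parse_login_response`, and `needs_refresh` are hypothetical names, and `SAMPLE_BODY` is a trimmed copy of the success response documented above.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Mirrors the proposed TOKEN_REFRESH_BUFFER setting (seconds before expiry).
TOKEN_REFRESH_BUFFER = 300

# Trimmed copy of the documented success response, used for illustration only.
SAMPLE_BODY = json.dumps({
    "success": True,
    "data": {
        "access_token": "eyJ0eXAiOiJKV1Q...",
        "expires_in": 4999,
        "userInfo": {"name": "ymirliu 劉念萱", "email": "ymirliu@panjit.com.tw"},
        "issuedAt": "2025-11-14T07:09:15.203Z",
        "expiresAt": "2025-11-14T08:32:34.203Z",
    },
})


@dataclass
class AuthSession:
    """Hypothetical container for the fields the proposal keeps per session."""
    email: str
    display_name: str
    access_token: str
    expires_at: datetime


def parse_timestamp(value: str) -> datetime:
    """Parse the API's ISO 8601 timestamps, e.g. 2025-11-14T08:32:34.203Z."""
    # Older Python versions reject a trailing "Z", so use an explicit offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_login_response(body: str) -> AuthSession:
    """Turn a login response body into an AuthSession or raise on failure."""
    payload = json.loads(body)
    if not payload.get("success"):
        # Failure bodies carry "error" and "code" instead of "data".
        raise PermissionError(payload.get("code", "UNKNOWN_ERROR"))
    data = payload["data"]
    user = data["userInfo"]
    return AuthSession(
        email=user["email"],
        display_name=user["name"],
        access_token=data["access_token"],
        expires_at=parse_timestamp(data["expiresAt"]),
    )


def needs_refresh(session: AuthSession, now: datetime) -> bool:
    """True once `now` is inside the refresh buffer before expiry."""
    return now >= session.expires_at - timedelta(seconds=TOKEN_REFRESH_BUFFER)
```

Matching on `code` rather than on the human-readable `error` text keeps the check independent of the message language. How a refreshed token is actually obtained is not specified in this proposal, so the sketch only decides when a refresh is due.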
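
The task isolation rule (every task query filtered by `user_id`) might look like the sketch below. It is an illustration under stated assumptions rather than the planned service code: it uses an in-memory SQLite database as a stand-in for MySQL, a reduced version of the `tool_ocr_tasks` table, and hypothetical helper names (`make_demo_db`, `list_tasks`, `get_task`).

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for tool_ocr_tasks with three demo rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE tool_ocr_tasks (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id INTEGER NOT NULL,
            task_id TEXT UNIQUE,
            filename TEXT,
            status TEXT,
            created_at TEXT DEFAULT CURRENT_TIMESTAMP
        )
        """
    )
    conn.executemany(
        "INSERT INTO tool_ocr_tasks (user_id, task_id, filename, status) "
        "VALUES (?, ?, ?, ?)",
        [
            (1, "t-1", "doc1.pdf", "completed"),
            (1, "t-2", "doc2.pdf", "failed"),
            (2, "t-3", "doc3.pdf", "pending"),
        ],
    )
    return conn


def list_tasks(conn, user_id, status=None, limit=20, offset=0):
    """Return one page of the caller's own tasks, newest first."""
    # The user_id predicate is always present; optional filters are appended.
    sql = "SELECT task_id, filename, status FROM tool_ocr_tasks WHERE user_id = ?"
    params = [user_id]
    if status is not None:
        sql += " AND status = ?"
        params.append(status)
    sql += " ORDER BY created_at DESC, id DESC LIMIT ? OFFSET ?"
    params += [limit, offset]
    return conn.execute(sql, params).fetchall()


def get_task(conn, user_id, task_id):
    """Fetch a single task; returns None if it is missing or owned by someone else."""
    return conn.execute(
        "SELECT task_id, filename, status FROM tool_ocr_tasks "
        "WHERE task_id = ? AND user_id = ?",
        (task_id, user_id),
    ).fetchone()
```

Returning `None` for both "not found" and "belongs to another user" means an endpoint built on `get_task` does not reveal whether another user's task ID exists.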