OCR/openspec/changes/migrate-to-external-api-authentication/proposal.md
egg 470fa96428 feat: add database table prefix and complete schema definition
Added `tool_ocr_` prefix to all database tables for clear separation
from other systems in the same database.

Changes:
- All tables now use `tool_ocr_` prefix
- Added tool_ocr_sessions table for token management
- Created complete SQL schema file with:
  - Full table definitions with comments
  - Indexes for performance
  - Views for common queries
  - Stored procedures for maintenance
  - Audit log table (optional)

New files:
- database_schema.sql: Ready-to-use SQL script for deployment

Configuration:
- Added DATABASE_TABLE_PREFIX environment variable
- Updated all references to use prefixed table names

Benefits:
- Clear namespace separation in shared databases
- Easier identification of Tool_OCR tables
- Prevent conflicts with other applications

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-14 15:40:24 +08:00


Change: Migrate to External API Authentication

Why

The current local database authentication system has several limitations:

  • User credentials are managed locally, requiring manual user creation and password management
  • No centralized authentication with enterprise identity systems
  • Cannot leverage existing enterprise authentication infrastructure (e.g., Microsoft Azure AD)
  • No single sign-on (SSO) capability
  • Increased maintenance overhead for user management

By migrating to the external API authentication service at https://pj-auth-api.vercel.app, the system will:

  • Integrate with enterprise Microsoft Azure AD authentication
  • Enable single sign-on (SSO) for users
  • Eliminate local password management
  • Leverage existing enterprise user management and security policies
  • Reduce maintenance overhead
  • Provide consistent authentication across multiple applications

What Changes

Authentication Flow

  • Current: Local database authentication using username/password stored in MySQL
  • New: External API authentication via POST to https://pj-auth-api.vercel.app/api/auth/login
  • Token Management: Use JWT tokens from external API instead of locally generated tokens
  • User Display: Use name field from API response for user display instead of local username

API Integration

Endpoint: POST https://pj-auth-api.vercel.app/api/auth/login

Request Format:

{
  "username": "user@domain.com",
  "password": "user_password"
}

Success Response (200; the message value means "authentication successful"):

{
  "success": true,
  "message": "認證成功",
  "data": {
    "access_token": "eyJ0eXAiOiJKV1Q...",
    "id_token": "eyJ0eXAiOiJKV1Q...",
    "expires_in": 4999,
    "token_type": "Bearer",
    "userInfo": {
      "id": "42cf0b98-f598-47dd-ae2a-f33803f87d41",
      "name": "ymirliu 劉念萱",
      "email": "ymirliu@panjit.com.tw",
      "jobTitle": null,
      "officeLocation": "高雄",
      "businessPhones": ["1580"]
    },
    "issuedAt": "2025-11-14T07:09:15.203Z",
    "expiresAt": "2025-11-14T08:32:34.203Z"
  },
  "timestamp": "2025-11-14T07:09:15.203Z"
}

Failure Response (401; the error value means "incorrect username or password"):

{
  "success": false,
  "error": "用戶名或密碼錯誤",
  "code": "INVALID_CREDENTIALS",
  "timestamp": "2025-11-14T07:10:02.585Z"
}
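
A minimal sketch of how the backend might map the two response shapes above onto one result type. The names `LoginResult` and `parse_login_response` are hypothetical, and the sketch assumes the sample payloads shown here are representative of what the external API returns.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class LoginResult:
    """Outcome of one login attempt against the external API (hypothetical type)."""
    ok: bool
    access_token: Optional[str] = None
    id_token: Optional[str] = None
    expires_at: Optional[str] = None    # ISO 8601 string, copied from "expiresAt"
    email: Optional[str] = None
    display_name: Optional[str] = None  # copied from userInfo.name
    error_code: Optional[str] = None


def parse_login_response(status_code: int, body: Dict[str, Any]) -> LoginResult:
    """Interpret the JSON body returned by POST /api/auth/login."""
    if status_code == 200 and body.get("success") is True:
        data = body["data"]
        user = data["userInfo"]
        return LoginResult(
            ok=True,
            access_token=data["access_token"],
            id_token=data["id_token"],
            expires_at=data["expiresAt"],
            email=user["email"],
            display_name=user["name"],
        )
    # Anything else is treated as a failed login; "code" carries the reason.
    return LoginResult(ok=False, error_code=body.get("code", "UNKNOWN_ERROR"))
```

Keeping the parsing separate from the HTTP call means it can be unit-tested against recorded payloads without contacting the external service.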

Database Schema Changes

Complete Redesign (No backward compatibility needed):

Table Prefix: tool_ocr_ (for clear separation from other systems in the same database)

  1. tool_ocr_users table (redesigned):

    CREATE TABLE tool_ocr_users (
      id INT PRIMARY KEY AUTO_INCREMENT,
      email VARCHAR(255) UNIQUE NOT NULL,  -- Primary identifier from Azure AD
      display_name VARCHAR(255),            -- Display name from API response
      created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
      last_login TIMESTAMP NULL,            -- Nullable: unset until a login is recorded
      is_active BOOLEAN DEFAULT TRUE
    );
    

    Note: No Azure AD ID storage needed - email is sufficient as unique identifier

  2. tool_ocr_tasks table (new - for task history):

    CREATE TABLE tool_ocr_tasks (
      id INT PRIMARY KEY AUTO_INCREMENT,
      user_id INT NOT NULL,                 -- Foreign key to users table
      task_id VARCHAR(255) UNIQUE,          -- Unique task identifier
      filename VARCHAR(255),
      file_type VARCHAR(50),
      status ENUM('pending', 'processing', 'completed', 'failed') NOT NULL DEFAULT 'pending',
      result_json_path VARCHAR(500),
      result_markdown_path VARCHAR(500),
      error_message TEXT,
      created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
      updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
      completed_at TIMESTAMP NULL,
      file_deleted BOOLEAN DEFAULT FALSE,   -- Track if files were auto-deleted
      FOREIGN KEY (user_id) REFERENCES tool_ocr_users(id),
      INDEX idx_user_status (user_id, status),
      INDEX idx_created (created_at)
    );
    
  3. tool_ocr_task_files table (for multiple files per task):

    CREATE TABLE tool_ocr_task_files (
      id INT PRIMARY KEY AUTO_INCREMENT,
      task_id INT NOT NULL,
      original_name VARCHAR(255),
      stored_path VARCHAR(500),
      file_size BIGINT,
      mime_type VARCHAR(100),
      FOREIGN KEY (task_id) REFERENCES tool_ocr_tasks(id) ON DELETE CASCADE
    );
    
  4. tool_ocr_sessions table (for token management):

    CREATE TABLE tool_ocr_sessions (
      id INT PRIMARY KEY AUTO_INCREMENT,
      user_id INT NOT NULL,
      access_token TEXT,
      id_token TEXT,
      expires_at TIMESTAMP NULL,            -- Mirrors expiresAt from the API response
      created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
      FOREIGN KEY (user_id) REFERENCES tool_ocr_users(id) ON DELETE CASCADE,
      INDEX idx_user (user_id),
      INDEX idx_expires (expires_at)
    );
    

Session Management

  • Store external API tokens in session/cache instead of local JWT
  • Implement token refresh mechanism based on expires_in field
  • Use expiresAt timestamp for token expiration validation
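
The expiry checks described above could look roughly like this. The function names are hypothetical, the 300-second buffer mirrors the proposed `TOKEN_REFRESH_BUFFER` setting, and how a fresh token is actually obtained is outside this sketch because the proposal does not describe a refresh endpoint.

```python
from datetime import datetime, timedelta, timezone

TOKEN_REFRESH_BUFFER = 300  # seconds; mirrors the proposed environment variable


def parse_expires_at(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2025-11-14T08:32:34.203Z'."""
    # Older versions of datetime.fromisoformat reject a trailing 'Z',
    # so it is rewritten as an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def is_expired(expires_at: str, now: datetime) -> bool:
    """True once the token's expiresAt instant has been reached."""
    return now >= parse_expires_at(expires_at)


def needs_refresh(expires_at: str, now: datetime,
                  buffer_seconds: int = TOKEN_REFRESH_BUFFER) -> bool:
    """True when the token is inside the refresh window before expiry."""
    threshold = parse_expires_at(expires_at) - timedelta(seconds=buffer_seconds)
    return now >= threshold


def utc_now() -> datetime:
    """Timezone-aware current time, comparable with parsed API timestamps."""
    return datetime.now(timezone.utc)
```

Passing `now` explicitly, rather than reading the clock inside each function, keeps the checks deterministic in tests.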

New Features: User Task Isolation and History

Task Isolation

  • Principle: Each user can only see and access their own tasks
  • Implementation: All task queries filtered by user_id at API level
  • Security: Enforce user context validation in all task-related endpoints
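
A minimal sketch of user-scoped task queries, assuming the `tool_ocr_tasks` columns defined earlier. It uses an in-memory SQLite database purely so the example is self-contained; the real system targets MySQL, where most drivers use `%s` placeholders instead of `?`. The helper names are hypothetical.

```python
import sqlite3
from typing import List, Optional, Tuple


def get_task_for_user(conn: sqlite3.Connection, user_id: int,
                      task_id: str) -> Optional[Tuple]:
    """Return the task row only if it belongs to user_id, otherwise None."""
    return conn.execute(
        "SELECT task_id, filename, status FROM tool_ocr_tasks "
        "WHERE task_id = ? AND user_id = ?",
        (task_id, user_id),
    ).fetchone()


def list_tasks_for_user(conn: sqlite3.Connection, user_id: int) -> List[Tuple]:
    """List every task owned by user_id; other users' rows are never selected."""
    return conn.execute(
        "SELECT task_id, filename, status FROM tool_ocr_tasks "
        "WHERE user_id = ? ORDER BY id",
        (user_id,),
    ).fetchall()


def make_demo_db() -> sqlite3.Connection:
    """Build a throwaway database with one task for each of two users."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tool_ocr_tasks ("
        "id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, "
        "task_id TEXT UNIQUE, filename TEXT, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO tool_ocr_tasks (user_id, task_id, filename, status) "
        "VALUES (?, ?, ?, ?)",
        [(1, "t-1", "doc1.pdf", "completed"), (2, "t-2", "doc2.pdf", "processing")],
    )
    return conn
```

Because the ownership check is part of the `WHERE` clause, a request for another user's task is indistinguishable from a request for a task that does not exist.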

Task History Features

  1. Task Status Tracking:

    • View pending tasks (waiting to process)
    • View processing tasks (currently running)
    • View completed tasks (with results available)
    • View failed tasks (with error messages)
  2. Historical Query Capabilities:

    • Search tasks by filename
    • Filter by date range
    • Filter by status
    • Sort by creation/completion time
    • Pagination for large result sets
  3. Task Management:

    • Download original files (if not auto-deleted)
    • Download results (JSON, Markdown, PDF exports)
    • Re-process failed tasks
    • Delete old tasks manually
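
The historical query capabilities listed above (filename search, status and date filters, sorting, pagination) can be sketched as a single query builder. The function name, its parameters, and the default page size of 20 are assumptions for illustration; the placeholders use the `%s` style common to MySQL drivers.

```python
from typing import Any, List, Optional, Tuple

# Column names cannot be bound as parameters, so only whitelisted values
# are ever interpolated into the ORDER BY clause.
ALLOWED_SORT_COLUMNS = {"created_at", "completed_at"}


def build_history_query(
    user_id: int,
    filename_contains: Optional[str] = None,
    status: Optional[str] = None,
    created_from: Optional[str] = None,
    created_to: Optional[str] = None,
    sort_by: str = "created_at",
    descending: bool = True,
    page: int = 1,
    page_size: int = 20,
) -> Tuple[str, List[Any]]:
    """Return (sql, params) for one page of a single user's task history."""
    if sort_by not in ALLOWED_SORT_COLUMNS:
        raise ValueError(f"unsupported sort column: {sort_by}")
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")

    clauses = ["user_id = %s"]  # task isolation always comes first
    params: List[Any] = [user_id]
    if filename_contains:
        # Note: '%' and '_' inside the search text are not escaped here.
        clauses.append("filename LIKE %s")
        params.append(f"%{filename_contains}%")
    if status:
        clauses.append("status = %s")
        params.append(status)
    if created_from:
        clauses.append("created_at >= %s")
        params.append(created_from)
    if created_to:
        clauses.append("created_at < %s")
        params.append(created_to)

    direction = "DESC" if descending else "ASC"
    sql = (
        "SELECT task_id, filename, status, created_at, completed_at "
        "FROM tool_ocr_tasks WHERE " + " AND ".join(clauses)
        + f" ORDER BY {sort_by} {direction} LIMIT %s OFFSET %s"
    )
    params.extend([page_size, (page - 1) * page_size])
    return sql, params
```

For example, `build_history_query(7, status="failed", page=3, page_size=10)` produces a statement filtered on `user_id` and `status` with an offset of 20 rows.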

Frontend UI Changes

  1. New Components:

    • Task History page/tab
    • Task filters and search bar
    • Task status badges
    • Batch action controls
  2. Task List View:

    | Filename | Status | Created | Completed | Actions |
    |----------|--------|---------|-----------|---------|
    | doc1.pdf | ✅ Completed | 2025-11-14 10:00 | 2025-11-14 10:05 | [Download] [View] |
    | doc2.pdf | 🔄 Processing | 2025-11-14 10:10 | - | [Cancel] |
    | doc3.pdf | ❌ Failed | 2025-11-14 09:00 | - | [Retry] [View Error] |
    
  3. User Information Display:

    • Show user display name in header
    • Show last login time
    • Show task statistics (total, completed, failed)

Impact

Affected Capabilities

  • authentication: Complete replacement of authentication mechanism
  • user-management: Simplified to read-only user information from external API
  • session-management: Modified to handle external tokens
  • task-management: NEW - User-specific task isolation and history
  • file-access-control: NEW - User-based file access restrictions

Affected Code

  • Backend Authentication:

    • backend/app/api/v1/endpoints/auth.py: Replace login logic with external API call
    • backend/app/core/security.py: Modify token validation to use external tokens
    • backend/app/core/auth.py: Update authentication dependencies
    • backend/app/services/auth_service.py: New service for external API integration
  • Database Models:

    • backend/app/models/user.py: Complete redesign with new schema
    • backend/app/models/task.py: NEW - Task model with user association
    • backend/app/models/task_file.py: NEW - Task file model
    • backend/alembic/versions/: Complete database recreation
  • Task Management APIs (NEW):

    • backend/app/api/v1/endpoints/tasks.py: Task CRUD operations with user isolation
    • backend/app/api/v1/endpoints/task_history.py: Historical query endpoints
    • backend/app/services/task_service.py: Task business logic
    • backend/app/services/file_access_service.py: User-based file access control
  • Frontend:

    • frontend/src/services/authService.ts: Update to handle new token format
    • frontend/src/stores/authStore.ts: Modify to store/display user info from API
    • frontend/src/components/Header.tsx: Display name field and user menu
    • frontend/src/pages/TaskHistory.tsx: NEW - Task history page
    • frontend/src/components/TaskList.tsx: NEW - Task list component with filters
    • frontend/src/components/TaskFilters.tsx: NEW - Search and filter UI
    • frontend/src/stores/taskStore.ts: NEW - Task state management
    • frontend/src/services/taskService.ts: NEW - Task API client

Dependencies

  • Use httpx or aiohttp for async HTTP requests to the external API (already present in the project)
  • No new package dependencies required

Configuration

  • New environment variables:
    • EXTERNAL_AUTH_API_URL = "https://pj-auth-api.vercel.app"
    • EXTERNAL_AUTH_ENDPOINT = "/api/auth/login"
    • EXTERNAL_AUTH_TIMEOUT = 30 (seconds)
    • TOKEN_REFRESH_BUFFER = 300 (refresh tokens 5 minutes before expiry)
    • TASK_RETENTION_DAYS = 30 (auto-delete old tasks)
    • MAX_TASKS_PER_USER = 1000 (limit per user)
    • ENABLE_TASK_HISTORY = true (enable history feature)
    • DATABASE_TABLE_PREFIX = "tool_ocr_" (table naming prefix)
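
One way the backend might read these variables, shown as a sketch. The `AuthSettings` class and `load_settings` function are hypothetical names; the defaults are the values listed above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class AuthSettings:
    """Typed view of the environment variables proposed above."""
    external_auth_api_url: str
    external_auth_endpoint: str
    external_auth_timeout: int   # seconds
    token_refresh_buffer: int    # seconds before expiry
    task_retention_days: int
    max_tasks_per_user: int
    enable_task_history: bool
    database_table_prefix: str

    @property
    def login_url(self) -> str:
        """Full login URL built from the base URL and the endpoint path."""
        return self.external_auth_api_url.rstrip("/") + self.external_auth_endpoint


def load_settings(env: Mapping[str, str] = os.environ) -> AuthSettings:
    """Read settings from env, falling back to the proposal's default values."""
    truthy = {"1", "true", "yes"}
    return AuthSettings(
        external_auth_api_url=env.get("EXTERNAL_AUTH_API_URL", "https://pj-auth-api.vercel.app"),
        external_auth_endpoint=env.get("EXTERNAL_AUTH_ENDPOINT", "/api/auth/login"),
        external_auth_timeout=int(env.get("EXTERNAL_AUTH_TIMEOUT", "30")),
        token_refresh_buffer=int(env.get("TOKEN_REFRESH_BUFFER", "300")),
        task_retention_days=int(env.get("TASK_RETENTION_DAYS", "30")),
        max_tasks_per_user=int(env.get("MAX_TASKS_PER_USER", "1000")),
        enable_task_history=env.get("ENABLE_TASK_HISTORY", "true").strip().lower() in truthy,
        database_table_prefix=env.get("DATABASE_TABLE_PREFIX", "tool_ocr_"),
    )
```

Accepting the environment as a mapping lets tests pass a plain dictionary instead of mutating `os.environ`.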

Security Considerations

  • HTTPS required for all authentication requests
  • Token storage must be secure (HTTPOnly cookies or secure session storage)
  • Implement rate limiting for authentication attempts
  • Log all authentication events for audit trail
  • Validate SSL certificates for external API calls
  • Handle network failures gracefully with appropriate error messages
  • User Isolation: Enforce user context in all database queries
  • File Access Control: Validate user ownership before file access
  • API Security: Add user_id validation in all task-related endpoints
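
As an illustration of the rate-limiting item above, here is a small in-memory sliding-window limiter. The class name, the limit of 5 attempts, and the 60-second window are all assumptions, not values from this proposal; a multi-process deployment would likely need a shared store instead of process memory.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class LoginRateLimiter:
    """Allow at most max_attempts login attempts per key within window_seconds."""

    def __init__(self, max_attempts: int = 5, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max = max_attempts
        self._window = window_seconds
        self._clock = clock
        self._attempts: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        """Record an attempt for key (e.g. username or client IP) if permitted."""
        now = self._clock()
        attempts = self._attempts[key]
        # Discard attempts that have aged out of the window.
        while attempts and now - attempts[0] >= self._window:
            attempts.popleft()
        if len(attempts) >= self._max:
            return False
        attempts.append(now)
        return True
```

A login endpoint would call `allow()` before forwarding credentials to the external API and reject the request when it returns `False`.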

Migration Plan (Simplified - No Rollback Needed)

  1. Phase 1: Backup existing database (for reference only)
  2. Phase 2: Drop old tables and create new schema
  3. Phase 3: Deploy new authentication and task management system
  4. Phase 4: Test with initial users
  5. Phase 5: Full deployment

Note: Since this is a test system with no production data to preserve, we can perform a clean migration without rollback concerns.

Risks and Mitigations

Risks

  1. External API Unavailability: Authentication service downtime blocks all logins

    • Mitigation: Implement fallback to local auth, cache tokens, implement retry logic
  2. Token Expiration Handling: Users may be logged out unexpectedly

    • Mitigation: Implement automatic token refresh before expiration
  3. Network Latency: Slower authentication due to external API calls

    • Mitigation: Implement proper timeout handling, async requests, response caching
  4. Data Consistency: User information mismatch between local DB and external system

    • Mitigation: Regular sync jobs, use external system as single source of truth
  5. Breaking Change: Existing sessions will be invalidated

    • Mitigation: Provide migration window, clear communication to users
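
The retry mitigation for external API unavailability could be sketched as follows. The helper name, the three attempts, and the 0.5-second base delay are assumptions. Only transport-level failures are retried; a 401 response with `INVALID_CREDENTIALS` is a definitive answer and should not be retried.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying with exponential backoff on transient errors."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of retries: surface the last error to the caller
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

In practice `operation` would wrap the HTTP call to the login endpoint, with the HTTP client's own exception types passed in `retry_on`.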

Success Criteria

  • All users can authenticate via external API
  • Authentication response time < 2 seconds (95th percentile)
  • Zero data loss during migration
  • Automatic token refresh works without user intervention
  • Proper error messages for all failure scenarios
  • Audit logs capture all authentication events
  • Backup and clean-migration procedure tested and documented (no rollback is planned for this test system)