Change: Migrate to External API Authentication
Why
The current local database authentication system has several limitations:
- User credentials are managed locally, requiring manual user creation and password management
- No centralized authentication with enterprise identity systems
- Cannot leverage existing enterprise authentication infrastructure (e.g., Microsoft Azure AD)
- No single sign-on (SSO) capability
- Increased maintenance overhead for user management
By migrating to the external API authentication service at https://pj-auth-api.vercel.app, the system will:
- Integrate with enterprise Microsoft Azure AD authentication
- Enable single sign-on (SSO) for users
- Eliminate local password management
- Leverage existing enterprise user management and security policies
- Reduce maintenance overhead
- Provide consistent authentication across multiple applications
What Changes
Authentication Flow
- Current: Local database authentication using username/password stored in MySQL
- New: External API authentication via POST to https://pj-auth-api.vercel.app/api/auth/login
- Token Management: Use JWT tokens from external API instead of locally generated tokens
- User Display: Use name field from API response for user display instead of local username
API Integration
Endpoint: POST https://pj-auth-api.vercel.app/api/auth/login
Request Format:
{
  "username": "user@domain.com",
  "password": "user_password"
}
Success Response (200):
{
  "success": true,
  "message": "認證成功",
  "data": {
    "access_token": "eyJ0eXAiOiJKV1Q...",
    "id_token": "eyJ0eXAiOiJKV1Q...",
    "expires_in": 4999,
    "token_type": "Bearer",
    "userInfo": {
      "id": "42cf0b98-f598-47dd-ae2a-f33803f87d41",
      "name": "ymirliu 劉念萱",
      "email": "ymirliu@panjit.com.tw",
      "jobTitle": null,
      "officeLocation": "高雄",
      "businessPhones": ["1580"]
    },
    "issuedAt": "2025-11-14T07:09:15.203Z",
    "expiresAt": "2025-11-14T08:32:34.203Z"
  },
  "timestamp": "2025-11-14T07:09:15.203Z"
}
Failure Response (401):
{
  "success": false,
  "error": "用戶名或密碼錯誤",
  "code": "INVALID_CREDENTIALS",
  "timestamp": "2025-11-14T07:10:02.585Z"
}
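The two response shapes above could be normalized on the backend roughly as follows. This is a minimal sketch, not the project's actual code: `AuthResult` and `parse_login_response` are hypothetical names, and only fields that appear in the sample payloads are read.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class AuthResult:
    """Normalized view of a login response (hypothetical helper type)."""
    success: bool
    access_token: Optional[str] = None
    display_name: Optional[str] = None
    email: Optional[str] = None
    expires_at: Optional[datetime] = None
    error_code: Optional[str] = None
    error_message: Optional[str] = None


def _parse_utc(ts: str) -> datetime:
    # The sample timestamps end in "Z"; rewrite it as an explicit offset so
    # datetime.fromisoformat also accepts it on older Python versions.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def parse_login_response(status_code: int, payload: dict) -> AuthResult:
    """Map the documented 200 and 401 payloads onto AuthResult."""
    if status_code == 200 and payload.get("success"):
        data = payload["data"]
        user = data["userInfo"]
        return AuthResult(
            success=True,
            access_token=data["access_token"],
            display_name=user["name"],  # shown in the UI instead of a username
            email=user["email"],        # unique identifier for the local user row
            expires_at=_parse_utc(data["expiresAt"]),
        )
    # Any other combination is treated as a failed login.
    return AuthResult(
        success=False,
        error_code=payload.get("code"),
        error_message=payload.get("error"),
    )
```

Keeping the parsing in one place means the rest of the backend never has to touch the raw payload keys.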
Database Schema Changes
Complete Redesign (No backward compatibility needed):
Table Prefix: tool_ocr_ (for clear separation from other systems in the same database)
- tool_ocr_users table (redesigned):

  CREATE TABLE tool_ocr_users (
    id INT PRIMARY KEY AUTO_INCREMENT,
    email VARCHAR(255) UNIQUE NOT NULL,  -- Primary identifier from Azure AD
    display_name VARCHAR(255),           -- Display name from API response
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_login TIMESTAMP,
    is_active BOOLEAN DEFAULT TRUE
  );

  Note: No Azure AD ID storage needed - email is sufficient as unique identifier

- tool_ocr_tasks table (new - for task history):

  CREATE TABLE tool_ocr_tasks (
    id INT PRIMARY KEY AUTO_INCREMENT,
    user_id INT NOT NULL,                -- Foreign key to users table
    task_id VARCHAR(255) UNIQUE,         -- Unique task identifier
    filename VARCHAR(255),
    file_type VARCHAR(50),
    status ENUM('pending', 'processing', 'completed', 'failed'),
    result_json_path VARCHAR(500),
    result_markdown_path VARCHAR(500),
    error_message TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    completed_at TIMESTAMP NULL,
    file_deleted BOOLEAN DEFAULT FALSE,  -- Track if files were auto-deleted
    FOREIGN KEY (user_id) REFERENCES tool_ocr_users(id),
    INDEX idx_user_status (user_id, status),
    INDEX idx_created (created_at)
  );

- tool_ocr_task_files table (for multiple files per task):

  CREATE TABLE tool_ocr_task_files (
    id INT PRIMARY KEY AUTO_INCREMENT,
    task_id INT NOT NULL,
    original_name VARCHAR(255),
    stored_path VARCHAR(500),
    file_size BIGINT,
    mime_type VARCHAR(100),
    FOREIGN KEY (task_id) REFERENCES tool_ocr_tasks(id) ON DELETE CASCADE
  );

- tool_ocr_sessions table (for token management):

  CREATE TABLE tool_ocr_sessions (
    id INT PRIMARY KEY AUTO_INCREMENT,
    user_id INT NOT NULL,
    access_token TEXT,
    id_token TEXT,
    expires_at TIMESTAMP,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (user_id) REFERENCES tool_ocr_users(id) ON DELETE CASCADE,
    INDEX idx_user (user_id),
    INDEX idx_expires (expires_at)
  );
Session Management
- Store external API tokens in session/cache instead of local JWT
- Implement token refresh mechanism based on expires_in field
- Use expiresAt timestamp for token expiration validation
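The expiry checks described here could look roughly like the sketch below. The function names are hypothetical, the 300-second buffer mirrors the proposed TOKEN_REFRESH_BUFFER default, and the sketch only decides when a refresh is due. How the refresh itself is performed is not specified in this proposal.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

TOKEN_REFRESH_BUFFER = 300  # seconds; assumed to match the proposed config default


def parse_utc(ts: str) -> datetime:
    """Parse the API's ISO 8601 timestamps, which end in "Z"."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def expiry_from_expires_in(issued_at: str, expires_in: int) -> datetime:
    """Derive the expiry time from issuedAt plus expires_in (seconds)."""
    return parse_utc(issued_at) + timedelta(seconds=expires_in)


def is_expired(expires_at: str, now: Optional[datetime] = None) -> bool:
    """True once the expiresAt timestamp has been reached."""
    now = now or datetime.now(timezone.utc)
    return now >= parse_utc(expires_at)


def needs_refresh(
    expires_at: str,
    now: Optional[datetime] = None,
    buffer_seconds: int = TOKEN_REFRESH_BUFFER,
) -> bool:
    """True when the token is within buffer_seconds of expiring."""
    now = now or datetime.now(timezone.utc)
    return now >= parse_utc(expires_at) - timedelta(seconds=buffer_seconds)
```

In the sample response, issuedAt plus expires_in (07:09:15.203 plus 4999 seconds) equals expiresAt (08:32:34.203), so either field can drive the check.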
New Features: User Task Isolation and History
Task Isolation
- Principle: Each user can only see and access their own tasks
- Implementation: All task queries filtered by user_id at API level
- Security: Enforce user context validation in all task-related endpoints
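A minimal sketch of the isolation rule: every query carries the authenticated user's id in its WHERE clause, so another user's task behaves as if it does not exist. It uses the standard-library sqlite3 module with a simplified tool_ocr_tasks table as a stand-in for MySQL, and the helper names are hypothetical.

```python
import sqlite3


def list_tasks_for_user(conn: sqlite3.Connection, user_id: int) -> list:
    """Return only the tasks owned by user_id."""
    cur = conn.execute(
        "SELECT task_id, filename, status FROM tool_ocr_tasks "
        "WHERE user_id = ? ORDER BY id",
        (user_id,),
    )
    return cur.fetchall()


def get_task_for_user(conn: sqlite3.Connection, user_id: int, task_id: str):
    """Return the task row if it belongs to user_id, otherwise None."""
    cur = conn.execute(
        "SELECT task_id, filename, status FROM tool_ocr_tasks "
        "WHERE task_id = ? AND user_id = ?",
        (task_id, user_id),
    )
    return cur.fetchone()


# Demo data: a simplified table, not the full schema from this proposal.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tool_ocr_tasks ("
    "id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, "
    "task_id TEXT UNIQUE, filename TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO tool_ocr_tasks (user_id, task_id, filename, status) "
    "VALUES (?, ?, ?, ?)",
    [(1, "t-1", "doc1.pdf", "completed"), (2, "t-2", "doc2.pdf", "processing")],
)
```

An endpoint built on `get_task_for_user` can answer 404 for both "no such task" and "someone else's task", which avoids revealing that a task id exists.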
Task History Features
- Task Status Tracking:
  - View pending tasks (waiting to process)
  - View processing tasks (currently running)
  - View completed tasks (with results available)
  - View failed tasks (with error messages)
- Historical Query Capabilities:
  - Search tasks by filename
  - Filter by date range
  - Filter by status
  - Sort by creation/completion time
  - Pagination for large result sets
- Task Management:
  - Download original files (if not auto-deleted)
  - Download results (JSON, Markdown, PDF exports)
  - Re-process failed tasks
  - Delete old tasks manually
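The historical query capabilities listed above (filename search, date and status filters, sorting, pagination) could be combined into one parameterized query, sketched below. `build_history_query` is a hypothetical helper. The `%s` placeholders assume a MySQL driver that uses the format paramstyle, and the sort column is checked against a whitelist because column names cannot be passed as bound parameters.

```python
from typing import Any, List, Optional, Tuple

ALLOWED_SORT = {"created_at", "completed_at"}
ALLOWED_STATUS = {"pending", "processing", "completed", "failed"}


def build_history_query(
    user_id: int,
    filename: Optional[str] = None,
    status: Optional[str] = None,
    created_from: Optional[str] = None,
    created_to: Optional[str] = None,
    sort_by: str = "created_at",
    descending: bool = True,
    page: int = 1,
    page_size: int = 20,
) -> Tuple[str, List[Any]]:
    """Build a user-scoped task history query and its bound parameters."""
    if sort_by not in ALLOWED_SORT:
        raise ValueError(f"unsupported sort column: {sort_by!r}")
    if status is not None and status not in ALLOWED_STATUS:
        raise ValueError(f"unsupported status: {status!r}")
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")

    # The user_id filter is always present, so results stay isolated per user.
    clauses = ["user_id = %s"]
    params: List[Any] = [user_id]
    if filename:
        clauses.append("filename LIKE %s")
        params.append(f"%{filename}%")
    if status:
        clauses.append("status = %s")
        params.append(status)
    if created_from:
        clauses.append("created_at >= %s")
        params.append(created_from)
    if created_to:
        clauses.append("created_at <= %s")
        params.append(created_to)

    direction = "DESC" if descending else "ASC"
    sql = (
        "SELECT task_id, filename, status, created_at, completed_at "
        "FROM tool_ocr_tasks WHERE " + " AND ".join(clauses) +
        f" ORDER BY {sort_by} {direction} LIMIT %s OFFSET %s"
    )
    params += [page_size, (page - 1) * page_size]
    return sql, params
```

The (idx_user_status) and (idx_created) indexes in the proposed schema line up with the user_id, status and created_at filters used here.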
Frontend UI Changes
- New Components:
  - Task History page/tab
  - Task filters and search bar
  - Task status badges
  - Batch action controls
- Task List View:

  | Filename | Status | Created | Completed | Actions |
  |----------|--------|---------|-----------|---------|
  | doc1.pdf | ✅ Completed | 2025-11-14 10:00 | 2025-11-14 10:05 | [Download] [View] |
  | doc2.pdf | 🔄 Processing | 2025-11-14 10:10 | - | [Cancel] |
  | doc3.pdf | ❌ Failed | 2025-11-14 09:00 | - | [Retry] [View Error] |

- User Information Display:
  - Show user display name in header
  - Show last login time
  - Show task statistics (total, completed, failed)
Impact
Affected Capabilities
- authentication: Complete replacement of authentication mechanism
- user-management: Simplified to read-only user information from external API
- session-management: Modified to handle external tokens
- task-management: NEW - User-specific task isolation and history
- file-access-control: NEW - User-based file access restrictions
Affected Code
- Backend Authentication:
  - backend/app/api/v1/endpoints/auth.py: Replace login logic with external API call
  - backend/app/core/security.py: Modify token validation to use external tokens
  - backend/app/core/auth.py: Update authentication dependencies
  - backend/app/services/auth_service.py: New service for external API integration
- Database Models:
  - backend/app/models/user.py: Complete redesign with new schema
  - backend/app/models/task.py: NEW - Task model with user association
  - backend/app/models/task_file.py: NEW - Task file model
  - backend/alembic/versions/: Complete database recreation
- Task Management APIs (NEW):
  - backend/app/api/v1/endpoints/tasks.py: Task CRUD operations with user isolation
  - backend/app/api/v1/endpoints/task_history.py: Historical query endpoints
  - backend/app/services/task_service.py: Task business logic
  - backend/app/services/file_access_service.py: User-based file access control
- Frontend:
  - frontend/src/services/authService.ts: Update to handle new token format
  - frontend/src/stores/authStore.ts: Modify to store/display user info from API
  - frontend/src/components/Header.tsx: Display name field and user menu
  - frontend/src/pages/TaskHistory.tsx: NEW - Task history page
  - frontend/src/components/TaskList.tsx: NEW - Task list component with filters
  - frontend/src/components/TaskFilters.tsx: NEW - Search and filter UI
  - frontend/src/stores/taskStore.ts: NEW - Task state management
  - frontend/src/services/taskService.ts: NEW - Task API client
Dependencies
- Add httpx or aiohttp for async HTTP requests to external API (already present)
- No new package dependencies required
Configuration
- New environment variables:
  - EXTERNAL_AUTH_API_URL = "https://pj-auth-api.vercel.app"
  - EXTERNAL_AUTH_ENDPOINT = "/api/auth/login"
  - EXTERNAL_AUTH_TIMEOUT = 30 (seconds)
  - TOKEN_REFRESH_BUFFER = 300 (refresh tokens 5 minutes before expiry)
  - TASK_RETENTION_DAYS = 30 (auto-delete old tasks)
  - MAX_TASKS_PER_USER = 1000 (limit per user)
  - ENABLE_TASK_HISTORY = true (enable history feature)
  - DATABASE_TABLE_PREFIX = "tool_ocr_" (table naming prefix)
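One way these variables might be read at startup is sketched below, with the defaults taken from the list above. The `Settings` class, `load_settings` function and `table_name` helper are hypothetical; the project may already have its own settings module.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    external_auth_api_url: str
    external_auth_endpoint: str
    external_auth_timeout: int   # seconds
    token_refresh_buffer: int    # seconds before expiry at which to refresh
    task_retention_days: int
    max_tasks_per_user: int
    enable_task_history: bool
    database_table_prefix: str

    @property
    def login_url(self) -> str:
        """Full login URL built from the base URL and endpoint path."""
        return self.external_auth_api_url.rstrip("/") + self.external_auth_endpoint

    def table_name(self, base: str) -> str:
        """Apply the configured prefix, e.g. "users" -> "tool_ocr_users"."""
        return f"{self.database_table_prefix}{base}"


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from env (defaults to os.environ), using the proposal's defaults."""
    env = os.environ if env is None else env
    return Settings(
        external_auth_api_url=env.get(
            "EXTERNAL_AUTH_API_URL", "https://pj-auth-api.vercel.app"
        ),
        external_auth_endpoint=env.get("EXTERNAL_AUTH_ENDPOINT", "/api/auth/login"),
        external_auth_timeout=int(env.get("EXTERNAL_AUTH_TIMEOUT", "30")),
        token_refresh_buffer=int(env.get("TOKEN_REFRESH_BUFFER", "300")),
        task_retention_days=int(env.get("TASK_RETENTION_DAYS", "30")),
        max_tasks_per_user=int(env.get("MAX_TASKS_PER_USER", "1000")),
        enable_task_history=env.get("ENABLE_TASK_HISTORY", "true").strip().lower()
        in {"1", "true", "yes", "on"},
        database_table_prefix=env.get("DATABASE_TABLE_PREFIX", "tool_ocr_"),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in tests.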
Security Considerations
- HTTPS required for all authentication requests
- Token storage must be secure (HTTPOnly cookies or secure session storage)
- Implement rate limiting for authentication attempts
- Log all authentication events for audit trail
- Validate SSL certificates for external API calls
- Handle network failures gracefully with appropriate error messages
- User Isolation: Enforce user context in all database queries
- File Access Control: Validate user ownership before file access
- API Security: Add user_id validation in all task-related endpoints
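As an illustration of the rate-limiting item, the sketch below keeps a sliding window of recent login attempts per key (for example a username or client IP). The class name and the limits of 5 attempts per 60 seconds are assumptions for illustration, not values from this proposal. State is held in process memory, so a multi-worker deployment would need a shared store instead.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class LoginRateLimiter:
    """In-memory sliding-window limiter for login attempts (illustrative only)."""

    def __init__(
        self,
        max_attempts: int = 5,          # assumed limit
        window_seconds: float = 60.0,   # assumed window
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max = max_attempts
        self._window = window_seconds
        self._clock = clock
        self._attempts: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        """Record an attempt for key and report whether it is within the limit."""
        now = self._clock()
        attempts = self._attempts[key]
        # Drop attempts that have fallen out of the window.
        while attempts and now - attempts[0] >= self._window:
            attempts.popleft()
        if len(attempts) >= self._max:
            return False
        attempts.append(now)
        return True
```

A login endpoint would call `allow()` before contacting the external API and reject the request when it returns False.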
Migration Plan (Simplified - No Rollback Needed)
- Phase 1: Backup existing database (for reference only)
- Phase 2: Drop old tables and create new schema
- Phase 3: Deploy new authentication and task management system
- Phase 4: Test with initial users
- Phase 5: Full deployment
Note: Since this is a test system with no production data to preserve, we can perform a clean migration without rollback concerns.
Risks and Mitigations
Risks
- External API Unavailability: Authentication service downtime blocks all logins
  - Mitigation: Implement fallback to local auth, cache tokens, implement retry logic
- Token Expiration Handling: Users may be logged out unexpectedly
  - Mitigation: Implement automatic token refresh before expiration
- Network Latency: Slower authentication due to external API calls
  - Mitigation: Implement proper timeout handling, async requests, response caching
- Data Consistency: User information mismatch between local DB and external system
  - Mitigation: Regular sync jobs, use external system as single source of truth
- Breaking Change: Existing sessions will be invalidated
  - Mitigation: Provide migration window, clear communication to users
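The retry logic mentioned as a mitigation for API unavailability could take the form of a small exponential-backoff wrapper, sketched below. `call_with_retry` and `AuthServiceUnavailable` are hypothetical names, and the built-in `ConnectionError` and `TimeoutError` stand in for whatever transport exceptions the chosen HTTP client raises.

```python
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


class AuthServiceUnavailable(Exception):
    """Raised when the external auth API stays unreachable after all retries."""


def call_with_retry(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Call func, retrying transient failures with exponential backoff."""
    last_error: Optional[BaseException] = None
    for attempt in range(attempts):
        try:
            return func()
        except retry_on as exc:
            last_error = exc
            # Wait 0.5s, 1s, 2s, ... between attempts, but not after the last one.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise AuthServiceUnavailable(
        f"auth API unreachable after {attempts} attempts"
    ) from last_error
```

Only transport-level failures should be retried. A 401 response with INVALID_CREDENTIALS is a definitive answer, and retrying it would only add load and delay.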
Success Criteria
- All users can authenticate via external API
- Authentication response time < 2 seconds (95th percentile)
- Zero data loss during migration
- Automatic token refresh works without user intervention
- Proper error messages for all failure scenarios
- Audit logs capture all authentication events
- Rollback procedure tested and documented