diff --git a/openspec/changes/migrate-to-external-api-authentication/proposal.md b/openspec/changes/migrate-to-external-api-authentication/proposal.md
index 98939b0..0737d79 100644
--- a/openspec/changes/migrate-to-external-api-authentication/proposal.md
+++ b/openspec/changes/migrate-to-external-api-authentication/proposal.md
@@ -72,25 +72,118 @@ By migrating to the external API authentication service at https://pj-auth-api.v
 ```
 
 ### Database Schema Changes
-- **users table modifications**:
-  - Remove/deprecate `hashed_password` column (keep for rollback)
-  - Add `external_user_id` (VARCHAR 255) - Store Azure AD user ID
-  - Add `display_name` (VARCHAR 255) - Store user display name from API
-  - Add `azure_email` (VARCHAR 255) - Store Azure AD email
-  - Add `last_token_refresh` (DATETIME) - Track token refresh timing
-  - Keep `username` for backward compatibility (can be email)
+
+**Complete Redesign (No backward compatibility needed)**:
+
+1. **users table (redesigned)**:
+   ```sql
+   CREATE TABLE users (
+       id INT PRIMARY KEY AUTO_INCREMENT,
+       email VARCHAR(255) UNIQUE NOT NULL, -- Primary identifier from Azure AD
+       display_name VARCHAR(255), -- Display name from API response
+       created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+       last_login TIMESTAMP,
+       is_active BOOLEAN DEFAULT TRUE
+   );
+   ```
+   Note: No Azure AD ID storage needed - email is sufficient as unique identifier
+
+2. **ocr_tasks table (new - for task history)**:
+   ```sql
+   CREATE TABLE ocr_tasks (
+       id INT PRIMARY KEY AUTO_INCREMENT,
+       user_id INT NOT NULL, -- Foreign key to users table
+       task_id VARCHAR(255) UNIQUE, -- Unique task identifier
+       filename VARCHAR(255),
+       file_type VARCHAR(50),
+       status ENUM('pending', 'processing', 'completed', 'failed'),
+       result_json_path VARCHAR(500),
+       result_markdown_path VARCHAR(500),
+       error_message TEXT,
+       created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+       updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
+       completed_at TIMESTAMP NULL,
+       file_deleted BOOLEAN DEFAULT FALSE, -- Track if files were auto-deleted
+       FOREIGN KEY (user_id) REFERENCES users(id),
+       INDEX idx_user_status (user_id, status),
+       INDEX idx_created (created_at)
+   );
+   ```
+
+3. **task_files table (for multiple files per task)**:
+   ```sql
+   CREATE TABLE task_files (
+       id INT PRIMARY KEY AUTO_INCREMENT,
+       task_id INT NOT NULL,
+       original_name VARCHAR(255),
+       stored_path VARCHAR(500),
+       file_size BIGINT,
+       mime_type VARCHAR(100),
+       FOREIGN KEY (task_id) REFERENCES ocr_tasks(id) ON DELETE CASCADE
+   );
+   ```
 
 ### Session Management
 - Store external API tokens in session/cache instead of local JWT
 - Implement token refresh mechanism based on `expires_in` field
 - Use `expiresAt` timestamp for token expiration validation
 
+## New Features: User Task Isolation and History
+
+### Task Isolation
+- **Principle**: Each user can only see and access their own tasks
+- **Implementation**: All task queries filtered by `user_id` at API level
+- **Security**: Enforce user context validation in all task-related endpoints
+
+### Task History Features
+1. **Task Status Tracking**:
+   - View pending tasks (waiting to process)
+   - View processing tasks (currently running)
+   - View completed tasks (with results available)
+   - View failed tasks (with error messages)
+
+2. **Historical Query Capabilities**:
+   - Search tasks by filename
+   - Filter by date range
+   - Filter by status
+   - Sort by creation/completion time
+   - Pagination for large result sets
+
+3. **Task Management**:
+   - Download original files (if not auto-deleted)
+   - Download results (JSON, Markdown, PDF exports)
+   - Re-process failed tasks
+   - Delete old tasks manually
+
+### Frontend UI Changes
+1. **New Components**:
+   - Task History page/tab
+   - Task filters and search bar
+   - Task status badges
+   - Batch action controls
+
+2. **Task List View**:
+   ```
+   | Filename | Status | Created | Completed | Actions |
+   |----------|--------|---------|-----------|---------|
+   | doc1.pdf | ✅ Completed | 2025-11-14 10:00 | 2025-11-14 10:05 | [Download] [View] |
+   | doc2.pdf | 🔄 Processing | 2025-11-14 10:10 | - | [Cancel] |
+   | doc3.pdf | ❌ Failed | 2025-11-14 09:00 | - | [Retry] [View Error] |
+   ```
+
+3. **User Information Display**:
+   - Show user display name in header
+   - Show last login time
+   - Show task statistics (total, completed, failed)
+
 ## Impact
 
 ### Affected Capabilities
 - `authentication`: Complete replacement of authentication mechanism
 - `user-management`: Simplified to read-only user information from external API
 - `session-management`: Modified to handle external tokens
+- `task-management`: NEW - User-specific task isolation and history
+- `file-access-control`: NEW - User-based file access restrictions
 
 ### Affected Code
 - **Backend Authentication**:
@@ -100,13 +193,26 @@ By migrating to the external API authentication service at https://pj-auth-api.v
   - `backend/app/services/auth_service.py`: New service for external API integration
 
 - **Database Models**:
-  - `backend/app/models/user.py`: Update User model with new fields
-  - `backend/alembic/versions/`: New migration for schema changes
+  - `backend/app/models/user.py`: Complete redesign with new schema
+  - `backend/app/models/task.py`: NEW - Task model with user association
+  - `backend/app/models/task_file.py`: NEW - Task file model
+  - `backend/alembic/versions/`: Complete database recreation
+
+- **Task Management APIs** (NEW):
+  - `backend/app/api/v1/endpoints/tasks.py`: Task CRUD operations with user isolation
+  - `backend/app/api/v1/endpoints/task_history.py`: Historical query endpoints
+  - `backend/app/services/task_service.py`: Task business logic
+  - `backend/app/services/file_access_service.py`: User-based file access control
 
 - **Frontend**:
   - `frontend/src/services/authService.ts`: Update to handle new token format
   - `frontend/src/stores/authStore.ts`: Modify to store/display user info from API
-  - `frontend/src/components/Header.tsx`: Display `name` field instead of username
+  - `frontend/src/components/Header.tsx`: Display `name` field and user menu
+  - `frontend/src/pages/TaskHistory.tsx`: NEW - Task history page
+  - `frontend/src/components/TaskList.tsx`: NEW - Task list component with filters
+  - `frontend/src/components/TaskFilters.tsx`: NEW - Search and filter UI
+  - `frontend/src/stores/taskStore.ts`: NEW - Task state management
+  - `frontend/src/services/taskService.ts`: NEW - Task API client
 
 ### Dependencies
 - Add `httpx` or `aiohttp` for async HTTP requests to external API (already present)
@@ -117,8 +223,10 @@ By migrating to the external API authentication service at https://pj-auth-api.v
   - `EXTERNAL_AUTH_API_URL` = "https://pj-auth-api.vercel.app"
   - `EXTERNAL_AUTH_ENDPOINT` = "/api/auth/login"
   - `EXTERNAL_AUTH_TIMEOUT` = 30 (seconds)
-  - `USE_EXTERNAL_AUTH` = true (feature flag for gradual rollout)
   - `TOKEN_REFRESH_BUFFER` = 300 (refresh tokens 5 minutes before expiry)
+  - `TASK_RETENTION_DAYS` = 30 (auto-delete old tasks)
+  - `MAX_TASKS_PER_USER` = 1000 (limit per user)
+  - `ENABLE_TASK_HISTORY` = true (enable history feature)
 
 ### Security Considerations
 - HTTPS required for all authentication requests
@@ -127,21 +235,18 @@ By migrating to the external API authentication service at https://pj-auth-api.v
 - Log all authentication events for audit trail
 - Validate SSL certificates for external API calls
 - Handle network failures gracefully with appropriate error messages
+- **User Isolation**: Enforce user context in all database queries
+- **File Access Control**: Validate user ownership before file access
+- **API Security**: Add user_id validation in all task-related endpoints
 
-### Rollback Strategy
-- Keep existing authentication code with feature flag
-- Maintain password column in database (don't drop immediately)
-- Implement dual authentication mode during transition:
-  - If `USE_EXTERNAL_AUTH=true`: Use external API
-  - If `USE_EXTERNAL_AUTH=false`: Use local authentication
-- Provide migration script to sync existing users with external system
+### Migration Plan (Simplified - No Rollback Needed)
+1. **Phase 1**: Backup existing database (for reference only)
+2. **Phase 2**: Drop old tables and create new schema
+3. **Phase 3**: Deploy new authentication and task management system
+4. **Phase 4**: Test with initial users
+5. **Phase 5**: Full deployment
 
-### Migration Plan
-1. **Phase 1**: Implement external API authentication alongside existing system
-2. **Phase 2**: Test with subset of users (based on domain or user flag)
-3. **Phase 3**: Gradual rollout to all users
-4. **Phase 4**: Deprecate local authentication (keep code for emergency)
-5. **Phase 5**: Remove local authentication code (after stable period)
+Note: Since this is a test system with no production data to preserve, we can perform a clean migration without rollback concerns.
 
 ## Risks and Mitigations
 
diff --git a/openspec/changes/migrate-to-external-api-authentication/tasks.md b/openspec/changes/migrate-to-external-api-authentication/tasks.md
index 50fb123..c2d18de 100644
--- a/openspec/changes/migrate-to-external-api-authentication/tasks.md
+++ b/openspec/changes/migrate-to-external-api-authentication/tasks.md
@@ -1,32 +1,40 @@
 # Implementation Tasks
 
-## 1. Database Schema Updates
-- [ ] 1.1 Create database migration script
-  - Add `external_user_id` column (VARCHAR 255)
-  - Add `display_name` column (VARCHAR 255)
-  - Add `azure_email` column (VARCHAR 255)
-  - Add `last_token_refresh` column (DATETIME)
-  - Mark `hashed_password` as nullable (for gradual migration)
-- [ ] 1.2 Update User model
-  - Add new fields to SQLAlchemy model
-  - Update model relationships if needed
-  - Add migration version with Alembic
-- [ ] 1.3 Create user sync mechanism
-  - Script to map existing users to external IDs
-  - Handle users without external accounts
-  - Backup existing user data
+## 1. Database Schema Redesign
+- [ ] 1.1 Backup existing database (for reference)
+  - Export current schema and data
+  - Document any important data to preserve
+- [ ] 1.2 Drop old tables
+  - Remove existing users table
+  - Remove any related tables
+  - Clear database for fresh start
+- [ ] 1.3 Create new database schema
+  - Create new `users` table (email as primary identifier)
+  - Create `ocr_tasks` table with user association
+  - Create `task_files` table for file tracking
+  - Add proper indexes for performance
+- [ ] 1.4 Create SQLAlchemy models
+  - User model (simplified)
+  - Task model with user relationship
+  - TaskFile model with cascade delete
+- [ ] 1.5 Generate Alembic migration
+  - Create initial migration for new schema
+  - Test migration script
 
 ## 2. Configuration Management
 - [ ] 2.1 Update environment configuration
   - Add `EXTERNAL_AUTH_API_URL` to `.env.local`
   - Add `EXTERNAL_AUTH_ENDPOINT` configuration
   - Add `EXTERNAL_AUTH_TIMEOUT` setting
-  - Add `USE_EXTERNAL_AUTH` feature flag
   - Add `TOKEN_REFRESH_BUFFER` setting
+  - Add `TASK_RETENTION_DAYS` for auto-cleanup
+  - Add `MAX_TASKS_PER_USER` for limits
+  - Add `ENABLE_TASK_HISTORY` feature flag
 - [ ] 2.2 Update Settings class
   - Add external auth settings to `backend/app/core/config.py`
+  - Add task management settings
   - Add validation for new configuration values
-  - Implement feature flag logic
+  - Remove old authentication settings
 
 ## 3. External API Integration Service
 - [ ] 3.1 Create auth API client
@@ -99,82 +107,166 @@
   - Implement retry UI for failures
   - Add loading states
 
-## 7. Testing
-- [ ] 7.1 Unit tests
+## 7. Task Management System (NEW)
+- [ ] 7.1 Create task management backend
+  - Implement `backend/app/models/task.py`
+  - Implement `backend/app/models/task_file.py`
+  - Create `backend/app/services/task_service.py`
+  - Add task CRUD operations with user isolation
+- [ ] 7.2 Implement task APIs
+  - Create `backend/app/api/v1/endpoints/tasks.py`
+  - GET /tasks (list user's tasks with pagination)
+  - GET /tasks/{id} (get specific task)
+  - DELETE /tasks/{id} (delete task)
+  - POST /tasks/{id}/retry (retry failed task)
+- [ ] 7.3 Create task history endpoints
+  - Create `backend/app/api/v1/endpoints/task_history.py`
+  - GET /history (query with filters)
+  - GET /history/stats (user statistics)
+  - POST /history/export (export history)
+- [ ] 7.4 Implement file access control
+  - Create `backend/app/services/file_access_service.py`
+  - Validate user ownership before file access
+  - Restrict download to user's own files
+  - Add audit logging for file access
+- [ ] 7.5 Update OCR service integration
+  - Link OCR tasks to user accounts
+  - Save task records in database
+  - Update task status during processing
+  - Store result file paths
+
+## 8. Frontend Task Management UI (NEW)
+- [ ] 8.1 Create task history page
+  - Implement `frontend/src/pages/TaskHistory.tsx`
+  - Display task list with status indicators
+  - Add pagination controls
+  - Show task details modal
+- [ ] 8.2 Build task list component
+  - Implement `frontend/src/components/TaskList.tsx`
+  - Display task table with columns
+  - Add sorting capabilities
+  - Implement action buttons
+- [ ] 8.3 Create filter components
+  - Implement `frontend/src/components/TaskFilters.tsx`
+  - Date range picker
+  - Status filter dropdown
+  - Search by filename
+  - Clear filters button
+- [ ] 8.4 Add task management store
+  - Implement `frontend/src/stores/taskStore.ts`
+  - Manage task list state
+  - Handle filter state
+  - Cache task data
+- [ ] 8.5 Create task service client
+  - Implement `frontend/src/services/taskService.ts`
+  - API methods for task operations
+  - Handle pagination
+  - Implement retry logic
+- [ ] 8.6 Update navigation
+  - Add "Task History" menu item
+  - Update router configuration
+  - Add task count badge
+  - Implement user menu with stats
+
+## 9. User Isolation and Security
+- [ ] 9.1 Implement user context middleware
+  - Create middleware to inject user context
+  - Validate user in all requests
+  - Add user_id to logging context
+- [ ] 9.2 Database query isolation
+  - Add user_id filter to all task queries
+  - Prevent cross-user data access
+  - Implement row-level security
+- [ ] 9.3 File system isolation
+  - Organize files by user directory
+  - Validate file paths before access
+  - Implement cleanup for deleted users
+- [ ] 9.4 API authorization
+  - Add @require_user decorator
+  - Validate ownership in endpoints
+  - Return 403 for unauthorized access
+
+## 10. Testing
+- [ ] 10.1 Unit tests
   - Test external auth service
   - Test token validation
-  - Test user information mapping
-  - Test error scenarios
-- [ ] 7.2 Integration tests
+  - Test task isolation logic
+  - Test file access control
+- [ ] 10.2 Integration tests
   - Test full authentication flow
-  - Test token refresh mechanism
-  - Test fallback scenarios
-  - Test feature flag switching
-- [ ] 7.3 Load testing
+  - Test task management flow
+  - Test user isolation between accounts
+  - Test file download restrictions
+- [ ] 10.3 Load testing
   - Test external API response times
-  - Test system under high authentication load
-  - Measure impact on performance
-- [ ] 7.4 Security testing
+  - Test system with many concurrent users
+  - Test large task history queries
+  - Measure database query performance
+- [ ] 10.4 Security testing
   - Test token security
-  - Verify HTTPS enforcement
-  - Test rate limiting
-  - Validate error message security
+  - Verify user isolation
+  - Test unauthorized access attempts
+  - Validate SQL injection prevention
 
-## 8. Migration Execution
-- [ ] 8.1 Pre-migration preparation
-  - Backup database
-  - Document rollback procedure
-  - Prepare user communication
+## 11. Migration Execution (Simplified)
+- [ ] 11.1 Pre-migration preparation
+  - Backup existing database (reference only)
+  - Prepare deployment package
   - Set up monitoring
-- [ ] 8.2 Staged rollout
-  - Enable for test users first
-  - Monitor for issues
-  - Gradually increase user percentage
-  - Collect feedback
-- [ ] 8.3 Post-migration validation
-  - Verify all users can login
-  - Check audit logs
-  - Monitor error rates
-  - Validate performance metrics
+- [ ] 11.2 Execute migration
+  - Drop old database tables
+  - Create new schema
+  - Deploy new code
+  - Verify system startup
+- [ ] 11.3 Post-migration validation
+  - Test authentication with real users
+  - Verify task isolation works
+  - Check task history functionality
+  - Validate file access controls
 
-## 9. Documentation
-- [ ] 9.1 Technical documentation
-  - Update API documentation
+## 12. Documentation
+- [ ] 12.1 Technical documentation
+  - Update API documentation with new endpoints
   - Document authentication flow
-  - Update deployment guide
+  - Document task management APIs
   - Create troubleshooting guide
-- [ ] 9.2 User documentation
+- [ ] 12.2 User documentation
   - Update login instructions
-  - Document new features
-  - Create FAQ for common issues
-- [ ] 9.3 Operations documentation
-  - Document monitoring points
-  - Create runbook for issues
-  - Document rollback procedure
+  - Document task history features
+  - Explain user isolation
+  - Create user guide for new UI
+- [ ] 12.3 Developer documentation
+  - Document database schema
+  - Explain security model
+  - Provide integration examples
 
-## 10. Monitoring and Observability
-- [ ] 10.1 Add monitoring metrics
+## 13. Monitoring and Observability
+- [ ] 13.1 Add monitoring metrics
   - Authentication success/failure rates
-  - External API response times
-  - Token refresh success rate
-  - Error rate monitoring
-- [ ] 10.2 Implement logging
+  - Task creation/completion rates
+  - User activity metrics
+  - File storage usage
+- [ ] 13.2 Implement logging
   - Log all authentication attempts
-  - Log external API calls
-  - Log token operations
+  - Log task operations
+  - Log file access attempts
   - Structured logging for analysis
-- [ ] 10.3 Create alerts
-  - Alert on high failure rates
-  - Alert on external API unavailability
-  - Alert on token refresh failures
-  - Alert on unusual patterns
+- [ ] 13.3 Create alerts
+  - Alert on authentication failures
+  - Alert on high error rates
+  - Alert on storage issues
+  - Alert on performance degradation
 
-## 11. Cleanup (Post-Stabilization)
-- [ ] 11.1 Remove legacy code
-  - Remove local authentication code (after stable period)
-  - Remove unused database columns
-  - Clean up configuration
-- [ ] 11.2 Optimize performance
-  - Implement caching where appropriate
-  - Optimize database queries
-  - Review and optimize API calls
\ No newline at end of file
+## 14. Performance Optimization (Post-Launch)
+- [ ] 14.1 Database optimization
+  - Analyze query patterns
+  - Add missing indexes
+  - Optimize slow queries
+- [ ] 14.2 Caching implementation
+  - Cache user information
+  - Cache task lists
+  - Implement Redis if needed
+- [ ] 14.3 File management
+  - Implement automatic cleanup
+  - Optimize storage structure
+  - Add compression if needed
\ No newline at end of file
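
Note on task isolation (tasks 7.1 and 9.2): the proposal says all task queries are filtered by `user_id`. A minimal sketch of that rule follows. It is not the planned `task_service.py`: it uses the standard-library `sqlite3` module instead of MySQL/SQLAlchemy, a trimmed copy of the `users` and `ocr_tasks` columns, and hypothetical helper names (`create_schema`, `list_tasks`, `get_task`).

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a reduced, SQLite-flavoured version of the proposed tables (assumption)."""
    conn.executescript(
        """
        CREATE TABLE users (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            email TEXT UNIQUE NOT NULL,
            display_name TEXT
        );
        CREATE TABLE ocr_tasks (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id INTEGER NOT NULL REFERENCES users(id),
            task_id TEXT UNIQUE,
            filename TEXT,
            status TEXT CHECK (status IN ('pending', 'processing', 'completed', 'failed'))
        );
        CREATE INDEX idx_user_status ON ocr_tasks (user_id, status);
        """
    )


def list_tasks(conn, user_id, status=None, limit=20, offset=0):
    """Return one page of tasks owned by user_id, newest first."""
    # The user_id predicate is unconditional: callers cannot opt out of it.
    sql = "SELECT task_id, filename, status FROM ocr_tasks WHERE user_id = ?"
    params = [user_id]
    if status is not None:
        sql += " AND status = ?"
        params.append(status)
    sql += " ORDER BY id DESC LIMIT ? OFFSET ?"
    params += [limit, offset]
    return conn.execute(sql, params).fetchall()


def get_task(conn, user_id, task_id):
    """Return a single task, or None if it is missing or owned by another user."""
    return conn.execute(
        "SELECT task_id, filename, status FROM ocr_tasks "
        "WHERE task_id = ? AND user_id = ?",
        (task_id, user_id),
    ).fetchone()
```

Because `get_task` returns `None` both for a missing task and for a task owned by someone else, an endpoint built on it would need its own policy for choosing between 403 (as task 9.4 states) and 404.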
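
Note on `TOKEN_REFRESH_BUFFER`: the proposal keeps the setting at 300 seconds and validates expiry against `expiresAt`. One way the check could look is sketched below. It assumes `expiresAt` has already been parsed into a timezone-aware `datetime` (the wire format is not specified in this change), and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Value taken from the proposal; in the real service it would come from Settings.
TOKEN_REFRESH_BUFFER = 300  # seconds


def expiry_from_expires_in(issued_at: datetime, expires_in: int) -> datetime:
    """Derive an absolute expiry time from the relative `expires_in` field."""
    return issued_at + timedelta(seconds=expires_in)


def needs_refresh(expires_at: datetime, now=None,
                  buffer_seconds: int = TOKEN_REFRESH_BUFFER) -> bool:
    """True once the current time is inside the buffer window before expiry."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= expires_at - timedelta(seconds=buffer_seconds)
```

Passing `now` explicitly keeps the function deterministic in unit tests (task 10.1).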