OCR/openspec/changes/migrate-to-external-api-authentication/tasks.md
egg 88f9fef2d4 refactor: enhance auth migration proposal with user task isolation
Major updates based on feedback:
1. Remove Azure AD ID storage - use email as primary identifier
2. Complete database redesign - no backward compatibility needed
3. Add comprehensive user task isolation and history features

Database changes:
- Simplified users table (email-based)
- New ocr_tasks table with user association
- New task_files table for file tracking
- Proper indexes for performance

New features:
- User task isolation (A cannot see B's tasks)
- Task history with status tracking (pending/processing/completed/failed)
- Historical query capabilities with filters
- Download support for completed tasks
- Task management UI with search and filters

Security enhancements:
- User context validation in all endpoints
- File access control based on ownership
- Row-level security in database queries
- API-level authorization checks

Implementation approach:
- Clean migration without rollback concerns
- Drop old tables and start fresh
- Simplified deployment process
- Comprehensive task management system

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-14 15:33:18 +08:00


Implementation Tasks

1. Database Schema Redesign

  • 1.1 Back up existing database (for reference)
    • Export current schema and data
    • Document any important data to preserve
  • 1.2 Drop old tables
    • Remove existing users table
    • Remove any related tables
    • Clear database for fresh start
  • 1.3 Create new database schema
    • Create new users table (email as primary identifier)
    • Create ocr_tasks table with user association
    • Create task_files table for file tracking
    • Add proper indexes for performance
  • 1.4 Create SQLAlchemy models
    • User model (simplified)
    • Task model with user relationship
    • TaskFile model with cascade delete
  • 1.5 Generate Alembic migration
    • Create initial migration for new schema
    • Test migration script
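
A minimal sketch of the schema described in 1.3, written against the standard-library sqlite3 module so it runs without dependencies. The column names, the status CHECK constraint, and the index names are assumptions; the tasks above only name the three tables, the email identifier, the user association, cascade delete, and indexes. The project itself would express this as SQLAlchemy models plus an Alembic migration.

```python
import sqlite3

# Hypothetical column layout; only the table names come from the task list.
SCHEMA = """
CREATE TABLE users (
    id           INTEGER PRIMARY KEY,
    email        TEXT NOT NULL UNIQUE,   -- email is the primary identifier
    display_name TEXT
);
CREATE TABLE ocr_tasks (
    id         INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    status     TEXT NOT NULL DEFAULT 'pending'
               CHECK (status IN ('pending', 'processing', 'completed', 'failed')),
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE task_files (
    id        INTEGER PRIMARY KEY,
    task_id   INTEGER NOT NULL REFERENCES ocr_tasks(id) ON DELETE CASCADE,
    file_path TEXT NOT NULL
);
-- Indexes for the per-user listing and per-task file lookups.
CREATE INDEX idx_ocr_tasks_user_status ON ocr_tasks (user_id, status);
CREATE INDEX idx_task_files_task ON task_files (task_id);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the three tables; SQLite needs foreign keys switched on per connection."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
```

Deleting a row from ocr_tasks then removes its task_files rows, which is the behavior the "TaskFile model with cascade delete" item asks for.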

2. Configuration Management

  • 2.1 Update environment configuration
    • Add EXTERNAL_AUTH_API_URL to .env.local
    • Add EXTERNAL_AUTH_ENDPOINT configuration
    • Add EXTERNAL_AUTH_TIMEOUT setting
    • Add TOKEN_REFRESH_BUFFER setting
    • Add TASK_RETENTION_DAYS for auto-cleanup
    • Add MAX_TASKS_PER_USER for limits
    • Add ENABLE_TASK_HISTORY feature flag
  • 2.2 Update Settings class
    • Add external auth settings to backend/app/core/config.py
    • Add task management settings
    • Add validation for new configuration values
    • Remove old authentication settings
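
The validation step in 2.2 could look roughly like the sketch below. It uses only the standard library rather than whatever settings framework backend/app/core/config.py actually uses. The class name, the units (seconds), and every default value are placeholders chosen for illustration; only the environment variable names come from 2.1.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class AuthTaskSettings:  # hypothetical name
    external_auth_api_url: str
    external_auth_endpoint: str
    external_auth_timeout: float   # seconds (assumed unit)
    token_refresh_buffer: int      # seconds (assumed unit)
    task_retention_days: int
    max_tasks_per_user: int
    enable_task_history: bool


def _positive(name: str, raw: str, cast):
    """Parse a numeric setting and reject zero or negative values."""
    value = cast(raw)
    if value <= 0:
        raise ValueError(f"{name} must be positive, got {raw!r}")
    return value


def load_settings(env: Mapping[str, str] = os.environ) -> AuthTaskSettings:
    url = env.get("EXTERNAL_AUTH_API_URL", "")
    if not url.startswith(("http://", "https://")):
        raise ValueError("EXTERNAL_AUTH_API_URL must be an http(s) URL")
    # All fallback values below are placeholders, not project defaults.
    return AuthTaskSettings(
        external_auth_api_url=url.rstrip("/"),
        external_auth_endpoint=env.get("EXTERNAL_AUTH_ENDPOINT", "/login"),
        external_auth_timeout=_positive(
            "EXTERNAL_AUTH_TIMEOUT", env.get("EXTERNAL_AUTH_TIMEOUT", "30"), float),
        token_refresh_buffer=_positive(
            "TOKEN_REFRESH_BUFFER", env.get("TOKEN_REFRESH_BUFFER", "300"), int),
        task_retention_days=_positive(
            "TASK_RETENTION_DAYS", env.get("TASK_RETENTION_DAYS", "30"), int),
        max_tasks_per_user=_positive(
            "MAX_TASKS_PER_USER", env.get("MAX_TASKS_PER_USER", "1000"), int),
        enable_task_history=env.get("ENABLE_TASK_HISTORY", "true").lower()
        in ("1", "true", "yes"),
    )
```

Failing at startup on a bad value keeps a misconfigured timeout or URL from surfacing later as an authentication outage.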

3. External API Integration Service

  • 3.1 Create auth API client
    • Implement backend/app/services/external_auth_service.py
    • Create async HTTP client for API calls
    • Implement request/response models
    • Add proper error handling and logging
  • 3.2 Implement authentication methods
    • authenticate_user() - Call external API
    • validate_token() - Verify token validity
    • refresh_token() - Handle token refresh
    • get_user_info() - Fetch user details
  • 3.3 Add resilience patterns
    • Implement retry logic with exponential backoff
    • Add circuit breaker pattern
    • Implement timeout handling
    • Add fallback mechanisms
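
One way the resilience items in 3.3 might fit together is sketched below: retry with exponential backoff wrapped around a simple circuit breaker. The class and function names, the thresholds, and the choice of which exceptions count as transient are all assumptions; the real client would wrap its own HTTP call and exception types.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when calls are short-circuited after repeated failures."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # seconds to wait before a trial call
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        # Half-open: permit a trial call once the cool-down has elapsed.
        return time.monotonic() - self.opened_at >= self.reset_after

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()


async def call_with_retry(
    func: Callable[[], Awaitable[T]],
    breaker: CircuitBreaker,
    attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Call func, retrying transient errors with delays of base_delay * 2**attempt."""
    last_error: Optional[BaseException] = None
    for attempt in range(attempts):
        if not breaker.allow():
            raise CircuitOpenError("external auth API circuit is open")
        try:
            result = await func()
        except (ConnectionError, TimeoutError) as exc:
            breaker.record_failure()
            last_error = exc
            if attempt < attempts - 1:
                await asyncio.sleep(base_delay * 2 ** attempt)
        else:
            breaker.record_success()
            return result
    assert last_error is not None
    raise last_error
```

Sharing one breaker instance across requests lets the login endpoint fail fast while the external API is down instead of queuing retries behind every request.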

4. Backend Authentication Updates

  • 4.1 Modify login endpoint
    • Update backend/app/api/v1/endpoints/auth.py
    • Route to external API based on feature flag
    • Handle both authentication modes during transition
    • Return appropriate token format
  • 4.2 Update token validation
    • Modify backend/app/core/security.py
    • Support both local and external tokens
    • Implement token type detection
    • Update JWT validation logic
  • 4.3 Update authentication dependencies
    • Modify backend/app/core/auth.py
    • Update get_current_user() dependency
    • Handle external user information
    • Implement proper user context

5. Session and Token Management

  • 5.1 Implement token storage
    • Store external tokens securely
    • Implement token encryption at rest
    • Handle multiple token types (access, ID, refresh)
  • 5.2 Create token refresh mechanism
    • Background task for token refresh
    • Refresh tokens before expiration
    • Update stored tokens atomically
    • Handle refresh failures gracefully
  • 5.3 Session invalidation
    • Clear tokens on logout
    • Handle token revocation
    • Implement session timeout
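
The "refresh tokens before expiration" item in 5.2 reduces to a comparison against a buffer window, which TOKEN_REFRESH_BUFFER would supply. The sketch below shows that check in isolation. The TokenSet shape and function names are hypothetical, and the external API's actual token fields are not specified in this task list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class TokenSet:  # hypothetical container for one session's tokens
    access_token: str
    refresh_token: str
    expires_at: datetime  # timezone-aware, UTC


def utc_now() -> datetime:
    """Current time as an aware UTC datetime, so comparisons never mix naive values."""
    return datetime.now(timezone.utc)


def needs_refresh(tokens: TokenSet, now: datetime, buffer: timedelta) -> bool:
    """True once `now` falls inside the buffer window before expiry."""
    return now >= tokens.expires_at - buffer


def is_expired(tokens: TokenSet, now: datetime) -> bool:
    """True when the access token can no longer be used at all."""
    return now >= tokens.expires_at
```

Because TokenSet is frozen, a refresh produces a new object that replaces the stored one in a single assignment, which is one simple way to meet the "update stored tokens atomically" item.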

6. Frontend Updates

  • 6.1 Update authentication service
    • Modify frontend/src/services/authService.ts
    • Handle new token format
    • Store user display information
    • Implement token refresh on client side
  • 6.2 Update auth store
    • Modify frontend/src/stores/authStore.ts
    • Store external user information
    • Update user display logic
    • Handle token expiration
  • 6.3 Update UI components
    • Modify frontend/src/components/Header.tsx
    • Display the user's display name instead of the login username
    • Show additional user information
    • Update login form if needed
  • 6.4 Error handling
    • Handle external API errors
    • Display appropriate error messages
    • Implement retry UI for failures
    • Add loading states

7. Task Management System (NEW)

  • 7.1 Create task management backend
    • Implement backend/app/models/task.py
    • Implement backend/app/models/task_file.py
    • Create backend/app/services/task_service.py
    • Add task CRUD operations with user isolation
  • 7.2 Implement task APIs
    • Create backend/app/api/v1/endpoints/tasks.py
    • GET /tasks (list user's tasks with pagination)
    • GET /tasks/{id} (get specific task)
    • DELETE /tasks/{id} (delete task)
    • POST /tasks/{id}/retry (retry failed task)
  • 7.3 Create task history endpoints
    • Create backend/app/api/v1/endpoints/task_history.py
    • GET /history (query with filters)
    • GET /history/stats (user statistics)
    • POST /history/export (export history)
  • 7.4 Implement file access control
    • Create backend/app/services/file_access_service.py
    • Validate user ownership before file access
    • Restrict download to user's own files
    • Add audit logging for file access
  • 7.5 Update OCR service integration
    • Link OCR tasks to user accounts
    • Save task records in database
    • Update task status during processing
    • Store result file paths
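
The "task CRUD operations with user isolation" and paginated GET /tasks items can be illustrated with two query helpers that always filter on user_id. This is a sketch using sqlite3 and assumed column names (id, user_id, status); the function names are hypothetical and the real service would go through SQLAlchemy.

```python
import sqlite3
from typing import List, Optional, Tuple

TaskRow = Tuple[int, str]  # (id, status)


def list_tasks(
    conn: sqlite3.Connection,
    user_id: int,
    page: int = 1,
    page_size: int = 20,
    status: Optional[str] = None,
) -> List[TaskRow]:
    """Return one page of the caller's own tasks, newest id first."""
    query = "SELECT id, status FROM ocr_tasks WHERE user_id = ?"
    params: list = [user_id]
    if status is not None:
        query += " AND status = ?"
        params.append(status)
    query += " ORDER BY id DESC LIMIT ? OFFSET ?"
    params += [page_size, (page - 1) * page_size]
    return conn.execute(query, params).fetchall()


def get_task(conn: sqlite3.Connection, user_id: int, task_id: int) -> Optional[TaskRow]:
    """Return the task only if it belongs to user_id; otherwise None."""
    return conn.execute(
        "SELECT id, status FROM ocr_tasks WHERE id = ? AND user_id = ?",
        (task_id, user_id),
    ).fetchone()
```

Putting the user_id condition inside the query, rather than checking ownership after loading the row, means a task belonging to another user is never read into the application at all.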

8. Frontend Task Management UI (NEW)

  • 8.1 Create task history page
    • Implement frontend/src/pages/TaskHistory.tsx
    • Display task list with status indicators
    • Add pagination controls
    • Show task details modal
  • 8.2 Build task list component
    • Implement frontend/src/components/TaskList.tsx
    • Display task table with columns
    • Add sorting capabilities
    • Implement action buttons
  • 8.3 Create filter components
    • Implement frontend/src/components/TaskFilters.tsx
    • Date range picker
    • Status filter dropdown
    • Search by filename
    • Clear filters button
  • 8.4 Add task management store
    • Implement frontend/src/stores/taskStore.ts
    • Manage task list state
    • Handle filter state
    • Cache task data
  • 8.5 Create task service client
    • Implement frontend/src/services/taskService.ts
    • API methods for task operations
    • Handle pagination
    • Implement retry logic
  • 8.6 Update navigation
    • Add "Task History" menu item
    • Update router configuration
    • Add task count badge
    • Implement user menu with stats

9. User Isolation and Security

  • 9.1 Implement user context middleware
    • Create middleware to inject user context
    • Validate user in all requests
    • Add user_id to logging context
  • 9.2 Database query isolation
    • Add user_id filter to all task queries
    • Prevent cross-user data access
    • Implement row-level security
  • 9.3 File system isolation
    • Organize files by user directory
    • Validate file paths before access
    • Implement cleanup for deleted users
  • 9.4 API authorization
    • Add @require_user decorator
    • Validate ownership in endpoints
    • Return 403 for unauthorized access
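
For 9.3, "validate file paths before access" usually means resolving the requested path and confirming it still sits under the user's own directory. The sketch below assumes a layout of one directory per user id under a storage root; that layout, the exception name, and the function name are illustrative rather than taken from the project.

```python
from pathlib import Path


class FileAccessDenied(PermissionError):
    """Raised when a requested path escapes the caller's directory."""


def resolve_user_file(storage_root: Path, user_id: int, relative_path: str) -> Path:
    """Resolve relative_path under <storage_root>/<user_id>/ or raise FileAccessDenied."""
    user_root = (storage_root / str(user_id)).resolve()
    # resolve() collapses ".." segments, so traversal attempts become visible here.
    candidate = (user_root / relative_path).resolve()
    try:
        candidate.relative_to(user_root)
    except ValueError:
        raise FileAccessDenied(f"path is outside user {user_id}'s directory") from None
    return candidate
```

An endpoint could catch FileAccessDenied and translate it into the 403 response called for in 9.4.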

10. Testing

  • 10.1 Unit tests
    • Test external auth service
    • Test token validation
    • Test task isolation logic
    • Test file access control
  • 10.2 Integration tests
    • Test full authentication flow
    • Test task management flow
    • Test user isolation between accounts
    • Test file download restrictions
  • 10.3 Load testing
    • Test external API response times
    • Test system with many concurrent users
    • Test large task history queries
    • Measure database query performance
  • 10.4 Security testing
    • Test token security
    • Verify user isolation
    • Test unauthorized access attempts
    • Validate SQL injection prevention

11. Migration Execution (Simplified)

  • 11.1 Pre-migration preparation
    • Back up existing database (reference only)
    • Prepare deployment package
    • Set up monitoring
  • 11.2 Execute migration
    • Drop old database tables
    • Create new schema
    • Deploy new code
    • Verify system startup
  • 11.3 Post-migration validation
    • Test authentication with real users
    • Verify task isolation works
    • Check task history functionality
    • Validate file access controls

12. Documentation

  • 12.1 Technical documentation
    • Update API documentation with new endpoints
    • Document authentication flow
    • Document task management APIs
    • Create troubleshooting guide
  • 12.2 User documentation
    • Update login instructions
    • Document task history features
    • Explain user isolation
    • Create user guide for new UI
  • 12.3 Developer documentation
    • Document database schema
    • Explain security model
    • Provide integration examples

13. Monitoring and Observability

  • 13.1 Add monitoring metrics
    • Authentication success/failure rates
    • Task creation/completion rates
    • User activity metrics
    • File storage usage
  • 13.2 Implement logging
    • Log all authentication attempts
    • Log task operations
    • Log file access attempts
    • Structured logging for analysis
  • 13.3 Create alerts
    • Alert on authentication failures
    • Alert on high error rates
    • Alert on storage issues
    • Alert on performance degradation
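
The structured logging item in 13.2, combined with the user_id logging context from 9.1, might be sketched with the standard logging module and a context variable as below. The logger name "ocr.audit", the JSON field names, and the helper names are assumptions made for this example.

```python
import contextvars
import json
import logging
import sys
from typing import Optional, TextIO

# Set by the (hypothetical) user context middleware at the start of each request.
current_user_id: contextvars.ContextVar[Optional[int]] = contextvars.ContextVar(
    "current_user_id", default=None
)


class UserContextFilter(logging.Filter):
    """Attach the current user's id to every record passing through the handler."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.user_id = current_user_id.get()
        return True


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line for later analysis."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "event": record.getMessage(),
            "user_id": getattr(record, "user_id", None),
        })


def build_audit_logger(stream: TextIO = sys.stderr) -> logging.Logger:
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    handler.addFilter(UserContextFilter())
    logger = logging.getLogger("ocr.audit")
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

With this in place, a call such as `build_audit_logger().info("file_access_denied")` made inside a request carries the user id without each call site passing it explicitly.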

14. Performance Optimization (Post-Launch)

  • 14.1 Database optimization
    • Analyze query patterns
    • Add missing indexes
    • Optimize slow queries
  • 14.2 Caching implementation
    • Cache user information
    • Cache task lists
    • Implement Redis if needed
  • 14.3 File management
    • Implement automatic cleanup
    • Optimize storage structure
    • Add compression if needed