refactor: enhance auth migration proposal with user task isolation

Major updates based on feedback:
1. Remove Azure AD ID storage - use email as primary identifier
2. Complete database redesign - no backward compatibility needed
3. Add comprehensive user task isolation and history features

Database changes:
- Simplified users table (email-based)
- New ocr_tasks table with user association
- New task_files table for file tracking
- Proper indexes for performance
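
A minimal sketch of the redesigned schema. It is written as SQLite DDL through Python's standard `sqlite3` module so it runs without a database server. Only the three table names and the four status values come from this proposal. The remaining column names, the `file_type` column, and the composite indexes are assumptions. The project itself plans SQLAlchemy models and an Alembic migration, which would express the same constraints.

```python
import sqlite3

# Hypothetical DDL: column names beyond the table names and status values are assumptions.
SCHEMA = """
CREATE TABLE users (
    id           INTEGER PRIMARY KEY,
    email        TEXT NOT NULL UNIQUE,          -- email is the primary identifier
    display_name TEXT,
    created_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE ocr_tasks (
    id         INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    filename   TEXT NOT NULL,
    status     TEXT NOT NULL DEFAULT 'pending'
               CHECK (status IN ('pending', 'processing', 'completed', 'failed')),
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE task_files (
    id        INTEGER PRIMARY KEY,
    task_id   INTEGER NOT NULL REFERENCES ocr_tasks(id) ON DELETE CASCADE,
    file_path TEXT NOT NULL,
    file_type TEXT NOT NULL                     -- e.g. 'input' or 'result'
);

-- Indexes for the expected access pattern: "my tasks, newest first, by status".
CREATE INDEX ix_ocr_tasks_user_created ON ocr_tasks (user_id, created_at);
CREATE INDEX ix_ocr_tasks_user_status  ON ocr_tasks (user_id, status);
CREATE INDEX ix_task_files_task        ON task_files (task_id);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the fresh schema; old tables are not migrated."""
    # SQLite enforces foreign keys (and therefore cascades) only when this is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
```

Deleting an `ocr_tasks` row then removes its `task_files` rows automatically. This is the "cascade delete" behavior planned for the TaskFile model.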

New features:
- User task isolation (user A cannot see user B's tasks)
- Task history with status tracking (pending/processing/completed/failed)
- Historical query capabilities with filters
- Download support for completed tasks
- Task management UI with search and filters
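
The isolation rule and the history filters can share one query builder. The `user_id` predicate is always emitted first, and every optional filter is ANDed onto it. As a result, no filter combination can widen the result set to another user's rows. This is a sketch against assumed `ocr_tasks` columns (`user_id`, `filename`, `status`, `created_at`). It uses bound parameters rather than interpolating user input into SQL.

```python
import sqlite3
from typing import Optional


def _escape_like(term: str) -> str:
    """Escape LIKE wildcards so a search for 'a_b' matches only that literal text."""
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def list_user_tasks(
    conn: sqlite3.Connection,
    user_id: int,
    status: Optional[str] = None,
    filename_contains: Optional[str] = None,
    created_from: Optional[str] = None,  # inclusive lower bound, ISO-8601 text
    created_to: Optional[str] = None,    # exclusive upper bound, ISO-8601 text
    page: int = 1,
    page_size: int = 20,
) -> list:
    """Return one page of the caller's tasks, newest first."""
    if page < 1 or not 1 <= page_size <= 100:
        raise ValueError("page must be >= 1 and page_size within 1..100")

    clauses = ["user_id = ?"]  # isolation predicate: always present
    params: list = [user_id]
    if status is not None:
        clauses.append("status = ?")
        params.append(status)
    if filename_contains:
        clauses.append("filename LIKE ? ESCAPE '\\'")
        params.append(f"%{_escape_like(filename_contains)}%")
    if created_from is not None:
        clauses.append("created_at >= ?")
        params.append(created_from)
    if created_to is not None:
        clauses.append("created_at < ?")
        params.append(created_to)

    sql = (
        "SELECT id, filename, status, created_at FROM ocr_tasks WHERE "
        + " AND ".join(clauses)
        + " ORDER BY created_at DESC, id DESC LIMIT ? OFFSET ?"
    )
    params += [page_size, (page - 1) * page_size]
    return conn.execute(sql, params).fetchall()
```

A `GET /history` handler would pass the authenticated user's id from the request context. It should never take that id from the query string.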

Security enhancements:
- User context validation in all endpoints
- File access control based on ownership
- Row-level security in database queries
- API-level authorization checks
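
Ownership checks are easiest to audit when they live in one function that every download path calls. Below is a minimal sketch, assuming a hypothetical `Task` record and two hypothetical exception types that the web layer translates into 403 and 404 responses. The checks run in a fixed order: existence, then ownership, then completion. That order means another user's task yields 403 before anything about its status is revealed.

```python
from dataclasses import dataclass
from typing import Optional


class Forbidden(Exception):
    """Caller does not own the resource; the API layer maps this to HTTP 403."""


class NotFound(Exception):
    """Task or result does not exist; the API layer maps this to HTTP 404."""


@dataclass(frozen=True)
class Task:
    id: int
    user_id: int
    status: str
    result_path: Optional[str] = None


def authorize_download(task: Optional[Task], current_user_id: int) -> str:
    """Return the result path only if the caller owns a completed task."""
    if task is None:
        raise NotFound("task does not exist")
    if task.user_id != current_user_id:
        raise Forbidden("task belongs to another user")
    if task.status != "completed" or task.result_path is None:
        raise NotFound("no downloadable result for this task")
    return task.result_path
```

Some APIs deliberately answer 404 instead of 403 for other users' resources so that task IDs cannot be probed. This proposal specifies 403, which the sketch follows.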

Implementation approach:
- Clean migration without rollback concerns
- Drop old tables and start fresh
- Simplified deployment process
- Comprehensive task management system

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
egg
2025-11-14 15:33:18 +08:00
parent 28e419f5fa
commit 88f9fef2d4
2 changed files with 301 additions and 104 deletions


@@ -1,32 +1,40 @@
# Implementation Tasks
## 1. Database Schema Updates
- [ ] 1.1 Create database migration script
- Add `external_user_id` column (VARCHAR 255)
- Add `display_name` column (VARCHAR 255)
- Add `azure_email` column (VARCHAR 255)
- Add `last_token_refresh` column (DATETIME)
- Mark `hashed_password` as nullable (for gradual migration)
- [ ] 1.2 Update User model
- Add new fields to SQLAlchemy model
- Update model relationships if needed
- Add migration version with Alembic
- [ ] 1.3 Create user sync mechanism
- Script to map existing users to external IDs
- Handle users without external accounts
- Backup existing user data
## 1. Database Schema Redesign
- [ ] 1.1 Backup existing database (for reference)
- Export current schema and data
- Document any important data to preserve
- [ ] 1.2 Drop old tables
- Remove existing users table
- Remove any related tables
- Clear database for fresh start
- [ ] 1.3 Create new database schema
- Create new `users` table (email as primary identifier)
- Create `ocr_tasks` table with user association
- Create `task_files` table for file tracking
- Add proper indexes for performance
- [ ] 1.4 Create SQLAlchemy models
- User model (simplified)
- Task model with user relationship
- TaskFile model with cascade delete
- [ ] 1.5 Generate Alembic migration
- Create initial migration for new schema
- Test migration script
## 2. Configuration Management
- [ ] 2.1 Update environment configuration
- Add `EXTERNAL_AUTH_API_URL` to `.env.local`
- Add `EXTERNAL_AUTH_ENDPOINT` configuration
- Add `EXTERNAL_AUTH_TIMEOUT` setting
- Add `USE_EXTERNAL_AUTH` feature flag
- Add `TOKEN_REFRESH_BUFFER` setting
- Add `TASK_RETENTION_DAYS` for auto-cleanup
- Add `MAX_TASKS_PER_USER` for limits
- Add `ENABLE_TASK_HISTORY` feature flag
- [ ] 2.2 Update Settings class
- Add external auth settings to `backend/app/core/config.py`
- Add task management settings
- Add validation for new configuration values
- Implement feature flag logic
- Remove old authentication settings
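
The new variables only help if bad values fail at startup rather than mid-request. Below is a stdlib-only sketch of the validation the Settings class could perform. The project's actual `backend/app/core/config.py` is not shown in this commit. The default values, the seconds unit for `TOKEN_REFRESH_BUFFER`, and the HTTPS rule are all assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _parse_bool(raw: str) -> bool:
    value = raw.strip().lower()
    if value in {"1", "true", "yes", "on"}:
        return True
    if value in {"0", "false", "no", "off"}:
        return False
    raise ValueError(f"expected a boolean, got {raw!r}")


@dataclass(frozen=True)
class Settings:
    external_auth_api_url: str
    external_auth_endpoint: str
    external_auth_timeout: float  # seconds
    token_refresh_buffer: int     # seconds before token expiry (unit assumed)
    task_retention_days: int
    max_tasks_per_user: int
    enable_task_history: bool

    def __post_init__(self) -> None:
        # Illustrative validation rules; adjust to the real requirements.
        if not self.external_auth_api_url.startswith("https://"):
            raise ValueError("EXTERNAL_AUTH_API_URL must use https://")
        if self.external_auth_timeout <= 0:
            raise ValueError("EXTERNAL_AUTH_TIMEOUT must be positive")
        if self.token_refresh_buffer < 0:
            raise ValueError("TOKEN_REFRESH_BUFFER must not be negative")
        if self.task_retention_days < 1:
            raise ValueError("TASK_RETENTION_DAYS must be at least 1")
        if self.max_tasks_per_user < 1:
            raise ValueError("MAX_TASKS_PER_USER must be at least 1")

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "Settings":
        """Build settings from environment variables; defaults are placeholders."""
        env = os.environ if env is None else env
        return cls(
            external_auth_api_url=env["EXTERNAL_AUTH_API_URL"],    # required
            external_auth_endpoint=env["EXTERNAL_AUTH_ENDPOINT"],  # required
            external_auth_timeout=float(env.get("EXTERNAL_AUTH_TIMEOUT", "10")),
            token_refresh_buffer=int(env.get("TOKEN_REFRESH_BUFFER", "300")),
            task_retention_days=int(env.get("TASK_RETENTION_DAYS", "30")),
            max_tasks_per_user=int(env.get("MAX_TASKS_PER_USER", "1000")),
            enable_task_history=_parse_bool(env.get("ENABLE_TASK_HISTORY", "true")),
        )
```

A missing required variable raises `KeyError`, and a malformed value raises `ValueError`. Either way the process stops before serving any request.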
## 3. External API Integration Service
- [ ] 3.1 Create auth API client
@@ -99,82 +107,166 @@
- Implement retry UI for failures
- Add loading states
## 7. Testing
- [ ] 7.1 Unit tests
## 7. Task Management System (NEW)
- [ ] 7.1 Create task management backend
- Implement `backend/app/models/task.py`
- Implement `backend/app/models/task_file.py`
- Create `backend/app/services/task_service.py`
- Add task CRUD operations with user isolation
- [ ] 7.2 Implement task APIs
- Create `backend/app/api/v1/endpoints/tasks.py`
- GET /tasks (list user's tasks with pagination)
- GET /tasks/{id} (get specific task)
- DELETE /tasks/{id} (delete task)
- POST /tasks/{id}/retry (retry failed task)
- [ ] 7.3 Create task history endpoints
- Create `backend/app/api/v1/endpoints/task_history.py`
- GET /history (query with filters)
- GET /history/stats (user statistics)
- POST /history/export (export history)
- [ ] 7.4 Implement file access control
- Create `backend/app/services/file_access_service.py`
- Validate user ownership before file access
- Restrict download to user's own files
- Add audit logging for file access
- [ ] 7.5 Update OCR service integration
- Link OCR tasks to user accounts
- Save task records in database
- Update task status during processing
- Store result file paths
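
The status values in this proposal (pending, processing, completed, failed) form a small state machine. Writing the legal transitions down keeps item 7.5 and the retry endpoint consistent. The sketch below assumes a retry re-queues a failed task as `pending`, since the proposal does not say which state a retry enters. It reuses assumed `ocr_tasks` columns `id`, `user_id`, and `status`.

```python
import sqlite3
from enum import Enum


class TaskStatus(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


# Legal moves; FAILED -> PENDING models POST /tasks/{id}/retry (assumed to re-queue).
ALLOWED = {
    TaskStatus.PENDING: {TaskStatus.PROCESSING},
    TaskStatus.PROCESSING: {TaskStatus.COMPLETED, TaskStatus.FAILED},
    TaskStatus.FAILED: {TaskStatus.PENDING},
    TaskStatus.COMPLETED: set(),
}


class InvalidTransition(Exception):
    pass


def update_status(conn: sqlite3.Connection, task_id: int, user_id: int, new: TaskStatus) -> None:
    """Move one of the user's tasks to `new`, rejecting illegal or racing updates."""
    row = conn.execute(
        "SELECT status FROM ocr_tasks WHERE id = ? AND user_id = ?", (task_id, user_id)
    ).fetchone()
    if row is None:
        raise LookupError("task not found for this user")
    current = TaskStatus(row[0])
    if new not in ALLOWED[current]:
        raise InvalidTransition(f"{current.value} -> {new.value}")
    # Compare-and-set: the WHERE clause re-checks the old status, so two workers
    # cannot both move the same task out of the same state.
    cur = conn.execute(
        "UPDATE ocr_tasks SET status = ? WHERE id = ? AND user_id = ? AND status = ?",
        (new.value, task_id, user_id, current.value),
    )
    if cur.rowcount != 1:
        raise InvalidTransition("status changed concurrently")
    conn.commit()
```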
## 8. Frontend Task Management UI (NEW)
- [ ] 8.1 Create task history page
- Implement `frontend/src/pages/TaskHistory.tsx`
- Display task list with status indicators
- Add pagination controls
- Show task details modal
- [ ] 8.2 Build task list component
- Implement `frontend/src/components/TaskList.tsx`
- Display task table with columns
- Add sorting capabilities
- Implement action buttons
- [ ] 8.3 Create filter components
- Implement `frontend/src/components/TaskFilters.tsx`
- Date range picker
- Status filter dropdown
- Search by filename
- Clear filters button
- [ ] 8.4 Add task management store
- Implement `frontend/src/stores/taskStore.ts`
- Manage task list state
- Handle filter state
- Cache task data
- [ ] 8.5 Create task service client
- Implement `frontend/src/services/taskService.ts`
- API methods for task operations
- Handle pagination
- Implement retry logic
- [ ] 8.6 Update navigation
- Add "Task History" menu item
- Update router configuration
- Add task count badge
- Implement user menu with stats
## 9. User Isolation and Security
- [ ] 9.1 Implement user context middleware
- Create middleware to inject user context
- Validate user in all requests
- Add user_id to logging context
- [ ] 9.2 Database query isolation
- Add user_id filter to all task queries
- Prevent cross-user data access
- Implement row-level security
- [ ] 9.3 File system isolation
- Organize files by user directory
- Validate file paths before access
- Implement cleanup for deleted users
- [ ] 9.4 API authorization
- Add @require_user decorator
- Validate ownership in endpoints
- Return 403 for unauthorized access
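
Items 9.1, 9.3, and 9.4 fit together. Middleware stores the authenticated user in request-scoped context. A decorator refuses to run a handler without one. Every file path is then resolved inside that user's directory. The sketch below is framework-agnostic and uses `contextvars` and `pathlib`. Several pieces are assumptions: the `current_user_id` variable, the way `require_user` passes the id as the first argument, the `storage/<user_id>/` layout, and the exception names.

```python
import contextvars
import functools
from pathlib import Path
from typing import Any, Callable

# Set by the (hypothetical) user-context middleware for the duration of a request.
current_user_id = contextvars.ContextVar("current_user_id", default=None)


class Unauthorized(Exception):
    """No authenticated user; the API layer maps this to HTTP 401."""


class Forbidden(Exception):
    """Authenticated but not allowed; the API layer maps this to HTTP 403."""


def require_user(func: Callable[..., Any]) -> Callable[..., Any]:
    """Inject the current user's id as the first argument, or raise Unauthorized."""
    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        user_id = current_user_id.get()
        if user_id is None:
            raise Unauthorized("no authenticated user in request context")
        return func(user_id, *args, **kwargs)
    return wrapper


def user_file_path(storage_root: Path, user_id: int, relative: str) -> Path:
    """Resolve `relative` inside the user's directory, rejecting escapes such as '../'."""
    user_dir = (storage_root / str(user_id)).resolve()
    candidate = (user_dir / relative).resolve()  # normalizes '..' and follows symlinks
    if not candidate.is_relative_to(user_dir):  # Path.is_relative_to needs Python 3.9+
        raise Forbidden("path escapes the user's storage directory")
    return candidate


@require_user
def open_result(user_id: int, relative: str) -> Path:
    # Example handler body: paths are always anchored at the caller's own directory.
    return user_file_path(Path("storage"), user_id, relative)
```

The middleware would call `current_user_id.set(...)` at the start of each request and `reset(token)` in a `finally` block, so the value never leaks into the next request handled on the same worker.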
## 10. Testing
- [ ] 10.1 Unit tests
- Test external auth service
- Test token validation
- Test user information mapping
- Test error scenarios
- [ ] 7.2 Integration tests
- Test task isolation logic
- Test file access control
- [ ] 10.2 Integration tests
- Test full authentication flow
- Test token refresh mechanism
- Test fallback scenarios
- Test feature flag switching
- [ ] 7.3 Load testing
- Test task management flow
- Test user isolation between accounts
- Test file download restrictions
- [ ] 10.3 Load testing
- Test external API response times
- Test system under high authentication load
- Measure impact on performance
- [ ] 7.4 Security testing
- Test system with many concurrent users
- Test large task history queries
- Measure database query performance
- [ ] 10.4 Security testing
- Test token security
- Verify HTTPS enforcement
- Test rate limiting
- Validate error message security
- Verify user isolation
- Test unauthorized access attempts
- Validate SQL injection prevention
## 8. Migration Execution
- [ ] 8.1 Pre-migration preparation
- Backup database
- Document rollback procedure
- Prepare user communication
## 11. Migration Execution (Simplified)
- [ ] 11.1 Pre-migration preparation
- Backup existing database (reference only)
- Prepare deployment package
- Set up monitoring
- [ ] 8.2 Staged rollout
- Enable for test users first
- Monitor for issues
- Gradually increase user percentage
- Collect feedback
- [ ] 8.3 Post-migration validation
- Verify all users can login
- Check audit logs
- Monitor error rates
- Validate performance metrics
- [ ] 11.2 Execute migration
- Drop old database tables
- Create new schema
- Deploy new code
- Verify system startup
- [ ] 11.3 Post-migration validation
- Test authentication with real users
- Verify task isolation works
- Check task history functionality
- Validate file access controls
## 9. Documentation
- [ ] 9.1 Technical documentation
- Update API documentation
## 12. Documentation
- [ ] 12.1 Technical documentation
- Update API documentation with new endpoints
- Document authentication flow
- Update deployment guide
- Document task management APIs
- Create troubleshooting guide
- [ ] 9.2 User documentation
- [ ] 12.2 User documentation
- Update login instructions
- Document new features
- Create FAQ for common issues
- [ ] 9.3 Operations documentation
- Document monitoring points
- Create runbook for issues
- Document rollback procedure
- Document task history features
- Explain user isolation
- Create user guide for new UI
- [ ] 12.3 Developer documentation
- Document database schema
- Explain security model
- Provide integration examples
## 10. Monitoring and Observability
- [ ] 10.1 Add monitoring metrics
## 13. Monitoring and Observability
- [ ] 13.1 Add monitoring metrics
- Authentication success/failure rates
- External API response times
- Token refresh success rate
- Error rate monitoring
- [ ] 10.2 Implement logging
- Task creation/completion rates
- User activity metrics
- File storage usage
- [ ] 13.2 Implement logging
- Log all authentication attempts
- Log external API calls
- Log token operations
- Log task operations
- Log file access attempts
- Structured logging for analysis
- [ ] 10.3 Create alerts
- Alert on high failure rates
- Alert on external API unavailability
- Alert on token refresh failures
- Alert on unusual patterns
- [ ] 13.3 Create alerts
- Alert on authentication failures
- Alert on high error rates
- Alert on storage issues
- Alert on performance degradation
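
Both 9.1 (user_id in the logging context) and 13.2 (structured logging) can be met with the standard `logging` module alone. A filter stamps the request's user onto each record, and a formatter writes JSON lines. This is a minimal sketch. The `request_user` context variable, the field names, and the `fields` convention for extras are assumptions, not an existing project API.

```python
import contextvars
import json
import logging

# Request-scoped user identifier, set by the user-context middleware (assumed).
request_user = contextvars.ContextVar("request_user", default="-")


class UserContextFilter(logging.Filter):
    """Copy the current user into every record so call sites need not pass it."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.user_id = request_user.get()
        return True


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per line for log aggregation."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "user_id": getattr(record, "user_id", "-"),
            "event": record.getMessage(),
        }
        payload.update(getattr(record, "fields", {}))  # structured extras, if any
        return json.dumps(payload, default=str)


def get_logger(name: str = "ocr") -> logging.Logger:
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        handler.addFilter(UserContextFilter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


if __name__ == "__main__":
    token = request_user.set("alice@example.com")
    try:
        get_logger().info("task_created", extra={"fields": {"task_id": 42, "status": "pending"}})
    finally:
        request_user.reset(token)
```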
## 11. Cleanup (Post-Stabilization)
- [ ] 11.1 Remove legacy code
- Remove local authentication code (after stable period)
- Remove unused database columns
- Clean up configuration
- [ ] 11.2 Optimize performance
- Implement caching where appropriate
- Optimize database queries
- Review and optimize API calls
## 14. Performance Optimization (Post-Launch)
- [ ] 14.1 Database optimization
- Analyze query patterns
- Add missing indexes
- Optimize slow queries
- [ ] 14.2 Caching implementation
- Cache user information
- Cache task lists
- Implement Redis if needed
- [ ] 14.3 File management
- Implement automatic cleanup
- Optimize storage structure
- Add compression if needed
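
Item 14.3 and the `TASK_RETENTION_DAYS` setting call for automatic cleanup but do not fix its rules. The sketch below makes three assumptions. Only `completed` and `failed` tasks are eligible. Age is measured from a `created_at` column stored in SQLite's `CURRENT_TIMESTAMP` text format. `task_files` rows are removed by an `ON DELETE CASCADE` foreign key.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def purge_expired_tasks(
    conn: sqlite3.Connection, retention_days: int, now: Optional[datetime] = None
) -> List[str]:
    """Delete finished tasks older than the retention window.

    Returns the file paths that belonged to the deleted tasks, so the caller can
    remove them from disk after the database commit succeeds.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # Same text format as SQLite's CURRENT_TIMESTAMP (UTC), so string comparison works.
    cutoff = (now - timedelta(days=retention_days)).strftime("%Y-%m-%d %H:%M:%S")
    paths = [
        row[0]
        for row in conn.execute(
            "SELECT f.file_path FROM task_files AS f JOIN ocr_tasks AS t ON t.id = f.task_id "
            "WHERE t.created_at < ? AND t.status IN ('completed', 'failed')",
            (cutoff,),
        )
    ]
    # In-flight tasks are never purged; task_files rows follow via ON DELETE CASCADE
    # (requires PRAGMA foreign_keys = ON on this connection).
    conn.execute(
        "DELETE FROM ocr_tasks WHERE created_at < ? AND status IN ('completed', 'failed')",
        (cutoff,),
    )
    conn.commit()
    return paths
```

A scheduled job would call this with `TASK_RETENTION_DAYS` and then unlink each returned path. Rows are deleted before files on purpose. If the job crashes midway, it leaves orphaned files that a periodic sweep can collect, rather than rows pointing at missing files.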