refactor: enhance auth migration proposal with user task isolation

Major updates based on feedback:
1. Remove Azure AD ID storage - use email as primary identifier
2. Complete database redesign - no backward compatibility needed
3. Add comprehensive user task isolation and history features

Database changes:
- Simplified users table (email-based)
- New ocr_tasks table with user association
- New task_files table for file tracking
- Proper indexes for performance

New features:
- User task isolation (A cannot see B's tasks)
- Task history with status tracking (pending/processing/completed/failed)
- Historical query capabilities with filters
- Download support for completed tasks
- Task management UI with search and filters

Security enhancements:
- User context validation in all endpoints
- File access control based on ownership
- Row-level security in database queries
- API-level authorization checks

Implementation approach:
- Clean migration without rollback concerns
- Drop old tables and start fresh
- Simplified deployment process
- Comprehensive task management system

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

@@ -72,25 +72,118 @@ By migrating to the external API authentication service at https://pj-auth-api.v
```

### Database Schema Changes
- **users table modifications**:
  - Remove/deprecate `hashed_password` column (keep for rollback)
  - Add `external_user_id` (VARCHAR 255) - Store Azure AD user ID
  - Add `display_name` (VARCHAR 255) - Store user display name from API
  - Add `azure_email` (VARCHAR 255) - Store Azure AD email
  - Add `last_token_refresh` (DATETIME) - Track token refresh timing
  - Keep `username` for backward compatibility (can be email)

**Complete Redesign (No backward compatibility needed)**:

1. **users table (redesigned)**:
   ```sql
   CREATE TABLE users (
       id INT PRIMARY KEY AUTO_INCREMENT,
       email VARCHAR(255) UNIQUE NOT NULL,  -- Primary identifier from Azure AD
       display_name VARCHAR(255),            -- Display name from API response
       created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
       last_login TIMESTAMP,
       is_active BOOLEAN DEFAULT TRUE
   );
   ```
   Note: No Azure AD ID storage needed - email is sufficient as unique identifier

2. **ocr_tasks table (new - for task history)**:
   ```sql
   CREATE TABLE ocr_tasks (
       id INT PRIMARY KEY AUTO_INCREMENT,
       user_id INT NOT NULL,                 -- Foreign key to users table
       task_id VARCHAR(255) UNIQUE,          -- Unique task identifier
       filename VARCHAR(255),
       file_type VARCHAR(50),
       status ENUM('pending', 'processing', 'completed', 'failed'),
       result_json_path VARCHAR(500),
       result_markdown_path VARCHAR(500),
       error_message TEXT,
       created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
       updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
       completed_at TIMESTAMP NULL,
       file_deleted BOOLEAN DEFAULT FALSE,   -- Track if files were auto-deleted
       FOREIGN KEY (user_id) REFERENCES users(id),
       INDEX idx_user_status (user_id, status),
       INDEX idx_created (created_at)
   );
   ```

3. **task_files table (for multiple files per task)**:
   ```sql
   CREATE TABLE task_files (
       id INT PRIMARY KEY AUTO_INCREMENT,
       task_id INT NOT NULL,
       original_name VARCHAR(255),
       stored_path VARCHAR(500),
       file_size BIGINT,
       mime_type VARCHAR(100),
       FOREIGN KEY (task_id) REFERENCES ocr_tasks(id) ON DELETE CASCADE
   );
   ```

### Session Management
- Store external API tokens in session/cache instead of local JWT
- Implement token refresh mechanism based on `expires_in` field
- Use `expiresAt` timestamp for token expiration validation

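The refresh rule described under Session Management could be sketched roughly as follows. This is only an illustration under stated assumptions: `TokenState`, `token_from_response`, and `needs_refresh` are hypothetical names, the external API's actual response shape is not shown in this proposal, and the 300-second default simply mirrors the proposed `TOKEN_REFRESH_BUFFER` value.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class TokenState:
    """Hypothetical container for a token issued by the external auth API."""
    access_token: str
    expires_at: datetime  # absolute expiry time


def token_from_response(access_token: str, expires_in: int, issued_at: datetime) -> TokenState:
    """Turn a relative `expires_in` (seconds) into an absolute expiry timestamp."""
    return TokenState(access_token, issued_at + timedelta(seconds=expires_in))


def needs_refresh(token: TokenState, now: datetime, buffer_seconds: int = 300) -> bool:
    """True once `now` falls inside the refresh buffer before expiry, or after expiry."""
    return now >= token.expires_at - timedelta(seconds=buffer_seconds)


# Example: a one-hour token issued at 10:00 UTC becomes refreshable at 10:55 UTC.
issued = datetime(2025, 11, 14, 10, 0, tzinfo=timezone.utc)
token = token_from_response("example-token", expires_in=3600, issued_at=issued)
```

Keeping the check a pure function of `now` makes the buffer logic easy to unit test without waiting on a real clock.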
## New Features: User Task Isolation and History

### Task Isolation
- **Principle**: Each user can only see and access their own tasks
- **Implementation**: All task queries filtered by `user_id` at API level
- **Security**: Enforce user context validation in all task-related endpoints

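As a minimal sketch of the isolation principle, the snippet below filters every task query by the authenticated user's id. It uses an in-memory SQLite database with a trimmed-down version of the proposed tables purely so that it runs standalone; the real schema targets MySQL, and the helper names `list_tasks` and `get_task` are assumptions made for illustration.

```python
import sqlite3

# In-memory stand-in for the proposed users / ocr_tasks tables (SQLite dialect).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL);
    CREATE TABLE ocr_tasks (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        task_id TEXT UNIQUE,
        filename TEXT,
        status TEXT
    );
    INSERT INTO users (id, email) VALUES (1, 'a@example.com'), (2, 'b@example.com');
    INSERT INTO ocr_tasks (user_id, task_id, filename, status) VALUES
        (1, 't-a1', 'doc1.pdf', 'completed'),
        (2, 't-b1', 'doc2.pdf', 'processing');
""")


def list_tasks(conn, current_user_id):
    """Return only the tasks owned by the authenticated user."""
    rows = conn.execute(
        "SELECT task_id, filename, status FROM ocr_tasks WHERE user_id = ? ORDER BY id",
        (current_user_id,),
    )
    return rows.fetchall()


def get_task(conn, current_user_id, task_id):
    """Fetch one task; a task owned by another user is treated as not found."""
    return conn.execute(
        "SELECT task_id, filename, status FROM ocr_tasks WHERE task_id = ? AND user_id = ?",
        (task_id, current_user_id),
    ).fetchone()
```

Putting the `user_id` predicate inside the query itself, rather than filtering rows afterwards, means a forgotten check cannot leak another user's rows.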
### Task History Features
1. **Task Status Tracking**:
   - View pending tasks (waiting to process)
   - View processing tasks (currently running)
   - View completed tasks (with results available)
   - View failed tasks (with error messages)

2. **Historical Query Capabilities**:
   - Search tasks by filename
   - Filter by date range
   - Filter by status
   - Sort by creation/completion time
   - Pagination for large result sets

3. **Task Management**:
   - Download original files (if not auto-deleted)
   - Download results (JSON, Markdown, PDF exports)
   - Re-process failed tasks
   - Delete old tasks manually

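The historical query capabilities listed above (filename search, date range, status filter, sorting, pagination) might be combined into a single parameterized query along these lines. The function name `build_history_query`, its parameters, and the `?` placeholder style are assumptions for illustration; a MySQL driver would typically use `%s` placeholders instead.

```python
from typing import Optional

# Column names cannot be bound as parameters, so sorting uses a whitelist.
ALLOWED_SORT = {"created_at", "completed_at"}
ALLOWED_STATUS = {"pending", "processing", "completed", "failed"}


def build_history_query(
    user_id: int,
    filename: Optional[str] = None,
    status: Optional[str] = None,
    created_from: Optional[str] = None,
    created_to: Optional[str] = None,
    sort_by: str = "created_at",
    page: int = 1,
    page_size: int = 20,
):
    """Build a parameterized SELECT over ocr_tasks; returns (sql, params)."""
    if sort_by not in ALLOWED_SORT:
        raise ValueError(f"unsupported sort column: {sort_by}")
    if status is not None and status not in ALLOWED_STATUS:
        raise ValueError(f"unsupported status: {status}")
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")

    # The user_id clause is always present, so isolation holds for every filter combination.
    clauses, params = ["user_id = ?"], [user_id]
    if filename:
        clauses.append("filename LIKE ?")
        params.append(f"%{filename}%")
    if status:
        clauses.append("status = ?")
        params.append(status)
    if created_from:
        clauses.append("created_at >= ?")
        params.append(created_from)
    if created_to:
        clauses.append("created_at <= ?")
        params.append(created_to)

    sql = (
        "SELECT task_id, filename, status, created_at, completed_at FROM ocr_tasks "
        f"WHERE {' AND '.join(clauses)} "
        f"ORDER BY {sort_by} DESC LIMIT ? OFFSET ?"
    )
    params += [page_size, (page - 1) * page_size]
    return sql, params
```

Only the whitelisted sort column is interpolated into the SQL text; every user-supplied value travels as a bound parameter.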
### Frontend UI Changes
1. **New Components**:
   - Task History page/tab
   - Task filters and search bar
   - Task status badges
   - Batch action controls

2. **Task List View**:
   ```
   | Filename | Status | Created | Completed | Actions |
   |----------|--------|---------|-----------|---------|
   | doc1.pdf | ✅ Completed | 2025-11-14 10:00 | 2025-11-14 10:05 | [Download] [View] |
   | doc2.pdf | 🔄 Processing | 2025-11-14 10:10 | - | [Cancel] |
   | doc3.pdf | ❌ Failed | 2025-11-14 09:00 | - | [Retry] [View Error] |
   ```

3. **User Information Display**:
   - Show user display name in header
   - Show last login time
   - Show task statistics (total, completed, failed)

## Impact

### Affected Capabilities
- `authentication`: Complete replacement of authentication mechanism
- `user-management`: Simplified to read-only user information from external API
- `session-management`: Modified to handle external tokens
- `task-management`: NEW - User-specific task isolation and history
- `file-access-control`: NEW - User-based file access restrictions

### Affected Code
- **Backend Authentication**:
@@ -100,13 +193,26 @@ By migrating to the external API authentication service at https://pj-auth-api.v
  - `backend/app/services/auth_service.py`: New service for external API integration

- **Database Models**:
  - `backend/app/models/user.py`: Update User model with new fields
  - `backend/alembic/versions/`: New migration for schema changes
  - `backend/app/models/user.py`: Complete redesign with new schema
  - `backend/app/models/task.py`: NEW - Task model with user association
  - `backend/app/models/task_file.py`: NEW - Task file model
  - `backend/alembic/versions/`: Complete database recreation

- **Task Management APIs** (NEW):
  - `backend/app/api/v1/endpoints/tasks.py`: Task CRUD operations with user isolation
  - `backend/app/api/v1/endpoints/task_history.py`: Historical query endpoints
  - `backend/app/services/task_service.py`: Task business logic
  - `backend/app/services/file_access_service.py`: User-based file access control

- **Frontend**:
  - `frontend/src/services/authService.ts`: Update to handle new token format
  - `frontend/src/stores/authStore.ts`: Modify to store/display user info from API
  - `frontend/src/components/Header.tsx`: Display `name` field instead of username
  - `frontend/src/components/Header.tsx`: Display `name` field and user menu
  - `frontend/src/pages/TaskHistory.tsx`: NEW - Task history page
  - `frontend/src/components/TaskList.tsx`: NEW - Task list component with filters
  - `frontend/src/components/TaskFilters.tsx`: NEW - Search and filter UI
  - `frontend/src/stores/taskStore.ts`: NEW - Task state management
  - `frontend/src/services/taskService.ts`: NEW - Task API client

### Dependencies
- Add `httpx` or `aiohttp` for async HTTP requests to external API (already present)
@@ -117,8 +223,10 @@ By migrating to the external API authentication service at https://pj-auth-api.v
- `EXTERNAL_AUTH_API_URL` = "https://pj-auth-api.vercel.app"
- `EXTERNAL_AUTH_ENDPOINT` = "/api/auth/login"
- `EXTERNAL_AUTH_TIMEOUT` = 30 (seconds)
- `USE_EXTERNAL_AUTH` = true (feature flag for gradual rollout)
- `TOKEN_REFRESH_BUFFER` = 300 (refresh tokens 5 minutes before expiry)
- `TASK_RETENTION_DAYS` = 30 (auto-delete old tasks)
- `MAX_TASKS_PER_USER` = 1000 (limit per user)
- `ENABLE_TASK_HISTORY` = true (enable history feature)

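One way the values above could be loaded and validated is sketched below with a standard-library dataclass. This is an assumption-laden illustration, not the project's `Settings` class in `backend/app/core/config.py`: the names `AuthTaskSettings` and `load_settings` are hypothetical, the defaults simply repeat the values listed above, and only a subset of the variables is covered.

```python
import os
from dataclasses import dataclass


def _env_bool(env, name: str, default: bool) -> bool:
    """Parse a boolean flag such as ENABLE_TASK_HISTORY=true."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class AuthTaskSettings:
    external_auth_api_url: str
    external_auth_endpoint: str
    external_auth_timeout: int   # seconds
    token_refresh_buffer: int    # seconds before expiry at which to refresh
    task_retention_days: int
    max_tasks_per_user: int
    enable_task_history: bool

    @property
    def login_url(self) -> str:
        """Full login URL built from the base URL and endpoint path."""
        return self.external_auth_api_url.rstrip("/") + self.external_auth_endpoint


def load_settings(env=os.environ) -> AuthTaskSettings:
    """Read settings from a mapping (defaults to the process environment) and validate them."""
    settings = AuthTaskSettings(
        external_auth_api_url=env.get("EXTERNAL_AUTH_API_URL", "https://pj-auth-api.vercel.app"),
        external_auth_endpoint=env.get("EXTERNAL_AUTH_ENDPOINT", "/api/auth/login"),
        external_auth_timeout=int(env.get("EXTERNAL_AUTH_TIMEOUT", "30")),
        token_refresh_buffer=int(env.get("TOKEN_REFRESH_BUFFER", "300")),
        task_retention_days=int(env.get("TASK_RETENTION_DAYS", "30")),
        max_tasks_per_user=int(env.get("MAX_TASKS_PER_USER", "1000")),
        enable_task_history=_env_bool(env, "ENABLE_TASK_HISTORY", True),
    )
    if not settings.external_auth_api_url.startswith("https://"):
        raise ValueError("EXTERNAL_AUTH_API_URL must use HTTPS")
    if settings.external_auth_timeout <= 0 or settings.task_retention_days <= 0:
        raise ValueError("timeout and retention must be positive")
    return settings
```

Accepting the environment as a plain mapping keeps the loader testable: a test can pass a small dict instead of mutating `os.environ`.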
### Security Considerations
- HTTPS required for all authentication requests
@@ -127,21 +235,18 @@ By migrating to the external API authentication service at https://pj-auth-api.v
- Log all authentication events for audit trail
- Validate SSL certificates for external API calls
- Handle network failures gracefully with appropriate error messages
- **User Isolation**: Enforce user context in all database queries
- **File Access Control**: Validate user ownership before file access
- **API Security**: Add user_id validation in all task-related endpoints

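For the file access control point, a path check along the following lines is one possible approach. It assumes a per-user directory layout (`<storage_root>/<user_id>/...`), which the task list mentions but does not specify in detail; `resolve_user_file` and `FileAccessDenied` are hypothetical names.

```python
from pathlib import Path


class FileAccessDenied(Exception):
    """Raised when a requested path falls outside the caller's own directory."""


def resolve_user_file(storage_root: Path, user_id: int, relative_path: str) -> Path:
    """Resolve `relative_path` inside the user's directory, rejecting traversal attempts."""
    user_dir = (storage_root / str(user_id)).resolve()
    candidate = (user_dir / relative_path).resolve()
    # After "..", absolute paths, and symlinks are resolved, the result must
    # still live somewhere below the user's own directory.
    if user_dir not in candidate.parents:
        raise FileAccessDenied(f"path escapes user directory: {relative_path}")
    return candidate
```

A check like this complements, rather than replaces, the database ownership lookup: the database decides which task a user owns, and the path check guards against a stored or supplied path pointing elsewhere on disk.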
### Rollback Strategy
- Keep existing authentication code with feature flag
- Maintain password column in database (don't drop immediately)
- Implement dual authentication mode during transition:
  - If `USE_EXTERNAL_AUTH=true`: Use external API
  - If `USE_EXTERNAL_AUTH=false`: Use local authentication
- Provide migration script to sync existing users with external system
### Migration Plan (Simplified - No Rollback Needed)
1. **Phase 1**: Backup existing database (for reference only)
2. **Phase 2**: Drop old tables and create new schema
3. **Phase 3**: Deploy new authentication and task management system
4. **Phase 4**: Test with initial users
5. **Phase 5**: Full deployment

### Migration Plan
1. **Phase 1**: Implement external API authentication alongside existing system
2. **Phase 2**: Test with subset of users (based on domain or user flag)
3. **Phase 3**: Gradual rollout to all users
4. **Phase 4**: Deprecate local authentication (keep code for emergency)
5. **Phase 5**: Remove local authentication code (after stable period)
Note: Since this is a test system with no production data to preserve, we can perform a clean migration without rollback concerns.

## Risks and Mitigations

@@ -1,32 +1,40 @@
# Implementation Tasks

## 1. Database Schema Updates
- [ ] 1.1 Create database migration script
  - Add `external_user_id` column (VARCHAR 255)
  - Add `display_name` column (VARCHAR 255)
  - Add `azure_email` column (VARCHAR 255)
  - Add `last_token_refresh` column (DATETIME)
  - Mark `hashed_password` as nullable (for gradual migration)
- [ ] 1.2 Update User model
  - Add new fields to SQLAlchemy model
  - Update model relationships if needed
  - Add migration version with Alembic
- [ ] 1.3 Create user sync mechanism
  - Script to map existing users to external IDs
  - Handle users without external accounts
  - Backup existing user data
## 1. Database Schema Redesign
- [ ] 1.1 Backup existing database (for reference)
  - Export current schema and data
  - Document any important data to preserve
- [ ] 1.2 Drop old tables
  - Remove existing users table
  - Remove any related tables
  - Clear database for fresh start
- [ ] 1.3 Create new database schema
  - Create new `users` table (email as primary identifier)
  - Create `ocr_tasks` table with user association
  - Create `task_files` table for file tracking
  - Add proper indexes for performance
- [ ] 1.4 Create SQLAlchemy models
  - User model (simplified)
  - Task model with user relationship
  - TaskFile model with cascade delete
- [ ] 1.5 Generate Alembic migration
  - Create initial migration for new schema
  - Test migration script

## 2. Configuration Management
- [ ] 2.1 Update environment configuration
  - Add `EXTERNAL_AUTH_API_URL` to `.env.local`
  - Add `EXTERNAL_AUTH_ENDPOINT` configuration
  - Add `EXTERNAL_AUTH_TIMEOUT` setting
  - Add `USE_EXTERNAL_AUTH` feature flag
  - Add `TOKEN_REFRESH_BUFFER` setting
  - Add `TASK_RETENTION_DAYS` for auto-cleanup
  - Add `MAX_TASKS_PER_USER` for limits
  - Add `ENABLE_TASK_HISTORY` feature flag
- [ ] 2.2 Update Settings class
  - Add external auth settings to `backend/app/core/config.py`
  - Add task management settings
  - Add validation for new configuration values
  - Implement feature flag logic
  - Remove old authentication settings

## 3. External API Integration Service
- [ ] 3.1 Create auth API client
@@ -99,82 +107,166 @@
  - Implement retry UI for failures
  - Add loading states

## 7. Testing
- [ ] 7.1 Unit tests
## 7. Task Management System (NEW)
- [ ] 7.1 Create task management backend
  - Implement `backend/app/models/task.py`
  - Implement `backend/app/models/task_file.py`
  - Create `backend/app/services/task_service.py`
  - Add task CRUD operations with user isolation
- [ ] 7.2 Implement task APIs
  - Create `backend/app/api/v1/endpoints/tasks.py`
  - GET /tasks (list user's tasks with pagination)
  - GET /tasks/{id} (get specific task)
  - DELETE /tasks/{id} (delete task)
  - POST /tasks/{id}/retry (retry failed task)
- [ ] 7.3 Create task history endpoints
  - Create `backend/app/api/v1/endpoints/task_history.py`
  - GET /history (query with filters)
  - GET /history/stats (user statistics)
  - POST /history/export (export history)
- [ ] 7.4 Implement file access control
  - Create `backend/app/services/file_access_service.py`
  - Validate user ownership before file access
  - Restrict download to user's own files
  - Add audit logging for file access
- [ ] 7.5 Update OCR service integration
  - Link OCR tasks to user accounts
  - Save task records in database
  - Update task status during processing
  - Store result file paths

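The status updates and the retry endpoint in this section imply a small state machine over the four `status` values. The sketch below shows one plausible set of transitions; the transition table itself is an assumption (in particular that a retry moves a failed task back to `pending`), and it does not model the cancel action shown in the task list UI.

```python
from enum import Enum


class TaskStatus(str, Enum):
    """The four states from the ocr_tasks.status ENUM."""
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumed transitions: a retry re-queues a failed task; completed is terminal.
ALLOWED_TRANSITIONS = {
    TaskStatus.PENDING: {TaskStatus.PROCESSING},
    TaskStatus.PROCESSING: {TaskStatus.COMPLETED, TaskStatus.FAILED},
    TaskStatus.FAILED: {TaskStatus.PENDING},
    TaskStatus.COMPLETED: set(),
}


def transition(current: TaskStatus, target: TaskStatus) -> TaskStatus:
    """Return the new status, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move task from {current.value} to {target.value}")
    return target
```

Centralizing the allowed moves in one table lets both the OCR worker and the `POST /tasks/{id}/retry` handler reject invalid updates in the same way.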
## 8. Frontend Task Management UI (NEW)
- [ ] 8.1 Create task history page
  - Implement `frontend/src/pages/TaskHistory.tsx`
  - Display task list with status indicators
  - Add pagination controls
  - Show task details modal
- [ ] 8.2 Build task list component
  - Implement `frontend/src/components/TaskList.tsx`
  - Display task table with columns
  - Add sorting capabilities
  - Implement action buttons
- [ ] 8.3 Create filter components
  - Implement `frontend/src/components/TaskFilters.tsx`
  - Date range picker
  - Status filter dropdown
  - Search by filename
  - Clear filters button
- [ ] 8.4 Add task management store
  - Implement `frontend/src/stores/taskStore.ts`
  - Manage task list state
  - Handle filter state
  - Cache task data
- [ ] 8.5 Create task service client
  - Implement `frontend/src/services/taskService.ts`
  - API methods for task operations
  - Handle pagination
  - Implement retry logic
- [ ] 8.6 Update navigation
  - Add "Task History" menu item
  - Update router configuration
  - Add task count badge
  - Implement user menu with stats

## 9. User Isolation and Security
- [ ] 9.1 Implement user context middleware
  - Create middleware to inject user context
  - Validate user in all requests
  - Add user_id to logging context
- [ ] 9.2 Database query isolation
  - Add user_id filter to all task queries
  - Prevent cross-user data access
  - Implement row-level security
- [ ] 9.3 File system isolation
  - Organize files by user directory
  - Validate file paths before access
  - Implement cleanup for deleted users
- [ ] 9.4 API authorization
  - Add @require_user decorator
  - Validate ownership in endpoints
  - Return 403 for unauthorized access

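A framework-neutral sketch of the `@require_user` idea from 9.4 is shown below. Everything here is illustrative: `HTTPError` stands in for whatever exception type the web framework provides, the request is modeled as a plain dict, `TASK_OWNERS` is a hypothetical in-memory ownership table, and the 401 response for a missing user is an assumption (the task list only specifies 403 for unauthorized access).

```python
from functools import wraps


class HTTPError(Exception):
    """Stand-in for the web framework's HTTP exception (hypothetical)."""

    def __init__(self, status_code: int, detail: str):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


# Hypothetical ownership table: task_id -> owning user_id.
TASK_OWNERS = {"t-a1": 1, "t-b1": 2}


def require_user(handler):
    """Reject requests that carry no authenticated user before the handler runs."""
    @wraps(handler)
    def wrapper(request, *args, **kwargs):
        if request.get("user_id") is None:
            raise HTTPError(401, "authentication required")
        return handler(request, *args, **kwargs)
    return wrapper


@require_user
def read_task(request, task_id):
    """Return the task only if the authenticated user owns it."""
    owner = TASK_OWNERS.get(task_id)
    if owner is None:
        raise HTTPError(404, "task not found")
    if owner != request["user_id"]:
        raise HTTPError(403, "task belongs to another user")
    return {"task_id": task_id, "owner": owner}
```

Whether a foreign task should answer 403 or 404 is a design choice: 403 matches the task list above, while 404 avoids confirming that the task id exists.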
## 10. Testing
- [ ] 10.1 Unit tests
  - Test external auth service
  - Test token validation
  - Test user information mapping
  - Test error scenarios
- [ ] 7.2 Integration tests
  - Test task isolation logic
  - Test file access control
- [ ] 10.2 Integration tests
  - Test full authentication flow
  - Test token refresh mechanism
  - Test fallback scenarios
  - Test feature flag switching
- [ ] 7.3 Load testing
  - Test task management flow
  - Test user isolation between accounts
  - Test file download restrictions
- [ ] 10.3 Load testing
  - Test external API response times
  - Test system under high authentication load
  - Measure impact on performance
- [ ] 7.4 Security testing
  - Test system with many concurrent users
  - Test large task history queries
  - Measure database query performance
- [ ] 10.4 Security testing
  - Test token security
  - Verify HTTPS enforcement
  - Test rate limiting
  - Validate error message security
  - Verify user isolation
  - Test unauthorized access attempts
  - Validate SQL injection prevention

## 8. Migration Execution
- [ ] 8.1 Pre-migration preparation
  - Backup database
  - Document rollback procedure
  - Prepare user communication
## 11. Migration Execution (Simplified)
- [ ] 11.1 Pre-migration preparation
  - Backup existing database (reference only)
  - Prepare deployment package
  - Set up monitoring
- [ ] 8.2 Staged rollout
  - Enable for test users first
  - Monitor for issues
  - Gradually increase user percentage
  - Collect feedback
- [ ] 8.3 Post-migration validation
  - Verify all users can login
  - Check audit logs
  - Monitor error rates
  - Validate performance metrics
- [ ] 11.2 Execute migration
  - Drop old database tables
  - Create new schema
  - Deploy new code
  - Verify system startup
- [ ] 11.3 Post-migration validation
  - Test authentication with real users
  - Verify task isolation works
  - Check task history functionality
  - Validate file access controls

## 9. Documentation
- [ ] 9.1 Technical documentation
  - Update API documentation
## 12. Documentation
- [ ] 12.1 Technical documentation
  - Update API documentation with new endpoints
  - Document authentication flow
  - Update deployment guide
  - Document task management APIs
  - Create troubleshooting guide
- [ ] 9.2 User documentation
- [ ] 12.2 User documentation
  - Update login instructions
  - Document new features
  - Create FAQ for common issues
- [ ] 9.3 Operations documentation
  - Document monitoring points
  - Create runbook for issues
  - Document rollback procedure
  - Document task history features
  - Explain user isolation
  - Create user guide for new UI
- [ ] 12.3 Developer documentation
  - Document database schema
  - Explain security model
  - Provide integration examples

## 10. Monitoring and Observability
- [ ] 10.1 Add monitoring metrics
## 13. Monitoring and Observability
- [ ] 13.1 Add monitoring metrics
  - Authentication success/failure rates
  - External API response times
  - Token refresh success rate
  - Error rate monitoring
- [ ] 10.2 Implement logging
  - Task creation/completion rates
  - User activity metrics
  - File storage usage
- [ ] 13.2 Implement logging
  - Log all authentication attempts
  - Log external API calls
  - Log token operations
  - Log task operations
  - Log file access attempts
  - Structured logging for analysis
- [ ] 10.3 Create alerts
  - Alert on high failure rates
  - Alert on external API unavailability
  - Alert on token refresh failures
  - Alert on unusual patterns
- [ ] 13.3 Create alerts
  - Alert on authentication failures
  - Alert on high error rates
  - Alert on storage issues
  - Alert on performance degradation

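The "structured logging for analysis" item could be approached with the standard `logging` module and a JSON formatter, as in this sketch. The logger name `ocr.audit`, the `JsonFormatter` class, and the field names `user_id`, `task_id`, and `outcome` are all assumptions chosen for illustration.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "event": record.getMessage(),
            "logger": record.name,
        }
        # Fields passed through `extra=` become attributes on the record.
        for key in ("user_id", "task_id", "outcome"):
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload, sort_keys=True)


def make_audit_logger(stream) -> logging.Logger:
    """Build a logger that writes JSON lines to `stream`."""
    logger = logging.getLogger("ocr.audit")
    logger.handlers.clear()  # avoid duplicate handlers on repeated setup
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A call such as `log.info("file_access", extra={"user_id": 1, "task_id": "t-b1", "outcome": "denied"})` then emits one JSON line per event, which keeps authentication, task, and file-access logs queryable by field.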
## 11. Cleanup (Post-Stabilization)
- [ ] 11.1 Remove legacy code
  - Remove local authentication code (after stable period)
  - Remove unused database columns
  - Clean up configuration
- [ ] 11.2 Optimize performance
  - Implement caching where appropriate
  - Optimize database queries
  - Review and optimize API calls
## 14. Performance Optimization (Post-Launch)
- [ ] 14.1 Database optimization
  - Analyze query patterns
  - Add missing indexes
  - Optimize slow queries
- [ ] 14.2 Caching implementation
  - Cache user information
  - Cache task lists
  - Implement Redis if needed
- [ ] 14.3 File management
  - Implement automatic cleanup
  - Optimize storage structure
  - Add compression if needed
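For the automatic cleanup item, the selection step might look like the sketch below. It assumes that cleanup removes stored files for tasks older than `TASK_RETENTION_DAYS` and then sets `file_deleted`, keeping the task row for history; the proposal does not state whether rows are also removed, and `select_expired_tasks` is a hypothetical helper operating on plain dicts rather than ORM objects.

```python
from datetime import datetime, timedelta, timezone


def select_expired_tasks(tasks, now: datetime, retention_days: int = 30):
    """Return task_ids whose files are older than the retention window and not yet deleted."""
    cutoff = now - timedelta(days=retention_days)
    return [
        task["task_id"]
        for task in tasks
        if task["created_at"] < cutoff and not task["file_deleted"]
    ]


# Example data: one task past the window, one recent, one already cleaned up.
now = datetime(2025, 11, 14, tzinfo=timezone.utc)
tasks = [
    {"task_id": "t-old", "created_at": now - timedelta(days=45), "file_deleted": False},
    {"task_id": "t-new", "created_at": now - timedelta(days=2), "file_deleted": False},
    {"task_id": "t-done", "created_at": now - timedelta(days=60), "file_deleted": True},
]
```

Separating selection from deletion makes the cutoff logic testable on its own and lets a scheduled job log what it is about to remove before touching the file system.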