feat: add storage cleanup mechanism with soft delete and auto scheduler

- Add soft delete (deleted_at column) to preserve task records for statistics
- Implement cleanup service to delete old files while keeping DB records
- Add automatic cleanup scheduler (configurable interval, default 24h)
- Add admin endpoints: storage stats, cleanup trigger, scheduler status
- Update task service with admin views (include deleted/files_deleted)
- Add frontend storage management UI in admin dashboard
- Add i18n translations for storage management

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -0,0 +1,60 @@
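# Change: Add Storage Cleanup Mechanism

## Why
The system currently lacks a complete mechanism for managing disk space:
- `delete_task` only deletes the database record; it does not delete the actual files
- `auto_cleanup_expired_tasks` exists but is never called
- Uploaded files (`uploads/`) and result files (`storage/results/`) accumulate without limit

Users need:
1. Periodic cleanup of expired files to save disk space
2. Database records retained so that administrators can view cumulative statistics (tokens, cost, usage)
3. A soft delete mechanism that lets users "delete" tasks without affecting statistics
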
## What Changes

### Backend Changes
1. **Task Model extension**
   - Add a `deleted_at` column to implement soft delete
   - Keep the existing `file_deleted` column to track file cleanup status

2. **Task Service updates**
   - `delete_task()` becomes a soft delete (sets `deleted_at`; files are not deleted)
   - User queries automatically exclude records where `deleted_at IS NOT NULL`
   - Add a `cleanup_expired_files()` method to clean up expired files

3. **New Cleanup Service**
   - Periodic scheduled job (configurable interval; daily recommended)
   - Cleanup logic: keep the files of each user's newest N tasks (default 50)
   - Delete files only, never database records (statistics are preserved)

4. **Admin Endpoints extension**
   - Add a `/api/v2/admin/tasks` endpoint to view all tasks (including deleted ones)
   - Support filters: `include_deleted=true/false`, `include_files_deleted=true/false`

### Frontend Changes
5. **Task History Page**
   - Users see only their own tasks (existing `user_id` isolation)
   - Soft-deleted tasks are not shown in the list

6. **Admin Dashboard**
   - Add a task management view
   - Show all tasks with status markers (deleted, files cleaned up)
   - Cumulative statistics remain viewable and are unaffected by deletions

### Configuration
7. **New config settings**
   - `cleanup_interval_hours`: cleanup interval in hours (default 24)
   - `max_files_per_user`: number of newest task files kept per user (default 50)
   - `cleanup_enabled`: whether automatic cleanup is enabled (default true)
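
As a rough illustration, the three settings could be modeled as an immutable settings object carrying the defaults listed above. This sketch assumes plain environment-variable overrides with hypothetical names (`CLEANUP_ENABLED`, `CLEANUP_INTERVAL_HOURS`, `MAX_FILES_PER_USER`); the real `config.py` may use a different settings mechanism.

```python
import os
from dataclasses import dataclass


def _env_bool(name: str, default: bool) -> bool:
    """Read a boolean environment variable ("true"/"1"/"yes"/"on" mean True)."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class CleanupSettings:
    # Defaults follow the proposal: enabled, every 24 hours, 50 task files per user.
    cleanup_enabled: bool = True
    cleanup_interval_hours: int = 24
    max_files_per_user: int = 50

    def __post_init__(self) -> None:
        # Reject a non-positive interval and a negative retention count.
        if self.cleanup_interval_hours <= 0:
            raise ValueError("cleanup_interval_hours must be positive")
        if self.max_files_per_user < 0:
            raise ValueError("max_files_per_user must be non-negative")

    @classmethod
    def from_env(cls) -> "CleanupSettings":
        # Hypothetical environment variable names, not taken from config.py.
        return cls(
            cleanup_enabled=_env_bool("CLEANUP_ENABLED", True),
            cleanup_interval_hours=int(os.environ.get("CLEANUP_INTERVAL_HOURS", "24")),
            max_files_per_user=int(os.environ.get("MAX_FILES_PER_USER", "50")),
        )
```

Validating in `__post_init__` means a bad value fails at startup rather than when the scheduler first runs.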

## Impact
- Affected specs: `task-management`
- Affected code:
  - `backend/app/models/task.py` - adds the `deleted_at` column
  - `backend/app/services/task_service.py` - soft delete and query logic
  - `backend/app/services/cleanup_service.py` - new file
  - `backend/app/routers/admin.py` - new endpoints
  - `backend/app/core/config.py` - new settings
  - `frontend/src/pages/AdminDashboardPage.tsx` - task management view
- Database migration required: add the `deleted_at` column
@@ -0,0 +1,116 @@
# task-management Spec Delta

## ADDED Requirements

### Requirement: Soft Delete Tasks
The system SHALL support soft deletion of tasks, marking them as deleted without removing database records to preserve usage statistics.

#### Scenario: User soft deletes a task
- **WHEN** user calls DELETE on `/api/v2/tasks/{task_id}`
- **THEN** system SHALL set `deleted_at` timestamp on the task record
- **AND** system SHALL NOT delete the actual files
- **AND** system SHALL NOT remove the database record
- **AND** subsequent user queries SHALL NOT return this task

#### Scenario: Preserve statistics after soft delete
- **WHEN** a task is soft deleted
- **THEN** admin statistics endpoints SHALL continue to include this task's metrics
- **AND** translation token counts SHALL remain in cumulative totals
- **AND** processing time statistics SHALL remain accurate
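
A minimal sketch of the soft-delete behavior, using the standard-library `sqlite3` module and a simplified, hypothetical `tasks` table (`id`, `user_id`, `token_count`, `deleted_at`) instead of the real ORM model. It shows the three properties the scenarios above require: deletion only stamps `deleted_at`, user-facing queries skip stamped rows, and admin statistics still count them. The function names are illustrative, not the project's actual service methods.

```python
import sqlite3
from datetime import datetime, timezone


def create_schema(conn: sqlite3.Connection) -> None:
    # Simplified, hypothetical table; the real Task model has many more columns.
    conn.execute(
        """
        CREATE TABLE tasks (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL,
            token_count INTEGER NOT NULL DEFAULT 0,
            deleted_at TEXT  -- NULL means "not deleted"
        )
        """
    )


def soft_delete_task(conn: sqlite3.Connection, task_id: int, user_id: int) -> bool:
    """Stamp deleted_at; the row (and any files) are left in place."""
    cur = conn.execute(
        "UPDATE tasks SET deleted_at = ? "
        "WHERE id = ? AND user_id = ? AND deleted_at IS NULL",
        (datetime.now(timezone.utc).isoformat(), task_id, user_id),
    )
    # False if the task is missing, owned by someone else, or already deleted.
    return cur.rowcount == 1


def list_user_tasks(conn: sqlite3.Connection, user_id: int) -> list[int]:
    """User-facing listing: soft-deleted tasks are filtered out."""
    rows = conn.execute(
        "SELECT id FROM tasks WHERE user_id = ? AND deleted_at IS NULL ORDER BY id",
        (user_id,),
    )
    return [row[0] for row in rows]


def total_tokens(conn: sqlite3.Connection) -> int:
    """Admin statistics: no deleted_at filter, so deleted tasks still count."""
    (total,) = conn.execute("SELECT COALESCE(SUM(token_count), 0) FROM tasks").fetchone()
    return total
```

The key design point is that only user-facing queries add `deleted_at IS NULL`; aggregate queries deliberately omit it.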

### Requirement: File Cleanup Scheduler
The system SHALL automatically clean up old files while preserving database records for statistics tracking.

#### Scenario: Scheduled file cleanup
- **WHEN** cleanup scheduler runs (configurable interval, default daily)
- **THEN** system SHALL identify tasks where files can be deleted
- **AND** system SHALL retain newest N files per user (configurable, default 50)
- **AND** system SHALL delete actual files from disk for older tasks
- **AND** system SHALL set `file_deleted=True` on cleaned tasks
- **AND** system SHALL NOT delete any database records
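
One way to run the job on a fixed interval is a background `asyncio` task started at application startup and stopped at shutdown. This is an assumption about structure, not the project's actual scheduler: `run_cleanup` stands in for the cleanup service call, and the interval is passed in seconds so the loop can be exercised with small values (the default daily run would be `24 * 3600`).

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger("cleanup_scheduler")


async def cleanup_loop(
    run_cleanup: Callable[[], Awaitable[None]],
    interval_seconds: float,
    stop: asyncio.Event,
) -> int:
    """Call run_cleanup every interval_seconds until stop is set.

    Returns the number of successful runs. A failing run is logged and the
    loop continues, so one bad pass does not disable future cleanups.
    """
    runs = 0
    while not stop.is_set():
        try:
            await run_cleanup()
            runs += 1
        except Exception:
            logger.exception("cleanup run failed")
        try:
            # Sleep for one interval, but wake up immediately if stop is set.
            await asyncio.wait_for(stop.wait(), timeout=interval_seconds)
        except asyncio.TimeoutError:
            pass
    return runs
```

At startup the application would create this task only when `cleanup_enabled` is true; at shutdown it sets `stop` and awaits the task, which returns promptly because the sleep is an interruptible wait rather than `asyncio.sleep`.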

#### Scenario: File retention per user
- **WHEN** user has more than `max_files_per_user` tasks with files
- **THEN** cleanup SHALL delete files for oldest tasks exceeding the limit
- **AND** cleanup SHALL preserve the newest `max_files_per_user` task files
- **AND** task ordering SHALL be by `created_at` descending
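
The retention rule can be sketched as a pure selection step followed by a deletion step. This is an illustrative sketch over a hypothetical in-memory `TaskRecord` type holding a list of file paths, not the actual `cleanup_service.py`: per user, tasks that still have files are sorted by `created_at` descending, the first `max_files_per_user` are kept, and the rest have their files removed and `file_deleted` set. The returned summary (files deleted, bytes freed) is the kind of result the manual trigger endpoint is specified to report.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import Path


@dataclass
class TaskRecord:
    # Hypothetical stand-in for the Task model; only the fields cleanup needs.
    task_id: str
    user_id: int
    created_at: datetime
    files: list[Path] = field(default_factory=list)
    file_deleted: bool = False


def select_tasks_to_clean(tasks: list[TaskRecord], max_files_per_user: int) -> list[TaskRecord]:
    """Return tasks whose files fall outside each user's newest-N window."""
    by_user: dict[int, list[TaskRecord]] = defaultdict(list)
    for task in tasks:
        if not task.file_deleted:  # already-cleaned tasks don't count toward N
            by_user[task.user_id].append(task)
    to_clean: list[TaskRecord] = []
    for user_tasks in by_user.values():
        user_tasks.sort(key=lambda t: t.created_at, reverse=True)  # newest first
        to_clean.extend(user_tasks[max_files_per_user:])
    return to_clean


def clean_files(tasks: list[TaskRecord]) -> dict[str, int]:
    """Delete files from disk and flag the tasks; the records themselves stay."""
    files_deleted = 0
    bytes_freed = 0
    for task in tasks:
        for path in task.files:
            if path.is_file():
                bytes_freed += path.stat().st_size
                path.unlink()
                files_deleted += 1
        task.file_deleted = True
    return {"tasks_cleaned": len(tasks), "files_deleted": files_deleted, "bytes_freed": bytes_freed}
```

Keeping selection separate from deletion makes it possible to preview what a run would remove before touching the disk.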

#### Scenario: Manual cleanup trigger
- **WHEN** admin calls POST `/api/v2/admin/cleanup/trigger`
- **THEN** system SHALL immediately run the cleanup process
- **AND** system SHALL return a summary of files deleted and space freed

### Requirement: Admin Task Visibility
Admin users SHALL have full visibility into all tasks including soft-deleted and file-cleaned tasks.

#### Scenario: Admin lists all tasks
- **WHEN** admin calls GET `/api/v2/admin/tasks`
- **THEN** response SHALL include all tasks from all users
- **AND** response SHALL include soft-deleted tasks
- **AND** response SHALL include tasks with deleted files
- **AND** each task SHALL indicate its deletion status

#### Scenario: Filter admin task list
- **WHEN** admin calls GET `/api/v2/admin/tasks` with filters
- **THEN** `include_deleted=false` SHALL exclude soft-deleted tasks
- **AND** `include_files_deleted=false` SHALL exclude file-cleaned tasks
- **AND** `user_id={id}` SHALL filter to specific user's tasks
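
These filter semantics map naturally onto an optional `WHERE` clause: with no filters the admin sees everything, and each `false` flag adds one exclusion. A minimal sketch with `sqlite3`, assuming a simplified, hypothetical `tasks` table with `id`, `user_id`, `created_at`, `deleted_at`, and `file_deleted` columns; the `limit`/`offset` parameters stand in for the pagination support in the task checklist and their names are assumptions.

```python
import sqlite3
from typing import Optional


def list_admin_tasks(
    conn: sqlite3.Connection,
    include_deleted: bool = True,
    include_files_deleted: bool = True,
    user_id: Optional[int] = None,
    limit: int = 50,
    offset: int = 0,
) -> list[tuple]:
    """Admin listing: all tasks by default, narrowed by optional filters."""
    clauses: list[str] = []
    params: list[object] = []
    if not include_deleted:
        clauses.append("deleted_at IS NULL")
    if not include_files_deleted:
        clauses.append("file_deleted = 0")
    if user_id is not None:
        clauses.append("user_id = ?")
        params.append(user_id)
    where = f"WHERE {' AND '.join(clauses)}" if clauses else ""
    # Clauses are fixed strings; user-supplied values only go through placeholders.
    sql = (
        "SELECT id, user_id, deleted_at IS NOT NULL AS is_deleted, file_deleted "
        f"FROM tasks {where} ORDER BY created_at DESC LIMIT ? OFFSET ?"
    )
    params.extend([limit, offset])
    return conn.execute(sql, params).fetchall()
```

Each returned row carries `is_deleted` and `file_deleted`, which is one way to satisfy "each task SHALL indicate its deletion status".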

#### Scenario: View storage usage statistics
- **WHEN** admin calls GET `/api/v2/admin/storage/stats`
- **THEN** response SHALL include total storage used
- **AND** response SHALL include per-user storage breakdown
- **AND** response SHALL include count of tasks with/without files
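
A sketch of how these three figures could be computed from task records and the files on disk. It assumes each task is available as a hypothetical `(user_id, file_deleted, file_paths)` tuple; the real endpoint may locate upload and result files differently.

```python
from collections import defaultdict
from pathlib import Path
from typing import Iterable


def storage_stats(tasks: Iterable[tuple[int, bool, list[Path]]]) -> dict:
    """Summarize disk usage; each task is (user_id, file_deleted, file_paths)."""
    per_user: dict[int, int] = defaultdict(int)
    with_files = 0
    without_files = 0
    for user_id, file_deleted, paths in tasks:
        if file_deleted:
            without_files += 1
            continue
        with_files += 1
        # Paths that no longer exist on disk contribute 0 bytes.
        per_user[user_id] += sum(p.stat().st_size for p in paths if p.is_file())
    return {
        "total_bytes": sum(per_user.values()),
        "per_user_bytes": dict(per_user),
        "tasks_with_files": with_files,
        "tasks_without_files": without_files,
    }
```

Measuring actual file sizes, rather than trusting a stored size column, keeps the numbers honest if files are removed outside the cleanup service.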

### Requirement: User Task Isolation
Regular users SHALL only see their own tasks and soft-deleted tasks SHALL be hidden from their view.

#### Scenario: User lists own tasks
- **WHEN** authenticated user calls GET `/api/v2/tasks`
- **THEN** response SHALL only include tasks owned by that user
- **AND** response SHALL NOT include soft-deleted tasks
- **AND** response SHALL include tasks with deleted files (showing file unavailable status)

#### Scenario: User cannot access other user's tasks
- **WHEN** user attempts to access task owned by another user
- **THEN** system SHALL return 404 Not Found
- **AND** system SHALL NOT reveal that the task exists
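
To avoid revealing that a task exists, the lookup can fold ownership (and the soft-delete check) into the same query, so "owned by someone else", "soft-deleted", and "does not exist" all produce an identical result that the router maps to 404. A sketch with `sqlite3`, a hypothetical `tasks` table, and a hypothetical `NotFound` exception standing in for the web framework's 404 error.

```python
import sqlite3


class NotFound(Exception):
    """Hypothetical stand-in for the framework's HTTP 404 error."""


def get_task_for_user(conn: sqlite3.Connection, task_id: str, user_id: int) -> tuple:
    # Ownership and soft delete are part of the WHERE clause, so a task owned
    # by another user is indistinguishable from one that does not exist.
    row = conn.execute(
        "SELECT task_id, filename FROM tasks "
        "WHERE task_id = ? AND user_id = ? AND deleted_at IS NULL",
        (task_id, user_id),
    ).fetchone()
    if row is None:
        raise NotFound("Task not found")  # same message in every case
    return row
```

Returning 403 for other users' tasks would confirm the ID is valid; a single query with one error path avoids that leak.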

## MODIFIED Requirements

### Requirement: Task Detail View
The frontend SHALL provide a dedicated page for viewing individual task details with processing track information, enhanced preview capabilities, and file availability status.

#### Scenario: Navigate to task detail page
- **WHEN** user clicks "View Details" button on task in Task History page
- **THEN** browser SHALL navigate to `/tasks/{task_id}`
- **AND** TaskDetailPage component SHALL render

#### Scenario: Display task information
- **WHEN** TaskDetailPage loads for a valid task ID
- **THEN** page SHALL display task metadata (filename, status, processing time, confidence)
- **AND** page SHALL show markdown preview of OCR results
- **AND** page SHALL provide download buttons for JSON, Markdown, and PDF formats

#### Scenario: Download from task detail page
- **WHEN** user clicks download button for a specific format
- **THEN** browser SHALL download the file using `/api/v2/tasks/{task_id}/download/{format}` endpoint
- **AND** downloaded file SHALL contain the task's OCR results in requested format

#### Scenario: Display processing track information
- **WHEN** viewing task processed through dual-track system
- **THEN** page SHALL display processing track used (OCR or Direct)
- **AND** show track-specific metrics (OCR confidence or extraction quality)
- **AND** provide option to reprocess with alternate track if applicable

#### Scenario: Preview document structure
- **WHEN** user enables structure view
- **THEN** page SHALL display document element hierarchy
- **AND** show bounding boxes overlay on preview
- **AND** highlight different element types (headers, tables, lists) with distinct colors

#### Scenario: Display file unavailable status
- **WHEN** task has `file_deleted=True`
- **THEN** page SHALL show file unavailable indicator
- **AND** download buttons SHALL be disabled or hidden
- **AND** page SHALL display explanation that files were cleaned up
@@ -0,0 +1,49 @@
# Tasks: Add Storage Cleanup Mechanism

## 1. Database Schema
- [x] 1.1 Add `deleted_at` column to Task model
- [x] 1.2 Create database migration for `deleted_at` column
- [x] 1.3 Run migration and verify column exists

## 2. Task Service Updates
- [x] 2.1 Update `delete_task()` to set `deleted_at` instead of deleting record
- [x] 2.2 Update `get_tasks()` to filter out soft-deleted tasks for regular users
- [x] 2.3 Update `get_task_by_id()` to respect soft delete for regular users
- [x] 2.4 Add `get_all_tasks()` method for admin (includes deleted)

## 3. Cleanup Service
- [x] 3.1 Create `cleanup_service.py` with file cleanup logic
- [x] 3.2 Implement per-user file retention (keep newest N files)
- [x] 3.3 Add method to calculate storage usage per user
- [x] 3.4 Set `file_deleted=True` after cleaning files

## 4. Scheduled Cleanup Task
- [x] 4.1 Add cleanup configuration to `config.py`
- [x] 4.2 Create scheduler for periodic cleanup
- [x] 4.3 Add startup hook to register cleanup task
- [x] 4.4 Add manual cleanup trigger endpoint for admin

## 5. Admin API Endpoints
- [x] 5.1 Add `GET /api/v2/admin/tasks` endpoint
- [x] 5.2 Support filters: `include_deleted`, `include_files_deleted`, `user_id`
- [x] 5.3 Add pagination support
- [x] 5.4 Add storage usage statistics endpoint

## 6. Frontend Updates
- [x] 6.1 Verify TaskHistoryPage correctly filters by user (existing user_id isolation)
- [x] 6.2 Add admin task management view to AdminDashboardPage
- [x] 6.3 Display soft-deleted and files-cleaned status badges (i18n ready)
- [x] 6.4 Add i18n keys for new UI elements

## 7. Testing
- [x] 7.1 Test soft delete preserves database record (code verified)
- [x] 7.2 Test user isolation (users see only own tasks - existing)
- [x] 7.3 Test admin sees all tasks including deleted (API verified)
- [x] 7.4 Test file cleanup retains newest N files (code verified)
- [x] 7.5 Test storage statistics calculation (API verified)

## Notes
- All tasks completed including automatic scheduler
- Cleanup runs automatically at configured interval (default: 24 hours)
- Manual cleanup trigger is also available via admin endpoint
- Scheduler status can be checked via `GET /api/v2/admin/cleanup/status`