OCR/openspec/changes/archive/2025-12-14-add-storage-cleanup/specs/task-management/spec.md
egg 73112db055 feat: add storage cleanup mechanism with soft delete and auto scheduler
- Add soft delete (deleted_at column) to preserve task records for statistics
- Implement cleanup service to delete old files while keeping DB records
- Add automatic cleanup scheduler (configurable interval, default 24h)
- Add admin endpoints: storage stats, cleanup trigger, scheduler status
- Update task service with admin views (include deleted/files_deleted)
- Add frontend storage management UI in admin dashboard
- Add i18n translations for storage management

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 12:41:01 +08:00


task-management Spec Delta

ADDED Requirements

Requirement: Soft Delete Tasks

The system SHALL support soft deletion of tasks, marking them as deleted without removing database records to preserve usage statistics.

Scenario: User soft deletes a task

  • WHEN user calls DELETE on /api/v2/tasks/{task_id}
  • THEN system SHALL set deleted_at timestamp on the task record
  • AND system SHALL NOT delete the actual files
  • AND system SHALL NOT remove the database record
  • AND subsequent user queries SHALL NOT return this task
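
The soft-delete scenario above can be sketched in a few lines. This is a minimal in-memory illustration, not the project's actual model or service code: the `Task` dataclass, `soft_delete`, and `visible_to_user` are hypothetical names, and keeping the first `deleted_at` value on repeated calls is an assumption the spec does not state.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Task:
    # Hypothetical stand-in for a task row; only deleted_at is named in the spec.
    task_id: str
    user_id: int
    deleted_at: Optional[datetime] = None


def soft_delete(task: Task, now: Optional[datetime] = None) -> Task:
    """Set deleted_at on the task; the record and its files are left untouched."""
    if task.deleted_at is None:
        # Assumption: a repeated DELETE keeps the original deletion time.
        task.deleted_at = now or datetime.now(timezone.utc)
    return task


def visible_to_user(tasks: list[Task], user_id: int) -> list[Task]:
    """Return the user's own tasks, excluding soft-deleted ones."""
    return [t for t in tasks if t.user_id == user_id and t.deleted_at is None]
```

The record stays in the collection after `soft_delete`, so statistics queries that ignore `deleted_at` would still count it, while `visible_to_user` no longer returns it.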

Scenario: Preserve statistics after soft delete

  • WHEN a task is soft deleted
  • THEN admin statistics endpoints SHALL continue to include this task's metrics
  • AND translation token counts SHALL remain in cumulative totals
  • AND processing time statistics SHALL remain accurate

Requirement: File Cleanup Scheduler

The system SHALL automatically clean up old files while preserving database records for statistics tracking.

Scenario: Scheduled file cleanup

  • WHEN cleanup scheduler runs (configurable interval, default daily)
  • THEN system SHALL identify tasks where files can be deleted
  • AND system SHALL retain newest N files per user (configurable, default 50)
  • AND system SHALL delete actual files from disk for older tasks
  • AND system SHALL set file_deleted=True on cleaned tasks
  • AND system SHALL NOT delete any database records

Scenario: File retention per user

  • WHEN user has more than max_files_per_user tasks with files
  • THEN cleanup SHALL delete files for oldest tasks exceeding the limit
  • AND cleanup SHALL preserve the newest max_files_per_user task files
  • AND task ordering SHALL be by created_at descending
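
The retention rule above might be implemented roughly as follows. This is a sketch under stated assumptions: `TaskFile` and `select_for_cleanup` are hypothetical names, and only `max_files_per_user`, `created_at`, and `file_deleted` come from the spec.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TaskFile:
    # Hypothetical record shape; field names other than created_at and
    # file_deleted are illustrative.
    task_id: str
    user_id: int
    created_at: datetime
    file_deleted: bool = False


def select_for_cleanup(tasks, max_files_per_user=50):
    """Return the tasks whose files exceed each user's retention limit."""
    by_user = defaultdict(list)
    for task in tasks:
        if not task.file_deleted:  # only tasks that still have files count
            by_user[task.user_id].append(task)

    to_clean = []
    for user_tasks in by_user.values():
        # Newest first, so everything past the limit is the oldest surplus.
        user_tasks.sort(key=lambda t: t.created_at, reverse=True)
        to_clean.extend(user_tasks[max_files_per_user:])
    return to_clean
```

The function only selects candidates. Deleting the files and setting `file_deleted=True` would happen in a separate step, which keeps the selection logic easy to test without touching the disk.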

Scenario: Manual cleanup trigger

  • WHEN admin calls POST /api/v2/admin/cleanup/trigger
  • THEN system SHALL immediately run the cleanup process
  • AND system SHALL return a summary of files deleted and space freed
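
One way to produce the summary of files deleted and space freed is to accumulate counts while removing files. The sketch below uses only the Python standard library; `CleanupSummary` and `delete_files` are hypothetical names, and skipping files that are already missing is an assumption rather than something the spec requires.

```python
import os
from dataclasses import dataclass


@dataclass
class CleanupSummary:
    # Hypothetical response shape for the cleanup trigger.
    files_deleted: int = 0
    bytes_freed: int = 0


def delete_files(paths):
    """Remove each path from disk and report how much was freed."""
    summary = CleanupSummary()
    for path in paths:
        try:
            size = os.path.getsize(path)
            os.remove(path)
        except FileNotFoundError:
            # Assumption: a file that is already gone is skipped, not an error.
            continue
        summary.files_deleted += 1
        summary.bytes_freed += size
    return summary
```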

Requirement: Admin Task Visibility

Admin users SHALL have full visibility into all tasks including soft-deleted and file-cleaned tasks.

Scenario: Admin lists all tasks

  • WHEN admin calls GET /api/v2/admin/tasks
  • THEN response SHALL include all tasks from all users
  • AND response SHALL include soft-deleted tasks
  • AND response SHALL include tasks with deleted files
  • AND each task SHALL indicate its deletion status

Scenario: Filter admin task list

  • WHEN admin calls GET /api/v2/admin/tasks with filters
  • THEN include_deleted=false SHALL exclude soft-deleted tasks
  • AND include_files_deleted=false SHALL exclude file-cleaned tasks
  • AND user_id={id} SHALL filter to specific user's tasks
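
The three filters above compose naturally as independent predicates. The following is a minimal sketch, not the actual endpoint code: `AdminTask` and `filter_admin_tasks` are hypothetical, and defaulting both include flags to `True` is an assumption based on the "Admin lists all tasks" scenario.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class AdminTask:
    # Hypothetical record shape for the admin view.
    task_id: str
    user_id: int
    deleted_at: Optional[datetime] = None
    file_deleted: bool = False


def filter_admin_tasks(tasks, include_deleted=True,
                       include_files_deleted=True, user_id=None):
    """Apply the admin list filters; with defaults, every task is returned."""
    result = []
    for task in tasks:
        if not include_deleted and task.deleted_at is not None:
            continue  # soft-deleted task excluded on request
        if not include_files_deleted and task.file_deleted:
            continue  # file-cleaned task excluded on request
        if user_id is not None and task.user_id != user_id:
            continue  # restrict to one user's tasks
        result.append(task)
    return result
```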

Scenario: View storage usage statistics

  • WHEN admin calls GET /api/v2/admin/storage/stats
  • THEN response SHALL include total storage used
  • AND response SHALL include per-user storage breakdown
  • AND response SHALL include count of tasks with/without files
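
The statistics listed above could be aggregated in a single pass over the task records. This is an illustrative sketch only: the dictionary keys, the `size_bytes` field, and the `storage_stats` function are assumptions, since the spec does not define the response schema or how file sizes are tracked.

```python
from collections import defaultdict


def storage_stats(tasks):
    """Aggregate storage usage from task dicts (hypothetical field names)."""
    per_user = defaultdict(int)
    with_files = 0
    without_files = 0
    for task in tasks:
        if task["file_deleted"]:
            without_files += 1  # record kept, files already cleaned up
        else:
            with_files += 1
            per_user[task["user_id"]] += task["size_bytes"]
    return {
        "total_bytes": sum(per_user.values()),
        "per_user_bytes": dict(per_user),
        "tasks_with_files": with_files,
        "tasks_without_files": without_files,
    }
```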

Requirement: User Task Isolation

Regular users SHALL see only their own tasks, and soft-deleted tasks SHALL be hidden from their view.

Scenario: User lists own tasks

  • WHEN authenticated user calls GET /api/v2/tasks
  • THEN response SHALL only include tasks owned by that user
  • AND response SHALL NOT include soft-deleted tasks
  • AND response SHALL include tasks with deleted files (showing file unavailable status)

Scenario: User cannot access another user's tasks

  • WHEN user attempts to access task owned by another user
  • THEN system SHALL return 404 Not Found
  • AND system SHALL NOT reveal that the task exists
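
To avoid revealing that a task exists, the lookup can return the same error for a missing task and for a task owned by someone else. The sketch below assumes a hypothetical `TaskNotFound` exception that the web layer maps to 404 and a plain dictionary as the task store. Treating soft-deleted tasks the same way is an inference from the soft-delete requirement, not something this scenario states.

```python
class TaskNotFound(Exception):
    """Hypothetical error that an HTTP layer would translate to 404 Not Found."""
    status_code = 404


def get_task_for_user(tasks_by_id, task_id, user_id):
    """Return the task only if it exists, is owned by user_id, and is not deleted."""
    task = tasks_by_id.get(task_id)
    if (
        task is None
        or task["user_id"] != user_id
        or task.get("deleted_at") is not None
    ):
        # Same error and message in every case, so callers cannot tell
        # "does not exist" apart from "belongs to someone else".
        raise TaskNotFound("Task not found")
    return task
```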

MODIFIED Requirements

Requirement: Task Detail View

The frontend SHALL provide a dedicated page for viewing individual task details with processing track information, enhanced preview capabilities, and file availability status.

Scenario: Navigate to task detail page

  • WHEN user clicks "View Details" button on task in Task History page
  • THEN browser SHALL navigate to /tasks/{task_id}
  • AND TaskDetailPage component SHALL render

Scenario: Display task information

  • WHEN TaskDetailPage loads for a valid task ID
  • THEN page SHALL display task metadata (filename, status, processing time, confidence)
  • AND page SHALL show markdown preview of OCR results
  • AND page SHALL provide download buttons for JSON, Markdown, and PDF formats

Scenario: Download from task detail page

  • WHEN user clicks download button for a specific format
  • THEN browser SHALL download the file using /api/v2/tasks/{task_id}/download/{format} endpoint
  • AND downloaded file SHALL contain the task's OCR results in requested format

Scenario: Display processing track information

  • WHEN viewing task processed through dual-track system
  • THEN page SHALL display processing track used (OCR or Direct)
  • AND page SHALL show track-specific metrics (OCR confidence or extraction quality)
  • AND page SHALL provide an option to reprocess with the alternate track if applicable

Scenario: Preview document structure

  • WHEN user enables structure view
  • THEN page SHALL display document element hierarchy
  • AND page SHALL show a bounding-box overlay on the preview
  • AND page SHALL highlight different element types (headers, tables, lists) with distinct colors

Scenario: Display file unavailable status

  • WHEN task has file_deleted=True
  • THEN page SHALL show file unavailable indicator
  • AND download buttons SHALL be disabled or hidden
  • AND page SHALL display explanation that files were cleaned up