- Add soft delete (deleted_at column) to preserve task records for statistics
- Implement cleanup service to delete old files while keeping DB records
- Add automatic cleanup scheduler (configurable interval, default 24h)
- Add admin endpoints: storage stats, cleanup trigger, scheduler status
- Update task service with admin views (include deleted/files_deleted)
- Add frontend storage management UI in admin dashboard
- Add i18n translations for storage management
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Backend changes:
- Disable PP-Structure debug file generation by default
- Separate raw_ocr_regions.json generation from debug flag (critical file)
- Add visualization folder download endpoint as ZIP
- Add has_visualization field to TaskDetailResponse
- Stop generating Markdown files
- Save translated PDFs to task folder with caching
Frontend changes:
- Replace JSON/MD download buttons with PDF buttons in TaskHistoryPage
- Add visualization download button in TaskDetailPage
- Fix Processing page task switching issue (reset isNotFound)
Archives two OpenSpec proposals:
- optimize-task-files-and-visualization
- simplify-frontend-add-billing
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add TranslationLog model to track translation API usage per task
- Integrate Dify API actual price (total_price) into translation stats
- Display translation statistics in admin dashboard with per-task costs
- Remove unused Export and Settings pages to simplify frontend
- Add GET /api/v2/admin/translation-stats endpoint
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Backend cleanup:
- Remove ocr_service_original.py (legacy OCR service, replaced by ocr_service.py)
- Remove preprocessor.py (unused, functionality absorbed by layout_preprocessing_service.py)
- Remove pdf_font_manager.py (unused, never referenced by any service)
Frontend cleanup:
- Remove MarkdownPreview.tsx (unused component)
- Remove ResultsTable.tsx (unused, replaced by TaskHistoryPage)
- Remove services/api.ts (legacy API client, migrated to apiV2)
- Remove types/api.ts (legacy types, migrated to apiV2.ts)
API migration:
- Add export rules CRUD methods to apiClientV2
- Update SettingsPage.tsx to use apiClientV2
- Update Layout.tsx to use only apiClientV2 for logout
This reduces ~1,500 lines of redundant code and unifies the API client.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add generate_translated_layout_pdf() method for layout-preserving translated PDFs
- Add generate_translated_pdf() method for reflow translated PDFs
- Update translate router to accept format parameter (layout/reflow)
- Update frontend with dropdown to select translated PDF format
- Fix reflow PDF table cell extraction from content dict
- Add embedded images handling in reflow PDF tables
- Archive improve-translated-text-fitting openspec proposal
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Adds the ability to download translated documents as PDF files while
preserving the original document layout. Key changes:
- Add apply_translations() function to merge translation JSON with UnifiedDocument
- Add generate_translated_pdf() method to PDFGeneratorService
- Add POST /api/v2/translate/{task_id}/pdf endpoint
- Add downloadTranslatedPdf() method and PDF button in frontend
- Add comprehensive unit tests (52 tests: merge, PDF generation, API endpoints)
- Archive add-translated-pdf-export proposal
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Implement document translation feature using DIFY AI API with batch processing:
Backend:
- Add DIFY client with batch translation support (5000 chars, 20 items per batch)
- Add translation service with element extraction and result building
- Add translation router with start/status/result/list/delete endpoints
- Add translation schemas (TranslationRequest, TranslationStatus, etc.)
Frontend:
- Enable translation UI in TaskDetailPage
- Add translation API methods to apiV2.ts
- Add translation types
Features:
- Batch translation with numbered markers [1], [2], [3]...
- Support for text, title, header, footer, paragraph, footnote, table cells
- Translation result JSON with statistics (tokens, latency, batch_count)
- Background task processing with progress tracking
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Create PreprocessingPreview component with debounced config updates
- Show original vs preprocessed images side-by-side
- Display image quality metrics (contrast, sharpness) with quality indicators
- Add zoom controls and fullscreen view for detailed inspection
- Show auto-detected configuration when in auto mode
- Integrate preview toggle with PreprocessingSettings component
- Add i18n translations for preview panel UI
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Implement user-configurable PP-StructureV3 parameters to allow fine-tuning OCR behavior
from the frontend. This addresses issues with over-merging, missing small text, and
document-specific optimization needs.
Backend:
- Add PPStructureV3Params schema with 7 adjustable parameters
- Update OCR service to accept custom parameters with smart caching
- Modify /tasks/{task_id}/start endpoint to receive params in request body
- Parameter priority: custom > settings default
- Conditional caching (no cache for custom params to avoid pollution)
Frontend:
- Create PPStructureParams component with collapsible UI
- Add 3 presets: default, high-quality, fast
- Implement localStorage persistence for user parameters
- Add import/export JSON functionality
- Integrate into ProcessingPage with conditional rendering
Testing:
- Unit tests: 7/10 passing (core functionality verified)
- API integration tests for schema validation
- E2E tests with authentication support
- Performance benchmarks for memory and initialization
- Test runner script with venv activation
Environment:
- Remove duplicate backend/venv (use root venv only)
- Update test runner to use correct virtual environment
OpenSpec:
- Archive fix-pdf-coordinate-system proposal
- Archive frontend-adjustable-ppstructure-params proposal
- Create ocr-processing spec
- Update result-export spec
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Backend fixes:
- Fix markdown generation using correct 'markdown_content' key in tasks.py
- Update admin service to return flat data structure matching frontend types
- Add task_count and failed_tasks fields to user statistics
- Fix top users endpoint to return complete user data
Frontend fixes:
- Migrate ResultsPage from V1 batch API to V2 task API with polling
- Create TaskDetailPage component with markdown preview and download buttons
- Refactor ExportPage to support multi-task selection using V2 download endpoints
- Fix login infinite refresh loop with concurrency control flags
- Create missing Checkbox UI component
New features:
- Add /tasks/:taskId route for task detail view
- Implement multi-task batch export functionality
- Add real-time task status polling (2s interval)
OpenSpec:
- Archive completed proposal 2025-11-17-fix-v2-api-ui-issues
- Create result-export and task-management specifications
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Changes:
- Add uploadFile() method to apiClientV2 for single file uploads
- Update UploadPage to use apiClientV2 instead of apiClient
- Change upload logic to iterate files and collect task IDs
- Add navigation to /login after logout in Layout component
Fixes:
- 403 Forbidden error on file upload (token mismatch between V1/V2 APIs)
- Logout button not redirecting to login page after clearing auth
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Frontend Features:
- Add ProtectedRoute component with token expiry validation
- Create AdminDashboardPage with system statistics and user management
- Create AuditLogsPage with filtering and pagination
- Add admin-only navigation (Shield icon) for ymirliu@panjit.com.tw
- Add admin API methods to apiV2 service
- Add admin type definitions (SystemStats, AuditLog, etc.)
Token Management:
- Auto-redirect to login on token expiry
- Check authentication on route change
- Show loading state during auth check
- Admin privilege verification
Backend Testing:
- Add pytest configuration (pytest.ini)
- Create test fixtures (conftest.py)
- Add unit tests for auth, tasks, and admin endpoints
- Add integration tests for complete workflows
- Test user isolation and admin access control
Documentation:
- Add TESTING.md with comprehensive testing guide
- Include test running instructions
- Document fixtures and best practices
Routes:
- /admin - Admin dashboard (admin only)
- /admin/audit-logs - Audit logs viewer (admin only)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Remove all V1 architecture components and promote V2 to primary:
- Delete all paddle_ocr_* table models (export, ocr, translation, user)
- Delete legacy routers (auth, export, ocr, translation)
- Delete legacy schemas and services
- Promote user_v2.py to user.py as primary user model
- Update all imports and dependencies to use V2 models only
- Update main.py version to 2.0.0
Database changes:
- Fix SQLAlchemy reserved word: rename audit_log.metadata to extra_data
- Add migration to drop all paddle_ocr_* tables
- Update alembic env to only import V2 models
Frontend fixes:
- Fix Select component exports in TaskHistoryPage.tsx
- Update to use simplified Select API with options prop
- Fix AxiosInstance TypeScript import syntax
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>