OCR/openspec/changes/archive/2025-12-02-add-document-translation/tasks.md
egg 8d9b69ba93 feat: add document translation via DIFY AI API
Implement document translation feature using DIFY AI API with batch processing:

Backend:
- Add DIFY client with batch translation support (5000 chars, 20 items per batch)
- Add translation service with element extraction and result building
- Add translation router with start/status/result/list/delete endpoints
- Add translation schemas (TranslationRequest, TranslationStatus, etc.)

Frontend:
- Enable translation UI in TaskDetailPage
- Add translation API methods to apiV2.ts
- Add translation types

Features:
- Batch translation with numbered markers [1], [2], [3]...
- Support for text, title, header, footer, paragraph, footnote, table cells
- Translation result JSON with statistics (tokens, latency, batch_count)
- Background task processing with progress tracking

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-02 11:57:02 +08:00


# Implementation Tasks
## 1. Backend - DIFY Client
- [x] 1.1 Create DIFY client (`backend/app/services/dify_client.py`)
  - HTTP client with httpx
  - Base URL: `https://dify.theaken.com/v1`
  - API Key configuration
  - `translate(text, target_lang)` and `translate_batch(texts, target_lang)` methods
  - Error handling and retry logic (3 retries, exponential backoff)
- [x] 1.2 Add translation prompt template
  - Format: "Translate the following text to {language}. Return ONLY the translated text, no explanations.\n\n{text}"
  - Batch format with numbered markers [1], [2], [3]...
  - Language name mapping (en → English, zh-TW → Traditional Chinese, etc.)
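
The numbered-marker batch format from 1.2 can be sketched roughly as follows. This is a minimal illustration, not the code in `dify_client.py`: the function names `build_batch_prompt` and `parse_batch_response`, the extra "Keep the [N] markers" instruction, and the two-entry `LANGUAGE_NAMES` table are assumptions made for the example.

```python
import re

# Assumed subset of the language-name mapping; only these two codes
# are named in the task list.
LANGUAGE_NAMES = {"en": "English", "zh-TW": "Traditional Chinese"}


def build_batch_prompt(texts, target_lang):
    """Join texts into one prompt, prefixing each with a [N] marker."""
    language = LANGUAGE_NAMES.get(target_lang, target_lang)
    numbered = "\n".join(f"[{i}] {t}" for i, t in enumerate(texts, start=1))
    return (
        f"Translate the following text to {language}. "
        "Keep the [N] markers and return ONLY the translated text, "
        "no explanations.\n\n"
        f"{numbered}"
    )


# A marker is "[N]" at the start of a line.
_MARKER = re.compile(r"^\[(\d+)\][ \t]*", re.MULTILINE)


def parse_batch_response(response, expected):
    """Split a marker-numbered response into a list of `expected` items.

    Items whose marker is missing from the response stay None, so the
    caller can detect partial results.
    """
    results = [None] * expected
    matches = list(_MARKER.finditer(response))
    for pos, match in enumerate(matches):
        index = int(match.group(1))
        end = matches[pos + 1].start() if pos + 1 < len(matches) else len(response)
        if 1 <= index <= expected:
            results[index - 1] = response[match.end():end].strip()
    return results
```

Mapping by marker number rather than by line position tolerates translations that span several lines. One caveat of this sketch: a source text that itself contains a line starting with `[3]` would be misread as a marker.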
## 2. Backend - Translation Service
- [x] 2.1 Rewrite translation service (`backend/app/services/translation_service.py`)
  - Use DIFY client instead of local model
  - Element extraction from UnifiedDocument (all track types)
  - Batch translation (MAX_BATCH_CHARS=5000, MAX_BATCH_ITEMS=20)
  - Result parsing and element_id mapping
- [x] 2.2 Create translation result JSON writer
  - Schema version, metadata, translations dict
  - Table cell handling with row/col positions
  - Save to `{task_id}_translated_{lang}.json`
  - Include usage statistics (tokens, latency, batch_count)
- [x] 2.3 Add translatable element type handling
  - Text types: `text`, `title`, `header`, `footer`, `paragraph`, `footnote`
  - Table: Extract and translate `cells[].content`
  - Skip: `page_number`, `image`, `chart`, `logo`, `reference`
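
Element extraction and batching under the two limits could look roughly like the sketch below. The element shape is assumed for illustration (dict keys `element_id`, `type`, `content`, and `cells[]` entries with `row`, `col`, `content`), and the real UnifiedDocument model may differ. The helper names `extract_items` and `make_batches` are hypothetical.

```python
MAX_BATCH_CHARS = 5000
MAX_BATCH_ITEMS = 20

# Text-like element types listed in 2.3; everything else except tables
# (page_number, image, chart, logo, reference, ...) is skipped.
TRANSLATABLE_TYPES = {"text", "title", "header", "footer", "paragraph", "footnote"}


def extract_items(elements):
    """Return (key, text) pairs for every translatable piece of text.

    Plain elements are keyed by element_id; table cells are keyed by
    (element_id, row, col) so results can be mapped back to the cell.
    """
    items = []
    for element in elements:
        if element["type"] in TRANSLATABLE_TYPES and element.get("content"):
            items.append((element["element_id"], element["content"]))
        elif element["type"] == "table":
            for cell in element.get("cells", []):
                if cell.get("content"):
                    key = (element["element_id"], cell["row"], cell["col"])
                    items.append((key, cell["content"]))
    return items


def make_batches(items):
    """Greedily pack items so each batch respects both limits."""
    batches, current, chars = [], [], 0
    for key, text in items:
        too_many = len(current) >= MAX_BATCH_ITEMS
        too_long = chars + len(text) > MAX_BATCH_CHARS
        if current and (too_many or too_long):
            batches.append(current)
            current, chars = [], 0
        current.append((key, text))
        chars += len(text)
    if current:
        batches.append(current)
    return batches
```

In this sketch a single item longer than `MAX_BATCH_CHARS` is not split; it ends up alone in an oversized batch. Whether the actual service splits or truncates such items is not stated in the task list.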
## 3. Backend - API Endpoints
- [x] 3.1 Create/Update translation router (`backend/app/routers/translate.py`)
  - POST `/api/v2/translate/{task_id}` - Start translation
  - GET `/api/v2/translate/{task_id}/status` - Get progress
  - GET `/api/v2/translate/{task_id}/result` - Get translation result
  - GET `/api/v2/translate/{task_id}/translations` - List available translations
  - DELETE `/api/v2/translate/{task_id}/translations/{lang}` - Delete translation
- [x] 3.2 Implement background task processing
  - Use FastAPI BackgroundTasks for async translation
  - Status tracking (pending, translating, completed, failed)
  - Progress reporting (current element / total elements)
- [x] 3.3 Add translation schemas (`backend/app/schemas/translation.py`)
  - TranslationRequest (task_id, target_lang)
  - TranslationStatusResponse (status, progress, error)
  - TranslationListResponse (translations, statistics)
- [x] 3.4 Register router in main app
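
The status tracking in 3.2 might be backed by something like the in-memory registry below, which a background task updates and the status endpoint reads. This is a sketch under assumptions: the `StatusRegistry` class, its method names, and the `{"current", "total"}` shape of `progress` are hypothetical. Only the four state names and the `status`/`progress`/`error` fields come from the task list.

```python
from dataclasses import dataclass
from enum import Enum
from threading import Lock
from typing import Optional


class TranslationState(str, Enum):
    PENDING = "pending"
    TRANSLATING = "translating"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class TranslationProgress:
    status: TranslationState = TranslationState.PENDING
    current: int = 0
    total: int = 0
    error: Optional[str] = None


class StatusRegistry:
    """In-memory progress store keyed by (task_id, target_lang)."""

    def __init__(self):
        self._lock = Lock()
        self._jobs = {}

    def start(self, task_id, target_lang, total):
        with self._lock:
            self._jobs[(task_id, target_lang)] = TranslationProgress(total=total)

    def advance(self, task_id, target_lang, done=1):
        # Called by the background task after each translated element or batch.
        with self._lock:
            job = self._jobs[(task_id, target_lang)]
            job.status = TranslationState.TRANSLATING
            job.current = min(job.total, job.current + done)

    def finish(self, task_id, target_lang, error=None):
        with self._lock:
            job = self._jobs[(task_id, target_lang)]
            job.error = error
            job.status = TranslationState.FAILED if error else TranslationState.COMPLETED

    def snapshot(self, task_id, target_lang):
        # Plain dict, ready to be returned from a status endpoint.
        with self._lock:
            job = self._jobs[(task_id, target_lang)]
            return {
                "status": job.status.value,
                "progress": {"current": job.current, "total": job.total},
                "error": job.error,
            }
```

State held this way is lost on restart and is not shared between worker processes, which is worth keeping in mind if the app runs with more than one worker.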
## 4. Frontend - UI Updates
- [x] 4.1 Enable translation UI in TaskDetailPage
  - Translation state management
  - Language selector connected to state
- [x] 4.2 Add translation progress display
  - Progress tracking
  - Status polling (translating element X/Y)
  - Error handling and display
- [x] 4.3 Update API service
  - Implement startTranslation method
  - Add polling for translation status
  - Handle translation result
- [x] 4.4 Add translation complete state
  - Show success message
  - Display available translated versions
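
The polling flow behind 4.2 and 4.3 is sketched here in Python, for consistency with the backend examples; the frontend implements it in `apiV2.ts`, which is not shown in this file. The function name `poll_translation`, the injected `fetch_status` callable, and the default interval and timeout are assumptions. The payload is assumed to carry the `status`, `progress`, and `error` fields of `TranslationStatusResponse`.

```python
import time


def poll_translation(fetch_status, on_progress, interval=2.0, timeout=600.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll the status endpoint until the translation completes or fails.

    fetch_status: callable returning the parsed JSON of
                  GET /api/v2/translate/{task_id}/status.
    on_progress:  callable invoked with each non-terminal payload, e.g.
                  to render "translating element X/Y".
    """
    deadline = clock() + timeout
    while True:
        payload = fetch_status()
        status = payload["status"]
        if status == "completed":
            return payload
        if status == "failed":
            raise RuntimeError(payload.get("error") or "translation failed")
        on_progress(payload)
        if clock() >= deadline:
            raise TimeoutError("translation did not finish in time")
        sleep(interval)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real delays.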
## 5. Testing
Use existing JSON files in `backend/storage/results/` for testing.
Available test samples:
- Direct track: `1c94bfbf-*/edit_result.json`, `8eedd9ed-*/ppt_result.json`
- OCR track: `c85fff69-*/scan_result.json`, `ca2b59a3-*/img3_result.json`
- Hybrid track: `1484ba43-*/edit2_result.json`
- [x] 5.1 Unit tests for DIFY client
  - Test with real API calls (no mocks)
  - Test retry logic on timeout
- [x] 5.2 Unit tests for translation service
  - Element extraction from existing result.json files (10 tests pass)
  - Result parsing and element_id mapping
  - Table cell extraction and translation
- [x] 5.3 Integration tests for API endpoints
  - Start translation with existing task_id
  - Status polling during translation
  - Result retrieval after completion
- [x] 5.4 Manual E2E verification
  - Translate Direct track document (edit_result.json → zh-TW) ✓
  - Verified translation quality and JSON structure
## 6. Configuration
- [x] 6.1 Add DIFY configuration (hardcoded in dify_client.py)
  - `DIFY_BASE_URL`: https://dify.theaken.com/v1
  - `DIFY_API_KEY`: app-YOPrF2ro5fshzMkCZviIuUJd
  - `DIFY_TIMEOUT`: 120 seconds
  - `DIFY_MAX_RETRIES`: 3
  - `MAX_BATCH_CHARS`: 5000
  - `MAX_BATCH_ITEMS`: 20
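
As an illustration of how `DIFY_MAX_RETRIES` could drive the "3 retries, exponential backoff" behavior from 1.1, here is a minimal retry wrapper. The helper name `call_with_retries`, the 1-second base delay, and the choice of retryable exceptions are assumptions; the task list does not specify them.

```python
import time

DIFY_MAX_RETRIES = 3


def call_with_retries(request, max_retries=DIFY_MAX_RETRIES, base_delay=1.0,
                      retry_on=(TimeoutError, ConnectionError), sleep=time.sleep):
    """Call `request()`, retrying on transient errors with doubling delays.

    With max_retries=3 the call is attempted up to 4 times, waiting
    base_delay * 1, * 2, * 4 seconds between attempts.
    """
    attempt = 0
    while True:
        try:
            return request()
        except retry_on:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last error
            sleep(base_delay * 2 ** attempt)
            attempt += 1
```

With an httpx-based client, `retry_on` would more likely be `httpx.TimeoutException` or `httpx.TransportError` than the built-in exceptions used here.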
## 7. Documentation
- [ ] 7.1 Update API documentation
  - Add translation endpoints to OpenAPI spec
- [ ] 7.2 Add DIFY setup instructions
  - API key configuration
  - Rate limiting considerations