OCR/openspec/changes/archive/2025-12-02-add-document-translation/tasks.md
egg 8d9b69ba93 feat: add document translation via DIFY AI API
Implement document translation feature using DIFY AI API with batch processing:

Backend:
- Add DIFY client with batch translation support (up to 5000 characters and 20 items per batch)
- Add translation service with element extraction and result building
- Add translation router with start/status/result/list/delete endpoints
- Add translation schemas (TranslationRequest, TranslationStatus, etc.)

Frontend:
- Enable translation UI in TaskDetailPage
- Add translation API methods to apiV2.ts
- Add translation types

Features:
- Batch translation with numbered markers [1], [2], [3]...
- Support for text, title, header, footer, paragraph, footnote, table cells
- Translation result JSON with statistics (tokens, latency, batch_count)
- Background task processing with progress tracking

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-02 11:57:02 +08:00


Implementation Tasks

1. Backend - DIFY Client

  • 1.1 Create DIFY client (backend/app/services/dify_client.py)

    • HTTP client with httpx
    • Base URL: https://dify.theaken.com/v1
    • API Key configuration
    • translate(text, target_lang) and translate_batch(texts, target_lang) methods
    • Error handling and retry logic (3 retries, exponential backoff)
  • 1.2 Add translation prompt template

    • Format: "Translate the following text to {language}. Return ONLY the translated text, no explanations.\n\n{text}"
    • Batch format with numbered markers [1], [2], [3]...
    • Language name mapping (en → English, zh-TW → Traditional Chinese, etc.)
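
A minimal sketch of the prompt template and numbered batch format from task 1.2. It assumes the model's batch reply repeats each [n] marker at the start of a line. Only the en and zh-TW mappings and the single-item template come from the task list. The other language entries, the batch instruction wording, and the names build_prompt, build_batch_prompt and parse_batch_response are hypothetical.

```python
from __future__ import annotations

import re

# en and zh-TW come from the task list; the remaining entries are illustrative.
LANGUAGE_NAMES = {
    "en": "English",
    "zh-TW": "Traditional Chinese",
    "zh-CN": "Simplified Chinese",
    "ja": "Japanese",
}

# Single-item template, as given in task 1.2.
SINGLE_TEMPLATE = (
    "Translate the following text to {language}. "
    "Return ONLY the translated text, no explanations.\n\n{text}"
)

# Matches a "[n]" marker at the start of a line, plus one optional space or tab.
_MARKER = re.compile(r"^\[(\d+)\][ \t]?", re.MULTILINE)


def language_name(code: str) -> str:
    """Map a language code to its prompt name, falling back to the code itself."""
    return LANGUAGE_NAMES.get(code, code)


def build_prompt(text: str, target_lang: str) -> str:
    return SINGLE_TEMPLATE.format(language=language_name(target_lang), text=text)


def build_batch_prompt(texts: list[str], target_lang: str) -> str:
    """Number each item [1], [2], ... so the reply can be split back apart."""
    numbered = "\n".join(f"[{i}] {t}" for i, t in enumerate(texts, start=1))
    header = (
        f"Translate each numbered item below to {language_name(target_lang)}. "
        "Keep the [n] markers and return ONLY the translated items, no explanations."
    )
    return f"{header}\n\n{numbered}"


def parse_batch_response(answer: str, count: int) -> list[str | None]:
    """Split a numbered reply into `count` slots; items the model dropped stay None."""
    results: list[str | None] = [None] * count
    matches = list(_MARKER.finditer(answer))
    for i, m in enumerate(matches):
        # Each item runs from the end of its marker to the start of the next marker.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(answer)
        idx = int(m.group(1)) - 1
        if 0 <= idx < count:
            results[idx] = answer[m.end():end].strip()
    return results
```

Parsing by marker rather than by line count tolerates multi-line items. Because missing slots stay None, the caller could re-send only those elements.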

2. Backend - Translation Service

  • 2.1 Rewrite translation service (backend/app/services/translation_service.py)

    • Use DIFY client instead of local model
    • Element extraction from UnifiedDocument (all track types)
    • Batch translation (MAX_BATCH_CHARS=5000, MAX_BATCH_ITEMS=20)
    • Result parsing and element_id mapping
  • 2.2 Create translation result JSON writer

    • Schema version, metadata, translations dict
    • Table cell handling with row/col positions
    • Save to {task_id}_translated_{lang}.json
    • Include usage statistics (tokens, latency, batch_count)
  • 2.3 Add translatable element type handling

    • Text types: text, title, header, footer, paragraph, footnote
    • Table: Extract and translate cells[].content
    • Skip: page_number, image, chart, logo, reference
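
The extraction and batching rules in tasks 2.1 and 2.3 could look roughly like the following. The dictionary layout (pages, elements, element_id, type, content, cells with row/col) is an assumed stand-in for UnifiedDocument, and TranslatableItem, extract_items and make_batches are hypothetical names. The type sets and the 5000-character / 20-item limits are the ones listed above.

```python
from __future__ import annotations

from dataclasses import dataclass

MAX_BATCH_CHARS = 5000
MAX_BATCH_ITEMS = 20

TEXT_TYPES = {"text", "title", "header", "footer", "paragraph", "footnote"}
SKIP_TYPES = {"page_number", "image", "chart", "logo", "reference"}


@dataclass
class TranslatableItem:
    element_id: str
    content: str
    cell_position: tuple[int, int] | None = None  # (row, col) for table cells


def extract_items(document: dict) -> list[TranslatableItem]:
    """Collect non-empty translatable text in reading order (assumed document layout)."""
    items: list[TranslatableItem] = []
    for page in document.get("pages", []):
        for el in page.get("elements", []):
            el_type = el.get("type")
            if el_type in SKIP_TYPES:  # explicit skip list from task 2.3
                continue
            if el_type in TEXT_TYPES:
                if (el.get("content") or "").strip():
                    items.append(TranslatableItem(el["element_id"], el["content"]))
            elif el_type == "table":
                for cell in el.get("cells", []):
                    if (cell.get("content") or "").strip():
                        items.append(TranslatableItem(
                            el["element_id"], cell["content"], (cell["row"], cell["col"])
                        ))
    return items


def make_batches(items, max_chars=MAX_BATCH_CHARS, max_items=MAX_BATCH_ITEMS):
    """Greedily fill batches, starting a new one when either limit would be exceeded."""
    batches: list[list[TranslatableItem]] = []
    current: list[TranslatableItem] = []
    current_chars = 0
    for item in items:
        size = len(item.content)
        if current and (len(current) >= max_items or current_chars + size > max_chars):
            batches.append(current)
            current, current_chars = [], 0
        current.append(item)  # an oversized item still gets a batch of its own
        current_chars += size
    if current:
        batches.append(current)
    return batches
```

A greedy split keeps items in their original order, which keeps mapping results back to element_id simple. An item longer than MAX_BATCH_CHARS is sent alone rather than truncated.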

3. Backend - API Endpoints

  • 3.1 Create/Update translation router (backend/app/routers/translate.py)

    • POST /api/v2/translate/{task_id} - Start translation
    • GET /api/v2/translate/{task_id}/status - Get progress
    • GET /api/v2/translate/{task_id}/result - Get translation result
    • GET /api/v2/translate/{task_id}/translations - List available translations
    • DELETE /api/v2/translate/{task_id}/translations/{lang} - Delete translation
  • 3.2 Implement background task processing

    • Use FastAPI BackgroundTasks for async translation
    • Status tracking (pending, translating, completed, failed)
    • Progress reporting (current element / total elements)
  • 3.3 Add translation schemas (backend/app/schemas/translation.py)

    • TranslationRequest (task_id, target_lang)
    • TranslationStatusResponse (status, progress, error)
    • TranslationListResponse (translations, statistics)
  • 3.4 Register router in main app
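
One way to structure the background job from task 3.2 is shown below. It assumes the batch translator is injected as a plain callable and that batches arrive as (element_id, text) pairs. TranslationJob, JOBS, run_translation and status_payload are hypothetical names. The output filename and top-level keys follow task 2.2; the schema_version value and the statistics fields shown are assumptions. Token counts would come from the DIFY response and are omitted, and table cells are keyed by element_id only, for brevity.

```python
from __future__ import annotations

import json
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class TranslationJob:
    task_id: str
    target_lang: str
    status: str = "pending"  # pending -> translating -> completed | failed
    current: int = 0         # elements translated so far
    total: int = 0           # elements to translate
    error: str | None = None


# In-process registry keyed by (task_id, lang); it is not shared across worker processes.
JOBS: dict[tuple[str, str], TranslationJob] = {}


def status_payload(job: TranslationJob) -> dict:
    """Shape returned by the status endpoint (assumed field layout)."""
    return {
        "status": job.status,
        "progress": {"current": job.current, "total": job.total},
        "error": job.error,
    }


def run_translation(
    job: TranslationJob,
    batches: list[list[tuple[str, str]]],
    translate_batch: Callable[[list[str], str], list[str]],
    output_dir: Path,
) -> None:
    """Translate batch by batch, updating progress, then write the result JSON."""
    job.status = "translating"
    job.total = sum(len(batch) for batch in batches)
    translations: dict[str, str] = {}
    started = time.monotonic()
    try:
        for batch in batches:
            outputs = translate_batch([text for _, text in batch], job.target_lang)
            for (element_id, _), translated in zip(batch, outputs):
                translations[element_id] = translated
            job.current += len(batch)
        result = {
            "schema_version": "1.0",
            "metadata": {"task_id": job.task_id, "target_lang": job.target_lang},
            "translations": translations,
            "statistics": {
                "batch_count": len(batches),
                "latency_seconds": round(time.monotonic() - started, 3),
            },
        }
        path = output_dir / f"{job.task_id}_translated_{job.target_lang}.json"
        path.write_text(json.dumps(result, ensure_ascii=False, indent=2), encoding="utf-8")
        job.status = "completed"
    except Exception as exc:  # surface the failure through the status endpoint
        job.status = "failed"
        job.error = str(exc)
```

In the router, FastAPI's background_tasks.add_task(run_translation, job, batches, translate_batch, output_dir) would schedule this to run after the POST response is sent. The status endpoint can then return status_payload(job).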

4. Frontend - UI Updates

  • 4.1 Enable translation UI in TaskDetailPage

    • Translation state management
    • Language selector connected to state
  • 4.2 Add translation progress display

    • Progress tracking
    • Status polling (translating element X/Y)
    • Error handling and display
  • 4.3 Update API service

    • Implement startTranslation method
    • Add polling for translation status
    • Handle translation result
  • 4.4 Add translation complete state

    • Show success message
    • Display available translated versions
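
The polling flow in tasks 4.2 and 4.3 lives in TypeScript (apiV2.ts). The protocol can still be sketched in Python, the backend's language. Here fetch_status stands in for a GET to /api/v2/translate/{task_id}/status. The response shape (status, progress.current, progress.total, error), the 2-second interval and the 10-minute timeout are all assumptions, and poll_translation is a hypothetical name.

```python
from __future__ import annotations

import time
from typing import Callable


def poll_translation(
    fetch_status: Callable[[], dict],
    on_progress: Callable[[str], None],
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job completes or fails, reporting "element X/Y" progress."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        state = status.get("status")
        if state == "completed":
            return status
        if state == "failed":
            raise RuntimeError(status.get("error") or "translation failed")
        progress = status.get("progress") or {}
        if state == "translating" and progress.get("total"):
            on_progress(f"Translating element {progress['current']}/{progress['total']}")
        if time.monotonic() >= deadline:
            raise TimeoutError("gave up waiting for translation")
        sleep(interval)
```

Injecting fetch_status and sleep keeps the loop testable without a server. In the UI, the same logic maps onto a timer that stops on completed or failed.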

5. Testing

Use existing JSON files in backend/storage/results/ for testing.

Available test samples:

  • Direct track: 1c94bfbf-*/edit_result.json, 8eedd9ed-*/ppt_result.json
  • OCR track: c85fff69-*/scan_result.json, ca2b59a3-*/img3_result.json
  • Hybrid track: 1484ba43-*/edit2_result.json

  • 5.1 Unit tests for DIFY client

    • Test with real API calls (no mocks)
    • Test retry logic on timeout
  • 5.2 Unit tests for translation service

    • Element extraction from existing result.json files (10 tests pass)
    • Result parsing and element_id mapping
    • Table cell extraction and translation
  • 5.3 Integration tests for API endpoints

    • Start translation with existing task_id
    • Status polling during translation
    • Result retrieval after completion
  • 5.4 Manual E2E verification

    • Translate Direct track document (edit_result.json → zh-TW) ✓
    • Verified translation quality and JSON structure

6. Configuration

  • 6.1 Add DIFY configuration (hardcoded in dify_client.py)

    • DIFY_BASE_URL: https://dify.theaken.com/v1
    • DIFY_API_KEY: <redacted>
    • DIFY_TIMEOUT: 120 seconds
    • DIFY_MAX_RETRIES: 3
    • MAX_BATCH_CHARS: 5000
    • MAX_BATCH_ITEMS: 20
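
The values above are hardcoded in dify_client.py. The sketch below loads the same settings from environment variables and implements the retry-with-exponential-backoff behavior from task 1.1. Several details are assumptions: reading from the environment, the DifyConfig and with_retries names, the 1-second base delay, and treating DIFY_MAX_RETRIES: 3 as three retries after the first attempt.

```python
from __future__ import annotations

import os
import time
from dataclasses import dataclass
from typing import Callable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class DifyConfig:
    base_url: str = "https://dify.theaken.com/v1"
    api_key: str | None = None  # supplied at runtime; no key in source
    timeout: float = 120.0      # seconds
    max_retries: int = 3
    max_batch_chars: int = 5000
    max_batch_items: int = 20

    @classmethod
    def from_env(cls) -> DifyConfig:
        """Override the defaults with DIFY_* / MAX_BATCH_* environment variables when set."""
        env = os.environ
        return cls(
            base_url=env.get("DIFY_BASE_URL", cls.base_url),
            api_key=env.get("DIFY_API_KEY"),
            timeout=float(env.get("DIFY_TIMEOUT", cls.timeout)),
            max_retries=int(env.get("DIFY_MAX_RETRIES", cls.max_retries)),
            max_batch_chars=int(env.get("MAX_BATCH_CHARS", cls.max_batch_chars)),
            max_batch_items=int(env.get("MAX_BATCH_ITEMS", cls.max_batch_items)),
        )


def with_retries(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (TimeoutError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn; after a retryable error wait base_delay * 2**attempt, up to max_retries retries."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries:
                raise  # retries exhausted: propagate the last error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise RuntimeError("unreachable")
```

With httpx, retry_on would typically include httpx.TimeoutException. Keeping the API key out of source also allows it to be rotated without a code change.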

7. Documentation

  • 7.1 Update API documentation

    • Add translation endpoints to OpenAPI spec
  • 7.2 Add DIFY setup instructions

    • API key configuration
    • Rate limiting considerations