OCR/openspec/changes/archive/2025-12-02-add-document-translation/proposal.md
egg 8d9b69ba93 feat: add document translation via DIFY AI API
Implement document translation feature using DIFY AI API with batch processing:

Backend:
- Add DIFY client with batch translation support (5000 chars, 20 items per batch)
- Add translation service with element extraction and result building
- Add translation router with start/status/result/list/delete endpoints
- Add translation schemas (TranslationRequest, TranslationStatus, etc.)

Frontend:
- Enable translation UI in TaskDetailPage
- Add translation API methods to apiV2.ts
- Add translation types

Features:
- Batch translation with numbered markers [1], [2], [3]...
- Support for text, title, header, footer, paragraph, footnote, table cells
- Translation result JSON with statistics (tokens, latency, batch_count)
- Background task processing with progress tracking

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-02 11:57:02 +08:00
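The commit message above describes batch translation (5000 characters and 20 items per batch) using numbered markers [1], [2], [3]. The following is a minimal sketch of how that batching and marker round-trip might work. It assumes greedy packing and one "[n] text" line per item. The function names are hypothetical and not taken from the actual DIFY client.

```python
import re
from typing import List, Optional

# Limits quoted in the commit message; treated here as tunable defaults.
MAX_BATCH_CHARS = 5000
MAX_BATCH_ITEMS = 20

# Matches a line such as "[3] some text" (hypothetical wire format).
_MARKER = re.compile(r"^\[(\d+)\]\s?(.*)$")


def make_batches(texts: List[str],
                 max_chars: int = MAX_BATCH_CHARS,
                 max_items: int = MAX_BATCH_ITEMS) -> List[List[str]]:
    """Greedily group texts so each batch respects both limits.

    A single text longer than max_chars still gets a batch of its own.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_chars = 0
    for text in texts:
        too_many = len(current) >= max_items
        too_long = bool(current) and current_chars + len(text) > max_chars
        if too_many or too_long:
            batches.append(current)
            current, current_chars = [], 0
        current.append(text)
        current_chars += len(text)
    if current:
        batches.append(current)
    return batches


def to_numbered_prompt(batch: List[str]) -> str:
    """Prefix each item with [1], [2], ... and join them with newlines."""
    return "\n".join(f"[{i}] {text}" for i, text in enumerate(batch, start=1))


def parse_numbered_response(answer: str, expected: int) -> List[str]:
    """Map "[n] translation" lines back to positions.

    Lines without a marker are treated as continuations of the previous
    item; items the model skipped come back as empty strings.
    """
    results = [""] * expected
    current: Optional[int] = None
    for line in answer.splitlines():
        stripped = line.strip()
        match = _MARKER.match(stripped)
        if match:
            index = int(match.group(1)) - 1
            current = index if 0 <= index < expected else None
            if current is not None:
                results[current] = match.group(2)
        elif current is not None and stripped:
            results[current] += "\n" + stripped
    return results
```

Empty strings for missing markers let the caller detect and retry untranslated items instead of silently shifting later translations into the wrong slots.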


Change: Add Document Translation Feature

Why

Users need to translate OCR-processed documents into other languages while preserving the original layout. Currently, the system only extracts text and cannot translate it. This feature enables multilingual document processing through the DIFY AI service, which provides cloud-hosted, high-quality translation behind a simple API integration.

What Changes

  • NEW: Translation service using DIFY AI API (Chat mode, Blocking)
  • NEW: Translation REST API endpoints (/api/v2/translate/*)
  • NEW: Translation result JSON format (independent file per target language)
  • UPDATE: Frontend translation UI enabled, with progress display
  • REMOVED: Local MADLAD-400-3B model (replaced with DIFY API)
  • REMOVED: GPU memory management for translation (no longer needed)
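Each target language gets its own result JSON file. According to the commit message, this file includes statistics such as tokens, latency and batch_count. The following is a minimal sketch of assembling one such file from per-batch usage figures. Apart from batch_count, every field name is a hypothetical placeholder and not the schema actually defined in backend/app/schemas/translation.py.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List


def build_translation_result(source_file: str, target_lang: str,
                             translations: Dict[str, str],
                             batch_stats: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Assemble one per-language result document (hypothetical field names).

    batch_stats holds one dict per API call, e.g. {"total_tokens": 120, "latency": 1.4}.
    """
    return {
        "source_file": source_file,
        "target_lang": target_lang,
        "translated_at": datetime.now(timezone.utc).isoformat(),
        "translations": translations,  # element location -> translated text
        "statistics": {
            "total_elements": len(translations),
            "batch_count": len(batch_stats),
            "total_tokens": sum(s.get("total_tokens", 0) for s in batch_stats),
            "total_latency": round(sum(s.get("latency", 0.0) for s in batch_stats), 3),
        },
    }
```

To write the file, `json.dumps(result, ensure_ascii=False, indent=2)` keeps non-Latin target languages human-readable instead of escaping them as `\uXXXX`.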

Impact

  • Affected specs:

    • NEW specs/translation/spec.md - Core translation capability
    • MODIFY specs/result-export/spec.md - Add translation JSON export format
  • Affected code:

    • backend/app/services/translation_service.py (REWRITE - use DIFY API)
    • backend/app/routers/translate.py (MODIFY)
    • backend/app/schemas/translation.py (MODIFY)
    • frontend/src/pages/TaskDetailPage.tsx (MODIFY)
    • frontend/src/services/api.ts (MODIFY)

Technical Summary

Translation Service

  • Provider: DIFY AI (theaken.com)
  • Mode: Chat (Blocking response)
  • Base URL: https://dify.theaken.com/v1
  • Endpoint: POST /chat-messages
  • API Key: redacted (an app- prefixed DIFY application key)
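The following is a sketch of a single blocking call to the endpoint above. It uses only the standard library and is based on DIFY's public chat-messages API: a JSON body with inputs, query, response_mode and user, a Bearer token, and a response carrying answer and metadata.usage. These details should be checked against the deployed instance's API docs. The function names and the user identifier are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "https://dify.theaken.com/v1"


def build_chat_request(api_key: str, query: str, user: str,
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a blocking POST /chat-messages request."""
    payload = {
        "inputs": {},
        "query": query,               # e.g. a numbered batch plus translation prompt
        "response_mode": "blocking",  # wait for the full answer in one response
        "user": user,                 # caller-chosen identifier, e.g. a task id
    }
    return urllib.request.Request(
        url=f"{base_url}/chat-messages",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_chat_response(body: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the answer text and usage figures from a blocking response."""
    usage = body.get("metadata", {}).get("usage", {})
    return {
        "answer": body.get("answer", ""),
        "total_tokens": usage.get("total_tokens", 0),
        "latency": usage.get("latency"),
    }


def send(request: urllib.request.Request, timeout: float = 60.0) -> Dict[str, Any]:
    """Perform the HTTP call and decode the JSON body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

In practice, the key would be loaded from configuration rather than hard-coded, for example from a hypothetical DIFY_API_KEY environment variable. send() would then be called once per batch.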

Benefits over Local Model

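| Aspect      | DIFY API             | Local MADLAD-400 |
|-------------|----------------------|------------------|
| Quality     | High (cloud AI)      | Variable         |
| Setup       | No model download    | 12 GB download   |
| GPU usage   | None                 | 2-3 GB VRAM      |
| Latency     | ~1-2 s per request   | Fast after load  |
| Maintenance | API provider managed | Self-managed     |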

Data Flow

  1. Read xxx_result.json (UnifiedDocument format)
  2. Extract the translatable elements (text, title, header, footer, paragraph, footnote, table cells)
  3. Send them to the DIFY API in batches, together with a translation prompt
  4. Parse each response and save the translations to xxx_translated_{lang}.json
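Steps 2 and 4 can be sketched as follows, assuming a simplified, hypothetical UnifiedDocument layout. In this sketch, pages contain elements with id, type and content fields, and tables hold a list of cells. The real format may differ. The location keys such as "t1:0,1" are illustrative only. The proposal states that all three tracks share UnifiedDocument, so an extractor like this would not need track-specific branches.

```python
from pathlib import Path
from typing import Any, Dict, List, Tuple

# Element types listed as translatable in this proposal (table cells handled separately).
TRANSLATABLE_TYPES = {"text", "title", "header", "footer", "paragraph", "footnote"}


def extract_translatable(doc: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (location, text) pairs from a hypothetical UnifiedDocument dict.

    Assumed layout: {"pages": [{"elements": [{"id", "type", "content"}]}]},
    where a table's content is {"cells": [{"row", "col", "content"}]}.
    """
    items: List[Tuple[str, str]] = []
    for page in doc.get("pages", []):
        for element in page.get("elements", []):
            kind = element.get("type")
            content = element.get("content")
            if kind in TRANSLATABLE_TYPES and isinstance(content, str) and content.strip():
                items.append((element["id"], content))
            elif kind == "table" and isinstance(content, dict):
                for cell in content.get("cells", []):
                    text = cell.get("content", "")
                    if text.strip():  # skip empty cells
                        items.append((f"{element['id']}:{cell['row']},{cell['col']}", text))
    return items


def translated_path(result_path: Path, lang: str) -> Path:
    """Map xxx_result.json to xxx_translated_{lang}.json in the same folder."""
    stem = result_path.stem
    if stem.endswith("_result"):
        stem = stem[: -len("_result")]
    return result_path.with_name(f"{stem}_translated_{lang}.json")
```

Keeping a location key next to each text allows the translated strings to be written back to the matching elements. This preserves the layout the proposal aims to keep.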

Unified Processing

All three tracks (Direct/OCR/Hybrid) use the same UnifiedDocument format, enabling unified translation logic without track-specific handling.