# Design: Document Translation Feature

## Context

Tool_OCR processes documents through three tracks (Direct/OCR/Hybrid) and outputs UnifiedDocument JSON. Users need to translate the extracted text into other languages while preserving the document structure.

### Constraints

- Must use the DIFY AI service for translation
- API-based solution (no local model management)
- Translation quality depends on DIFY's underlying model

### Stakeholders

- End users: need translated documents
- System: needs a simple HTTP-based integration

## Goals / Non-Goals

### Goals

- Translate documents using the DIFY AI API
- Preserve document structure (element positions, formatting)
- Support all three processing tracks with unified logic
- Real-time progress feedback to users
- Simple, maintainable API integration

### Non-Goals

- Local model inference (replaced by the DIFY API)
- GPU memory management (not needed)
- Translation memory or glossary support
- Concurrent translation processing

## Decisions

### Decision 1: Translation Provider

**Choice**: DIFY AI Service (theaken.com)

**Configuration**:
- Base URL: `https://dify.theaken.com/v1`
- Endpoint: `POST /chat-messages`
- API Key: `app-YOPrF2ro5fshzMkCZviIuUJd`
- Mode: Chat (blocking response)

**Rationale**:
- High-quality cloud AI translation
- No local model management required
- No GPU memory concerns
- Easy to maintain and update

### Decision 2: Response Mode

**Choice**: Blocking mode

**API Request Format**:

```json
{
  "inputs": {},
  "query": "Translate the following text to Traditional Chinese.\nReturn ONLY the translated text, no explanations.\n\nHello world",
  "response_mode": "blocking",
  "conversation_id": "",
  "user": "tool-ocr-{task_id}"
}
```

**API Response Format**:

```json
{
  "event": "message",
  "answer": "你好世界",
  "conversation_id": "xxx",
  "metadata": {
    "usage": {
      "total_tokens": 54,
      "latency": 1.26
    }
  }
}
```

**Rationale**:
- Simpler to implement than streaming
- Adequate for batch text translation
- The complete response arrives in a single call

### Decision 3: Translation Batch Format

**Choice**:
Single text per request, wrapped in a translation prompt

**Request Format**:

```
Translate the following text to {target_language}.
Return ONLY the translated text, no explanations.

{text_content}
```

**Rationale**:
- Clear instruction for the model
- Predictable response format
- Result is easy to parse

### Decision 4: Translation Result Storage

**Choice**: One independent JSON file per language (unchanged from the previous design)

```
backend/storage/results/{task_id}/
├── xxx_result.json          # Original
├── xxx_translated_en.json   # English translation
├── xxx_translated_ja.json   # Japanese translation
└── ...
```

**Rationale**:
- Non-destructive (original preserved)
- Multiple languages supported
- Easy to manage and delete
- Clear file naming convention

### Decision 5: Element Type Handling

**Translatable types** (content is a string):
- `text`, `title`, `header`, `footer`, `paragraph`, `footnote`

**Special handling** (content is a dict):
- `table` -> translate `cells[].content`

**Skip** (non-text content):
- `page_number`, `image`, `chart`, `logo`, `reference`

## Architecture

### Component Diagram

```
┌─────────────────────────────────────────────────────────────┐
│  Frontend                                                   │
│  ┌─────────────┐  ┌──────────────┐  ┌────────────────────┐  │
│  │ TaskDetail  │  │ TranslateBtn │  │ ProgressDisplay    │  │
│  └─────────────┘  └──────────────┘  └────────────────────┘  │
└────────────────────────────┬────────────────────────────────┘
                             │ HTTP
┌────────────────────────────▼────────────────────────────────┐
│  Backend API                                                │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │  TranslateRouter                                        │ │
│ │  POST /api/v2/translate/{task_id}                       │ │
│ │  GET  /api/v2/translate/{task_id}/status                │ │
│ │  GET  /api/v2/translate/{task_id}/result                │ │
│ └─────────────────────────────────────────────────────────┘ │
└────────────────────────────┬────────────────────────────────┘
                             │
┌────────────────────────────▼────────────────────────────────┐
│  TranslationService                                         │
│  ┌───────────────┐  ┌───────────────┐  ┌─────────────────┐  │
│  │ DifyClient    │  │ BatchBuilder  │  │ ResultParser    │  │
│  │ - translate() │  │ - extract()   │  │ - parse()       │  │
│  │ - chat()      │  │ - format()    │  │ - map_ids()     │  │
│  └───────────────┘  └───────────────┘  └─────────────────┘  │
└────────────────────────────┬────────────────────────────────┘
                             │ HTTPS
┌────────────────────────────▼────────────────────────────────┐
│  DIFY AI Service                                            │
│  https://dify.theaken.com/v1                                │
│  (Chat - Blocking)                                          │
└─────────────────────────────────────────────────────────────┘
```

### Translation JSON Schema

```json
{
  "schema_version": "1.0.0",
  "source_document": "xxx_result.json",
  "source_lang": "auto",
  "target_lang": "en",
  "provider": "dify",
  "translated_at": "2025-12-02T12:00:00Z",
  "statistics": {
    "total_elements": 50,
    "translated_elements": 45,
    "skipped_elements": 5,
    "total_characters": 5000,
    "processing_time_seconds": 30.5,
    "total_tokens": 2500
  },
  "translations": {
    "pp3_0_0": "Company Profile",
    "pp3_0_1": "Founded in 2020...",
    "table_1_0": {
      "cells": [
        {"row": 0, "col": 0, "content": "Technology"},
        {"row": 0, "col": 1, "content": "Epoxy"}
      ]
    }
  }
}
```

### Language Code Mapping

```python
LANGUAGE_NAMES = {
    "en": "English",
    "zh-TW": "Traditional Chinese",
    "zh-CN": "Simplified Chinese",
    "ja": "Japanese",
    "ko": "Korean",
    "de": "German",
    "fr": "French",
    "es": "Spanish",
    "pt": "Portuguese",
    "it": "Italian",
    "ru": "Russian",
    "vi": "Vietnamese",
    "th": "Thai",
    # Additional languages as needed
}
```

## Risks / Trade-offs

### Risk 1: API Availability

- **Risk**: DIFY service downtime blocks translation
- **Mitigation**: Add timeout handling, retry logic, and graceful error messages

### Risk 2: API Cost

- **Risk**: High-volume translation increases cost
- **Mitigation**: Monitor token usage via the response `metadata.usage`; consider rate limiting

### Risk 3: Network Latency

- **Risk**: Each translation request adds network latency
- **Mitigation**: Batch text when possible; show progress to the user

### Risk 4: Translation Quality Variance

- **Risk**: AI translation quality varies by language pair
- **Mitigation**: Document known limitations; allow user feedback

## Migration Plan

### Phase 1: Core Translation (This Proposal)

1. DIFY client implementation
2. Backend translation service (rewrite)
3. API endpoints (modify)
4. Frontend activation

### Phase 2: Enhanced Features (Future)

1. Translated PDF generation
2. Translation caching
3. Custom terminology support

### Rollback

- Translation is an additive feature
- No schema changes to existing data
- Can be disabled by removing the router registration

## Open Questions

1. **Rate Limiting**: Should we limit requests per minute to the DIFY API?
   - Tentative: 10 requests per minute per user
2. **Retry Logic**: How should API failures be handled?
   - Tentative: retry up to 3 times with exponential backoff
3. **Batch Size**: How many elements should go into each API call?
   - Tentative: 1 element per call for simplicity; optimize later if needed
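Decisions 1 to 3 together describe one blocking HTTP call per text. The sketch below shows one way to build the `/chat-messages` body, send it, and read the answer using only the Python standard library. The helper names (`build_prompt`, `build_payload`, `parse_response`, `chat`) and the 60-second default timeout are hypothetical, the `Authorization: Bearer` header is an assumption about how the API key is passed, and the language map is a trimmed copy of `LANGUAGE_NAMES`.

```python
import json
import urllib.request

# Trimmed copy of LANGUAGE_NAMES from the Language Code Mapping section.
LANGUAGE_NAMES = {"en": "English", "zh-TW": "Traditional Chinese", "ja": "Japanese"}

DIFY_BASE_URL = "https://dify.theaken.com/v1"

# Decision 3 prompt; str.format does not re-parse braces inside the inserted text.
PROMPT_TEMPLATE = (
    "Translate the following text to {target_language}.\n"
    "Return ONLY the translated text, no explanations.\n"
    "\n"
    "{text_content}"
)


def build_prompt(text: str, target_lang: str) -> str:
    """Fill the prompt; an unknown code is passed through as the language name."""
    language = LANGUAGE_NAMES.get(target_lang, target_lang)
    return PROMPT_TEMPLATE.format(target_language=language, text_content=text)


def build_payload(text: str, target_lang: str, task_id: str) -> dict:
    """Build the blocking-mode /chat-messages request body from Decision 2."""
    return {
        "inputs": {},
        "query": build_prompt(text, target_lang),
        "response_mode": "blocking",
        "conversation_id": "",  # empty, as in the Decision 2 example
        "user": f"tool-ocr-{task_id}",
    }


def parse_response(body: dict) -> tuple[str, int]:
    """Return (translated text, total_tokens) from a blocking response."""
    answer = body["answer"].strip()
    usage = body.get("metadata", {}).get("usage", {})
    return answer, usage.get("total_tokens", 0)


def chat(payload: dict, api_key: str, timeout: float = 60.0) -> dict:
    """POST the payload to /chat-messages and decode the JSON reply."""
    request = urllib.request.Request(
        f"{DIFY_BASE_URL}/chat-messages",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for non-2xx status codes
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call would look like `parse_response(chat(build_payload(text, "en", task_id), api_key))`, with the key loaded from configuration rather than source code. Keeping payload construction and parsing as pure functions lets them be tested without network access, and the returned token count can feed the `total_tokens` statistic.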
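Decision 4's naming rule and the Translation JSON Schema can be combined into a few small helpers. This is a sketch under stated assumptions: `translated_path`, `build_translation_doc`, and `write_translation` are hypothetical names, the result file is assumed to end in `_result.json`, and the timestamp format simply mirrors the `translated_at` example.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

RESULT_SUFFIX = "_result.json"


def translated_path(result_path: Path, target_lang: str) -> Path:
    """Map {task_id}/xxx_result.json to {task_id}/xxx_translated_{lang}.json."""
    if not result_path.name.endswith(RESULT_SUFFIX):
        raise ValueError(f"not a result file: {result_path.name}")
    stem = result_path.name[: -len(RESULT_SUFFIX)]
    return result_path.with_name(f"{stem}_translated_{target_lang}.json")


def build_translation_doc(
    result_path: Path, target_lang: str, translations: dict, statistics: dict
) -> dict:
    """Assemble a document shaped like the Translation JSON Schema."""
    return {
        "schema_version": "1.0.0",
        "source_document": result_path.name,
        "source_lang": "auto",
        "target_lang": target_lang,
        "provider": "dify",
        "translated_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "statistics": statistics,
        "translations": translations,
    }


def write_translation(
    result_path: Path, target_lang: str, translations: dict, statistics: dict
) -> Path:
    """Write the translation next to the original; the original is never modified."""
    out = translated_path(result_path, target_lang)
    doc = build_translation_doc(result_path, target_lang, translations, statistics)
    # ensure_ascii=False keeps CJK text readable in the stored file
    out.write_text(json.dumps(doc, ensure_ascii=False, indent=2), encoding="utf-8")
    return out
```

Because each language gets its own file, re-running a translation overwrites only that language's file, and deleting a translation is a single file removal.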
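Decision 5's type rules can be expressed as a single pass over the document's elements. The sketch assumes each element is a dict with `element_id`, `type`, and `content` keys (the actual UnifiedDocument field names may differ) and takes the translation call as a parameter, so it can be exercised with a stub instead of the DIFY client. `translate_elements` is a hypothetical name; it also fills the first four `statistics` counters of the schema.

```python
from typing import Callable, Iterable

# Decision 5: element types whose content is a plain string.
TRANSLATABLE_TYPES = {"text", "title", "header", "footer", "paragraph", "footnote"}


def translate_elements(
    elements: Iterable[dict], translate: Callable[[str], str]
) -> tuple[dict, dict]:
    """Translate eligible elements; return (translations, statistics)."""
    translations: dict = {}
    stats = {"total_elements": 0, "translated_elements": 0,
             "skipped_elements": 0, "total_characters": 0}
    for element in elements:
        stats["total_elements"] += 1
        kind, content = element.get("type"), element.get("content")
        element_id = element["element_id"]  # assumed field name
        if kind in TRANSLATABLE_TYPES and isinstance(content, str) and content.strip():
            stats["total_characters"] += len(content)
            translations[element_id] = translate(content)
        elif kind == "table" and isinstance(content, dict):
            cells = []
            for cell in content.get("cells", []):
                text = cell.get("content") or ""
                if text.strip():  # empty cells are kept but not sent for translation
                    stats["total_characters"] += len(text)
                    text = translate(text)
                cells.append({"row": cell["row"], "col": cell["col"], "content": text})
            translations[element_id] = {"cells": cells}
        else:
            # page_number, image, chart, logo, reference, empty text, unknown types
            stats["skipped_elements"] += 1
            continue
        stats["translated_elements"] += 1
    return translations, stats
```

Injecting `translate` keeps the type rules independent of the transport, so the same walker can serve all three processing tracks and could later switch to batched calls (Open Question 3) without changing the rules.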
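If the tentative answer to Open Question 1 (10 requests per minute per user) is adopted, a per-user sliding-window limiter is one simple option. This is an in-process sketch with a hypothetical class name, `PerUserRateLimiter`; it only holds state within a single backend process, so multiple workers would need shared storage instead. The clock is injectable so the window logic can be tested without waiting.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class PerUserRateLimiter:
    """Sliding window: at most `limit` requests per `window` seconds per user."""

    def __init__(self, limit: int = 10, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self._history: dict[str, deque] = defaultdict(deque)

    def try_acquire(self, user: str) -> bool:
        """Record and allow the request if the user is under the limit, else refuse."""
        now = self.clock()
        history = self._history[user]
        # Drop timestamps that have left the window.
        while history and now - history[0] >= self.window:
            history.popleft()
        if len(history) >= self.limit:
            return False
        history.append(now)
        return True
```

Whether a refused request should fail fast (for example with HTTP 429) or wait for a free slot is still open; the sketch only reports the decision.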
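Open Question 2's tentative answer (retry up to 3 times with exponential backoff) could be sketched as the wrapper below: one initial attempt plus up to three retries, waiting `base_delay * 2**attempt` seconds between attempts. The name `with_retries` and the 1-second base delay are hypothetical. Treating `OSError` as retryable is an assumption: it covers `urllib` connection errors and timeouts, but `urllib.error.HTTPError` is also an `OSError`, so a real client would likely stop retrying on 4xx responses such as a rejected API key. Jitter is omitted for clarity.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`; after a retryable error, wait base_delay * 2**attempt and retry."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except retry_on:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            sleep(base_delay * (2 ** attempt))  # 1 s, 2 s, 4 s with the defaults
    raise AssertionError("unreachable")  # the loop always returns or raises
```

Errors outside `retry_on` (for example a `KeyError` from a malformed response) propagate immediately, so only transient network failures consume retries.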