Implement document translation feature using DIFY AI API with batch processing:

Backend:
- Add DIFY client with batch translation support (5000 chars, 20 items per batch)
- Add translation service with element extraction and result building
- Add translation router with start/status/result/list/delete endpoints
- Add translation schemas (TranslationRequest, TranslationStatus, etc.)

Frontend:
- Enable translation UI in TaskDetailPage
- Add translation API methods to apiV2.ts
- Add translation types

Features:
- Batch translation with numbered markers [1], [2], [3]...
- Support for text, title, header, footer, paragraph, footnote, table cells
- Translation result JSON with statistics (tokens, latency, batch_count)
- Background task processing with progress tracking

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

# Design: Document Translation Feature

## Context

Tool_OCR processes documents through three tracks (Direct/OCR/Hybrid) and outputs UnifiedDocument JSON. Users need translation capability to convert extracted text into different languages while preserving document structure.

### Constraints
- Must use DIFY AI service for translation
- API-based solution (no local model management)
- Translation quality depends on DIFY's underlying model

### Stakeholders
- End users: Need translated documents
- System: Simple HTTP-based integration

## Goals / Non-Goals

### Goals
- Translate documents using DIFY AI API
- Preserve document structure (element positions, formatting)
- Support all three processing tracks with unified logic
- Real-time progress feedback to users
- Simple, maintainable API integration

### Non-Goals
- Local model inference (replaced by DIFY API)
- GPU memory management (not needed)
- Translation memory or glossary support
- Concurrent translation processing

## Decisions

### Decision 1: Translation Provider

**Choice**: DIFY AI Service (theaken.com)

**Configuration**:
- Base URL: `https://dify.theaken.com/v1`
- Endpoint: `POST /chat-messages`
- API Key: `app-YOPrF2ro5fshzMkCZviIuUJd`
- Mode: Chat (Blocking response)

**Rationale**:
- High-quality cloud AI translation
- No local model management required
- No GPU memory concerns
- Easy to maintain and update

### Decision 2: Response Mode

**Choice**: Blocking Mode

**API Request Format**:
```json
{
  "inputs": {},
  "query": "Translate the following text to Chinese:\n\nHello world",
  "response_mode": "blocking",
  "conversation_id": "",
  "user": "tool-ocr-{task_id}"
}
```

**API Response Format**:
```json
{
  "event": "message",
  "answer": "你好世界",
  "conversation_id": "xxx",
  "metadata": {
    "usage": {
      "total_tokens": 54,
      "latency": 1.26
    }
  }
}
```

**Rationale**:
- Simpler implementation than streaming
- Adequate for batch text translation
- Complete response in single call

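
The request and response shapes above can be exercised with a small standard-library client. This is a minimal sketch, not the project's actual `DifyClient`: the function names, the `ChatResult` type, and the `DIFY_API_KEY` environment variable are hypothetical, and it assumes the service accepts the key as a Bearer token, as Dify's public API documentation describes.

```python
import json
import os
import urllib.request
from dataclasses import dataclass

DIFY_BASE_URL = "https://dify.theaken.com/v1"  # base URL from Decision 1


@dataclass
class ChatResult:
    """Fields of the blocking response that the translation flow needs."""
    answer: str
    total_tokens: int
    latency: float


def build_chat_payload(query: str, task_id: str) -> dict:
    """Build the blocking-mode request body in the documented format."""
    return {
        "inputs": {},
        "query": query,
        "response_mode": "blocking",
        "conversation_id": "",
        "user": f"tool-ocr-{task_id}",
    }


def parse_chat_response(body: dict) -> ChatResult:
    """Extract the answer and usage metadata; missing usage fields become 0."""
    usage = body.get("metadata", {}).get("usage", {})
    return ChatResult(
        answer=body["answer"],
        total_tokens=int(usage.get("total_tokens", 0)),
        latency=float(usage.get("latency", 0.0)),
    )


def chat(query: str, task_id: str, timeout: float = 60.0) -> ChatResult:
    """POST to /chat-messages and wait for the complete answer."""
    api_key = os.environ["DIFY_API_KEY"]  # hypothetical variable name
    request = urllib.request.Request(
        f"{DIFY_BASE_URL}/chat-messages",
        data=json.dumps(build_chat_payload(query, task_id)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_chat_response(json.load(response))
```

Keeping payload construction and response parsing separate from the network call lets both be unit-tested without reaching the service.
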
### Decision 3: Translation Batch Format

**Choice**: Single text per request with translation prompt

**Request Format**:
```
Translate the following text to {target_language}.
Return ONLY the translated text, no explanations.

{text_content}
```

**Rationale**:
- Clear instruction for AI
- Predictable response format
- Easy to parse result

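
As an illustration of the prompt format above, the following sketch fills the template for one piece of text. The helper name `build_prompt` is hypothetical, the language table is a three-entry excerpt of the document's `LANGUAGE_NAMES` mapping repeated here so the block runs on its own, and falling back to the raw language code for unmapped languages is an assumption.

```python
PROMPT_TEMPLATE = (
    "Translate the following text to {target_language}.\n"
    "Return ONLY the translated text, no explanations.\n"
    "\n"
    "{text_content}"
)

# Excerpt of the document's LANGUAGE_NAMES mapping, kept short for the sketch.
LANGUAGE_NAMES = {
    "en": "English",
    "zh-TW": "Traditional Chinese",
    "ja": "Japanese",
}


def build_prompt(text_content: str, target_lang: str) -> str:
    """Fill the translation prompt for a single text element."""
    # Assumption: an unmapped code is passed through unchanged.
    target_language = LANGUAGE_NAMES.get(target_lang, target_lang)
    return PROMPT_TEMPLATE.format(
        target_language=target_language,
        text_content=text_content,
    )


def clean_answer(answer: str) -> str:
    """Trim surrounding whitespace from the model's reply."""
    return answer.strip()
```

Because `str.format` only interprets braces in the template, source text that itself contains `{` or `}` passes through unchanged.
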
### Decision 4: Translation Result Storage

**Choice**: Independent JSON file per language (unchanged from previous design)

```
backend/storage/results/{task_id}/
├── xxx_result.json           # Original
├── xxx_translated_en.json    # English translation
├── xxx_translated_ja.json    # Japanese translation
└── ...
```

**Rationale**:
- Non-destructive (original preserved)
- Multiple languages supported
- Easy to manage and delete
- Clear file naming convention

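
The naming convention above can be captured in one helper. This is a sketch under the assumption that every source file ends in `_result.json`, as in the tree; the function name `translated_path` is hypothetical.

```python
from pathlib import Path


def translated_path(result_path: Path, target_lang: str) -> Path:
    """Map xxx_result.json to xxx_translated_{lang}.json in the same directory."""
    stem = result_path.stem
    if not stem.endswith("_result"):
        raise ValueError(f"not a result file: {result_path.name}")
    base = stem[: -len("_result")]
    return result_path.with_name(f"{base}_translated_{target_lang}.json")
```

Deriving the translated path from the original keeps both files in the same task directory, so deleting one translation never touches the source result.
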
### Decision 5: Element Type Handling

**Translatable types** (content is string):
- `text`, `title`, `header`, `footer`, `paragraph`, `footnote`

**Special handling** (content is dict):
- `table` -> Translate `cells[].content`

**Skip** (non-text content):
- `page_number`, `image`, `chart`, `logo`, `reference`

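
The three categories above translate into a simple dispatch on element type. The sketch below assumes each element is a dict with `element_id`, `type`, and `content` keys, and that table cells carry `row`, `col`, and `content`. Those field names are guesses based on the translation JSON in this document, and the `id:row:col` key for cells is purely illustrative.

```python
from typing import Iterator

TEXT_TYPES = {"text", "title", "header", "footer", "paragraph", "footnote"}
SKIP_TYPES = {"page_number", "image", "chart", "logo", "reference"}


def iter_translatable(elements: list[dict]) -> Iterator[tuple[str, str]]:
    """Yield (key, text) pairs for every piece of text that should be translated."""
    for element in elements:
        kind = element.get("type")
        if kind in SKIP_TYPES:
            continue  # non-text content is never sent to the translator
        element_id = element["element_id"]
        content = element.get("content")
        if kind in TEXT_TYPES and isinstance(content, str) and content.strip():
            yield element_id, content
        elif kind == "table" and isinstance(content, dict):
            # Tables are translated cell by cell; empty cells are skipped.
            for cell in content.get("cells", []):
                text = cell.get("content")
                if isinstance(text, str) and text.strip():
                    yield f"{element_id}:{cell['row']}:{cell['col']}", text
```

Unknown element types fall through without yielding anything, which matches the skip behavior for non-text content.
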
## Architecture

### Component Diagram

```
┌─────────────────────────────────────────────────────────────┐
│                          Frontend                           │
│  ┌─────────────┐  ┌──────────────┐  ┌─────────────────────┐ │
│  │ TaskDetail  │  │ TranslateBtn │  │ ProgressDisplay     │ │
│  └─────────────┘  └──────────────┘  └─────────────────────┘ │
└────────────────────────────┬────────────────────────────────┘
                             │ HTTP
┌────────────────────────────▼────────────────────────────────┐
│                         Backend API                         │
│  ┌─────────────────────────────────────────────────────────┐│
│  │ TranslateRouter                                         ││
│  │   POST /api/v2/translate/{task_id}                      ││
│  │   GET  /api/v2/translate/{task_id}/status               ││
│  │   GET  /api/v2/translate/{task_id}/result               ││
│  └─────────────────────────────────────────────────────────┘│
└────────────────────────────┬────────────────────────────────┘
                             │
┌────────────────────────────▼────────────────────────────────┐
│                     TranslationService                      │
│  ┌───────────────┐  ┌───────────────┐  ┌─────────────────┐  │
│  │ DifyClient    │  │ BatchBuilder  │  │ ResultParser    │  │
│  │ - translate() │  │ - extract()   │  │ - parse()       │  │
│  │ - chat()      │  │ - format()    │  │ - map_ids()     │  │
│  └───────────────┘  └───────────────┘  └─────────────────┘  │
└────────────────────────────┬────────────────────────────────┘
                             │ HTTPS
┌────────────────────────────▼────────────────────────────────┐
│                       DIFY AI Service                       │
│                 https://dify.theaken.com/v1                 │
│                      (Chat - Blocking)                      │
└─────────────────────────────────────────────────────────────┘
```

### Translation JSON Schema

```json
{
  "schema_version": "1.0.0",
  "source_document": "xxx_result.json",
  "source_lang": "auto",
  "target_lang": "en",
  "provider": "dify",
  "translated_at": "2025-12-02T12:00:00Z",
  "statistics": {
    "total_elements": 50,
    "translated_elements": 45,
    "skipped_elements": 5,
    "total_characters": 5000,
    "processing_time_seconds": 30.5,
    "total_tokens": 2500
  },
  "translations": {
    "pp3_0_0": "Company Profile",
    "pp3_0_1": "Founded in 2020...",
    "table_1_0": {
      "cells": [
        {"row": 0, "col": 0, "content": "Technology"},
        {"row": 0, "col": 1, "content": "Epoxy"}
      ]
    }
  }
}
```

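
A result file in this shape could be assembled as sketched below. The function name and signature are hypothetical. Two assumptions are baked in: `skipped_elements` is derived as `total_elements - translated_elements` (consistent with the 50 / 45 / 5 example above), and each key in `translations`, including a whole table, counts as one translated element.

```python
from datetime import datetime, timezone


def build_translation_result(
    source_document: str,
    target_lang: str,
    translations: dict,
    total_elements: int,
    total_characters: int,
    total_tokens: int,
    processing_time_seconds: float,
) -> dict:
    """Assemble the translation JSON in the documented shape."""
    translated = len(translations)  # assumption: one key per translated element
    return {
        "schema_version": "1.0.0",
        "source_document": source_document,
        "source_lang": "auto",
        "target_lang": target_lang,
        "provider": "dify",
        # UTC timestamp in the same format as the example ("...Z" suffix).
        "translated_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "statistics": {
            "total_elements": total_elements,
            "translated_elements": translated,
            "skipped_elements": total_elements - translated,
            "total_characters": total_characters,
            "processing_time_seconds": processing_time_seconds,
            "total_tokens": total_tokens,
        },
        "translations": translations,
    }
```
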
### Language Code Mapping

```python
LANGUAGE_NAMES = {
    "en": "English",
    "zh-TW": "Traditional Chinese",
    "zh-CN": "Simplified Chinese",
    "ja": "Japanese",
    "ko": "Korean",
    "de": "German",
    "fr": "French",
    "es": "Spanish",
    "pt": "Portuguese",
    "it": "Italian",
    "ru": "Russian",
    "vi": "Vietnamese",
    "th": "Thai",
    # Additional languages as needed
}
```

## Risks / Trade-offs

### Risk 1: API Availability
- **Risk**: DIFY service downtime affects translation
- **Mitigation**: Add timeout handling, retry logic, graceful error messages

### Risk 2: API Cost
- **Risk**: High volume translation increases cost
- **Mitigation**: Monitor usage via metadata, consider rate limiting

### Risk 3: Network Latency
- **Risk**: Each translation request adds network latency
- **Mitigation**: Batch text when possible, show progress to user

### Risk 4: Translation Quality Variance
- **Risk**: AI translation quality varies by language pair
- **Mitigation**: Document known limitations, allow user feedback

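
The retry mitigation for Risk 1 might look like the sketch below, which uses the tentative policy from Open Questions (up to 3 retries with exponential backoff). The helper name, the 1-second base delay, and the choice to retry on any `OSError` are assumptions. `OSError` covers timeouts and connection failures, but also `urllib.error.HTTPError`, so a fuller implementation would probably avoid retrying 4xx responses.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying on OSError with delays of 1s, 2s, 4s, ..."""
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except OSError:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error to the caller
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so tests can record the delays instead of actually waiting.
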
## Migration Plan

### Phase 1: Core Translation (This Proposal)
1. DIFY client implementation
2. Backend translation service (rewrite)
3. API endpoints (modify)
4. Frontend activation

### Phase 2: Enhanced Features (Future)
1. Translated PDF generation
2. Translation caching
3. Custom terminology support

### Rollback
- Translation is an additive feature
- No schema changes to existing data
- Can be disabled by removing the router registration

## Open Questions

1. **Rate Limiting**: Should we limit requests per minute to DIFY API?
   - Tentative: 10 requests per minute per user

2. **Retry Logic**: How to handle API failures?
   - Tentative: Retry up to 3 times with exponential backoff

3. **Batch Size**: How many elements per API call?
   - Tentative: 1 element per call for simplicity, optimize later if needed

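
For the first open question, one way to enforce the tentative limit of 10 requests per minute per user is an in-memory sliding window, sketched below. The class name and interface are hypothetical, and the state lives in a single process, so a multi-worker deployment would need shared storage instead.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most max_requests per user within any window_seconds span."""

    def __init__(
        self,
        max_requests: int = 10,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self._history: dict[str, deque] = defaultdict(deque)

    def allow(self, user_id: str) -> bool:
        """Record and permit the request, or return False if the user is over the limit."""
        now = self.clock()
        history = self._history[user_id]
        # Drop timestamps that have aged out of the window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_requests:
            return False
        history.append(now)
        return True
```

A caller would check `limiter.allow(user_id)` before each DIFY request and return an error to the client when it yields `False`.
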