feat: add document translation via DIFY AI API

Implement document translation feature using DIFY AI API with batch processing:

Backend:
- Add DIFY client with batch translation support (5000 chars, 20 items per batch)
- Add translation service with element extraction and result building
- Add translation router with start/status/result/list/delete endpoints
- Add translation schemas (TranslationRequest, TranslationStatus, etc.)

Frontend:
- Enable translation UI in TaskDetailPage
- Add translation API methods to apiV2.ts
- Add translation types

Features:
- Batch translation with numbered markers [1], [2], [3]...
- Support for text, title, header, footer, paragraph, footnote, table cells
- Translation result JSON with statistics (tokens, latency, batch_count)
- Background task processing with progress tracking

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

# Design: Document Translation Feature

## Context

Tool_OCR processes documents through three tracks (Direct/OCR/Hybrid) and outputs UnifiedDocument JSON. Users need translation capability to convert extracted text into different languages while preserving document structure.

### Constraints

- Must use the DIFY AI service for translation
- API-based solution (no local model management)
- Translation quality depends on DIFY's underlying model

### Stakeholders

- End users: need translated documents
- System: needs a simple HTTP-based integration

## Goals / Non-Goals

### Goals

- Translate documents using DIFY AI API
- Preserve document structure (element positions, formatting)
- Support all three processing tracks with unified logic
- Real-time progress feedback to users
- Simple, maintainable API integration

### Non-Goals

- Local model inference (replaced by DIFY API)
- GPU memory management (not needed)
- Translation memory or glossary support
- Concurrent translation processing

## Decisions

### Decision 1: Translation Provider

**Choice**: DIFY AI Service (theaken.com)

**Configuration**:

- Base URL: `https://dify.theaken.com/v1`
- Endpoint: `POST /chat-messages`
- API Key: `app-YOPrF2ro5fshzMkCZviIuUJd`
- Mode: Chat (blocking response)

**Rationale**:

- High-quality cloud AI translation
- No local model management required
- No GPU memory concerns
- Easy to maintain and update
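
A minimal sketch of how the Decision 1 settings could be held in code. The class name `DifyConfig`, the environment variable names `DIFY_API_KEY` and `DIFY_BASE_URL`, and the 60-second timeout are assumptions for illustration, not part of this design. Reading the key from the environment keeps it out of source control.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DifyConfig:
    """Connection settings for the DIFY chat API (hypothetical helper)."""

    base_url: str = "https://dify.theaken.com/v1"
    endpoint: str = "/chat-messages"
    api_key: str = ""
    timeout_seconds: float = 60.0  # assumed default; not specified by the design

    @property
    def chat_url(self) -> str:
        # Join base URL and endpoint without doubling or dropping the slash.
        return self.base_url.rstrip("/") + "/" + self.endpoint.lstrip("/")

    @classmethod
    def from_env(cls) -> "DifyConfig":
        # DIFY_API_KEY / DIFY_BASE_URL are assumed variable names.
        return cls(
            base_url=os.environ.get("DIFY_BASE_URL", cls.base_url),
            api_key=os.environ.get("DIFY_API_KEY", ""),
        )
```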

### Decision 2: Response Mode

**Choice**: Blocking mode

**API Request Format**:

```json
{
  "inputs": {},
  "query": "Translate the following text to Chinese:\n\nHello world",
  "response_mode": "blocking",
  "conversation_id": "",
  "user": "tool-ocr-{task_id}"
}
```

**API Response Format**:

```json
{
  "event": "message",
  "answer": "你好世界",
  "conversation_id": "xxx",
  "metadata": {
    "usage": {
      "total_tokens": 54,
      "latency": 1.26
    }
  }
}
```

**Rationale**:

- Simpler implementation than streaming
- Adequate for batch text translation
- Complete response in a single call
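
The blocking exchange above can be sketched with the Python standard library alone. This sketch assumes DIFY's usual `Authorization: Bearer <api key>` header. The function names (`build_chat_payload`, `parse_chat_response`, `chat_blocking`) are hypothetical and do not describe the project's actual `DifyClient` API.

```python
import json
import urllib.request


def build_chat_payload(query: str, task_id: str) -> dict:
    """Request body for POST /chat-messages in blocking mode."""
    return {
        "inputs": {},
        "query": query,
        "response_mode": "blocking",
        "conversation_id": "",  # empty: every call starts a fresh conversation
        "user": f"tool-ocr-{task_id}",
    }


def parse_chat_response(body: dict) -> tuple[str, int, float]:
    """Extract (answer, total_tokens, latency) from a blocking response."""
    usage = body.get("metadata", {}).get("usage", {})
    return (
        body["answer"].strip(),
        int(usage.get("total_tokens", 0)),
        float(usage.get("latency", 0.0)),
    )


def chat_blocking(chat_url: str, api_key: str, query: str, task_id: str,
                  timeout: float = 60.0) -> tuple[str, int, float]:
    """Send one blocking chat request and return the parsed result."""
    data = json.dumps(build_chat_payload(query, task_id)).encode("utf-8")
    request = urllib.request.Request(
        chat_url,
        data=data,
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_chat_response(json.load(response))
```

Keeping payload construction and response parsing as pure functions means they can be tested without network access; only `chat_blocking` touches the wire.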

### Decision 3: Translation Batch Format

**Choice**: Single text per request with translation prompt

**Request Format**:

```
Translate the following text to {target_language}.
Return ONLY the translated text, no explanations.

{text_content}
```

**Rationale**:

- Clear instruction for AI
- Predictable response format
- Easy to parse result
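
This hypothetical helper fills the Decision 3 template. It carries only a subset of `LANGUAGE_NAMES` (listed under Language Code Mapping). As an assumed design choice, it falls back to the raw code for unknown languages instead of rejecting them.

```python
# Abbreviated copy of the language table; see Language Code Mapping.
LANGUAGE_NAMES = {"en": "English", "zh-TW": "Traditional Chinese", "ja": "Japanese"}

PROMPT_TEMPLATE = (
    "Translate the following text to {target_language}.\n"
    "Return ONLY the translated text, no explanations.\n"
    "\n"
    "{text_content}"
)


def build_translation_prompt(text: str, target_lang: str) -> str:
    """Fill the Decision 3 prompt for one text and one target language."""
    # Fall back to the raw code when it is not in the (abbreviated) table.
    language = LANGUAGE_NAMES.get(target_lang, target_lang)
    return PROMPT_TEMPLATE.format(target_language=language, text_content=text)
```

Because `str.format` substitutes values without re-parsing them, braces inside the source text (for example `{x}` in extracted code) pass through unchanged.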

### Decision 4: Translation Result Storage

**Choice**: One independent JSON file per language (unchanged from the previous design)

```
backend/storage/results/{task_id}/
├── xxx_result.json            # Original
├── xxx_translated_en.json     # English translation
├── xxx_translated_ja.json     # Japanese translation
└── ...
```

**Rationale**:

- Non-destructive (original preserved)
- Multiple languages supported
- Easy to manage and delete
- Clear file naming convention
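
The naming convention can be expressed as a pair of small helpers. `stem` stands in for the elided `xxx` prefix and, like the function names, is hypothetical. Listing existing files is one way the list/delete endpoints could discover which languages are available.

```python
from pathlib import Path


def translation_path(results_dir: Path, task_id: str, stem: str, lang: str) -> Path:
    """Path of the translation file for one target language."""
    return results_dir / task_id / f"{stem}_translated_{lang}.json"


def list_translations(results_dir: Path, task_id: str, stem: str) -> list[str]:
    """Language codes that already have a translation file for this task."""
    prefix = f"{stem}_translated_"
    return sorted(
        # Strip the prefix and the ".json" suffix to recover the language code.
        p.name[len(prefix):-len(".json")]
        for p in (results_dir / task_id).glob(f"{prefix}*.json")
    )
```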

### Decision 5: Element Type Handling

**Translatable types** (content is a string):

- `text`, `title`, `header`, `footer`, `paragraph`, `footnote`

**Special handling** (content is a dict):

- `table` -> translate each `cells[].content`

**Skip** (non-text content):

- `page_number`, `image`, `chart`, `logo`, `reference`
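
The following sketch applies the Decision 5 rules to a list of elements. It assumes each element is a dict with `element_id`, `type`, and `content` keys. It also assumes table content takes the form `{"cells": [{"row", "col", "content"}]}`, mirroring the Translation JSON Schema. The composite `element_id:row:col` key for table cells is an illustrative convention, not the project's.

```python
TEXT_TYPES = {"text", "title", "header", "footer", "paragraph", "footnote"}


def extract_translatable(elements: list[dict]) -> tuple[list[tuple[str, str]], int]:
    """Return ([(key, text), ...], skipped_count).

    Plain-text elements are keyed by element_id; table cells use the
    hypothetical key "<element_id>:<row>:<col>" so results map back to cells.
    """
    items: list[tuple[str, str]] = []
    skipped = 0
    for el in elements:
        etype, content, eid = el.get("type"), el.get("content"), el["element_id"]
        if etype in TEXT_TYPES and isinstance(content, str) and content.strip():
            items.append((eid, content))
        elif etype == "table" and isinstance(content, dict):
            for cell in content.get("cells", []):
                text = cell.get("content")
                if isinstance(text, str) and text.strip():
                    items.append((f"{eid}:{cell['row']}:{cell['col']}", text))
        else:
            # Skip types (page_number, image, chart, logo, reference),
            # unknown types, and empty text are never sent to DIFY.
            skipped += 1
    return items, skipped
```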

## Architecture

### Component Diagram

```
┌─────────────────────────────────────────────────────────────┐
│                          Frontend                           │
│ ┌─────────────┐  ┌───────────────┐  ┌─────────────────────┐ │
│ │ TaskDetail  │  │ TranslateBtn  │  │ ProgressDisplay     │ │
│ └─────────────┘  └───────────────┘  └─────────────────────┘ │
└──────────────────────────────┬──────────────────────────────┘
                               │ HTTP
┌──────────────────────────────▼──────────────────────────────┐
│                         Backend API                         │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ TranslateRouter                                         │ │
│ │ POST /api/v2/translate/{task_id}                        │ │
│ │ GET  /api/v2/translate/{task_id}/status                 │ │
│ │ GET  /api/v2/translate/{task_id}/result                 │ │
│ └─────────────────────────────────────────────────────────┘ │
└──────────────────────────────┬──────────────────────────────┘
                               │
┌──────────────────────────────▼──────────────────────────────┐
│                     TranslationService                      │
│ ┌───────────────┐  ┌───────────────┐  ┌───────────────────┐ │
│ │ DifyClient    │  │ BatchBuilder  │  │ ResultParser      │ │
│ │ - translate() │  │ - extract()   │  │ - parse()         │ │
│ │ - chat()      │  │ - format()    │  │ - map_ids()       │ │
│ └───────────────┘  └───────────────┘  └───────────────────┘ │
└──────────────────────────────┬──────────────────────────────┘
                               │ HTTPS
┌──────────────────────────────▼──────────────────────────────┐
│                       DIFY AI Service                       │
│                 https://dify.theaken.com/v1                 │
│                      (Chat - Blocking)                      │
└─────────────────────────────────────────────────────────────┘
```
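
The diagram's `BatchBuilder` and `ResultParser` can be sketched using the numbered-marker scheme from the commit message (`[1]`, `[2]`, ... with at most 5000 characters and 20 items per batch). Decision 3 and Open Question 3 describe one element per call. This sketch only illustrates the batched variant, and its function names are hypothetical.

```python
import re

MAX_CHARS = 5000  # per-batch character budget (from the commit message)
MAX_ITEMS = 20    # per-batch item limit (from the commit message)
# A marker such as "[3] " at the start of a line; the number is captured.
MARKER_RE = re.compile(r"^\[(\d+)\]\s?", re.MULTILINE)


def build_batches(items: list[tuple[str, str]], max_chars: int = MAX_CHARS,
                  max_items: int = MAX_ITEMS) -> list[list[tuple[str, str]]]:
    """Group (key, text) pairs into batches that respect both limits."""
    batches, current, size = [], [], 0
    for key, text in items:
        if current and (len(current) >= max_items or size + len(text) > max_chars):
            batches.append(current)
            current, size = [], 0
        current.append((key, text))
        size += len(text)
    if current:
        batches.append(current)
    return batches


def format_batch(batch: list[tuple[str, str]]) -> str:
    """Number each text so the model's answer can be split back apart."""
    return "\n".join(f"[{i}] {text}" for i, (_, text) in enumerate(batch, start=1))


def parse_batch(answer: str, batch: list[tuple[str, str]]) -> dict[str, str]:
    """Map a numbered answer back to keys; items whose marker is missing are omitted."""
    parts = MARKER_RE.split(answer)
    # split() yields [preamble, "1", text1, "2", text2, ...]
    found = {int(n): t.strip() for n, t in zip(parts[1::2], parts[2::2])}
    return {key: found[i] for i, (key, _) in enumerate(batch, start=1) if i in found}
```

This sketch has two limitations. An item longer than `max_chars` still gets its own (oversized) batch, and a source text that itself contains a line starting with `[n]` would confuse the parser. Items whose markers the model omits simply produce no entry, so callers can retry or fall back per element.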

### Translation JSON Schema

```json
{
  "schema_version": "1.0.0",
  "source_document": "xxx_result.json",
  "source_lang": "auto",
  "target_lang": "en",
  "provider": "dify",
  "translated_at": "2025-12-02T12:00:00Z",
  "statistics": {
    "total_elements": 50,
    "translated_elements": 45,
    "skipped_elements": 5,
    "total_characters": 5000,
    "processing_time_seconds": 30.5,
    "total_tokens": 2500
  },
  "translations": {
    "pp3_0_0": "Company Profile",
    "pp3_0_1": "Founded in 2020...",
    "table_1_0": {
      "cells": [
        {"row": 0, "col": 0, "content": "Technology"},
        {"row": 0, "col": 1, "content": "Epoxy"}
      ]
    }
  }
}
```
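
Here is a minimal sketch of assembling this file. It assumes translations arrive as a flat mapping in which table cells use a hypothetical `element_id:row:col` key; those keys are regrouped into the `cells` form shown in the schema. It also assumes that `translated_elements` equals `total_elements - skipped_elements`. The design states neither assumption.

```python
from datetime import datetime, timezone


def nest_translations(flat: dict[str, str]) -> dict:
    """Regroup hypothetical "element_id:row:col" cell keys into {"cells": [...]}."""
    nested: dict = {}
    for key, text in flat.items():
        parts = key.split(":")
        if len(parts) == 3:
            eid, row, col = parts
            cell = {"row": int(row), "col": int(col), "content": text}
            nested.setdefault(eid, {"cells": []})["cells"].append(cell)
        else:
            nested[key] = text
    return nested


def build_translation_result(source_document: str, target_lang: str,
                             flat_translations: dict[str, str], total_elements: int,
                             skipped_elements: int, total_characters: int,
                             processing_time_seconds: float, total_tokens: int) -> dict:
    """Assemble the per-language translation JSON following the schema."""
    return {
        "schema_version": "1.0.0",
        "source_document": source_document,
        "source_lang": "auto",
        "target_lang": target_lang,
        "provider": "dify",
        # UTC timestamp shaped like "2025-12-02T12:00:00Z".
        "translated_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "statistics": {
            "total_elements": total_elements,
            "translated_elements": total_elements - skipped_elements,  # assumption
            "skipped_elements": skipped_elements,
            "total_characters": total_characters,
            "processing_time_seconds": processing_time_seconds,
            "total_tokens": total_tokens,
        },
        "translations": nest_translations(flat_translations),
    }
```

When the result is written out, `json.dump(result, f, ensure_ascii=False, indent=2)` keeps CJK text human-readable in the stored file.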

### Language Code Mapping

```python
LANGUAGE_NAMES = {
    "en": "English",
    "zh-TW": "Traditional Chinese",
    "zh-CN": "Simplified Chinese",
    "ja": "Japanese",
    "ko": "Korean",
    "de": "German",
    "fr": "French",
    "es": "Spanish",
    "pt": "Portuguese",
    "it": "Italian",
    "ru": "Russian",
    "vi": "Vietnamese",
    "th": "Thai",
    # Additional languages as needed
}
```

## Risks / Trade-offs

### Risk 1: API Availability

- **Risk**: DIFY service downtime affects translation
- **Mitigation**: Add timeout handling, retry logic, and graceful error messages

### Risk 2: API Cost

- **Risk**: High-volume translation increases cost
- **Mitigation**: Monitor usage via the response metadata; consider rate limiting

### Risk 3: Network Latency

- **Risk**: Each translation request adds network latency
- **Mitigation**: Batch text when possible; show progress to the user

### Risk 4: Translation Quality Variance

- **Risk**: AI translation quality varies by language pair
- **Mitigation**: Document known limitations; allow user feedback
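
Risk 1's retry mitigation could look like the following sketch, which uses the tentative policy from Open Questions (up to 3 retries with exponential backoff). Which errors count as transient (network errors, timeouts, HTTP 429, and 5xx) is an assumption. The injectable `sleep` parameter exists only so the behavior can be tested without waiting.

```python
import time
import urllib.error


def is_retryable(exc: BaseException) -> bool:
    """Treat throttling, server errors, and network failures as transient."""
    if isinstance(exc, urllib.error.HTTPError):  # HTTPError subclasses URLError
        return exc.code == 429 or exc.code >= 500
    return isinstance(exc, (urllib.error.URLError, TimeoutError, ConnectionError))


def call_with_retry(fn, max_retries: int = 3, base_delay: float = 1.0, sleep=time.sleep):
    """Call fn(), retrying transient failures after base_delay * 2**attempt seconds.

    At most 1 + max_retries calls are made; the last error is re-raised.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception as exc:
            if attempt == max_retries or not is_retryable(exc):
                raise
            sleep(base_delay * 2 ** attempt)
```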

## Migration Plan

### Phase 1: Core Translation (This Proposal)

1. DIFY client implementation
2. Backend translation service (rewrite)
3. API endpoints (modify)
4. Frontend activation

### Phase 2: Enhanced Features (Future)

1. Translated PDF generation
2. Translation caching
3. Custom terminology support

### Rollback

- Translation is an additive feature
- No schema changes to existing data
- Can be disabled by removing the router registration

## Open Questions

1. **Rate Limiting**: Should we limit requests per minute to the DIFY API?
   - Tentative: 10 requests per minute per user

2. **Retry Logic**: How should API failures be handled?
   - Tentative: Retry up to 3 times with exponential backoff

3. **Batch Size**: How many elements should each API call carry?
   - Tentative: 1 element per call for simplicity; optimize later if needed
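
If the tentative limit of 10 requests per minute per user is adopted, a sliding-window counter is one straightforward way to enforce it. This in-process sketch is hypothetical. With several backend workers, the timestamps would need a shared store.

```python
import time
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Allow at most `limit` requests per `window` seconds for each user."""

    def __init__(self, limit: int = 10, window: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._hits = defaultdict(deque)  # user -> timestamps of accepted requests

    def allow(self, user: str) -> bool:
        now = self.clock()
        hits = self._hits[user]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```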