OCR/openspec/changes/archive/2025-12-02-add-document-translation/design.md
egg 8d9b69ba93 feat: add document translation via DIFY AI API
Implement document translation feature using DIFY AI API with batch processing:

Backend:
- Add DIFY client with batch translation support (5000 chars, 20 items per batch)
- Add translation service with element extraction and result building
- Add translation router with start/status/result/list/delete endpoints
- Add translation schemas (TranslationRequest, TranslationStatus, etc.)

Frontend:
- Enable translation UI in TaskDetailPage
- Add translation API methods to apiV2.ts
- Add translation types

Features:
- Batch translation with numbered markers [1], [2], [3]...
- Support for text, title, header, footer, paragraph, footnote, table cells
- Translation result JSON with statistics (tokens, latency, batch_count)
- Background task processing with progress tracking

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-02 11:57:02 +08:00


# Design: Document Translation Feature
## Context
Tool_OCR processes documents through three tracks (Direct/OCR/Hybrid) and outputs UnifiedDocument JSON. Users need translation capability to convert extracted text into different languages while preserving document structure.
### Constraints
- Must use DIFY AI service for translation
- API-based solution (no local model management)
- Translation quality depends on DIFY's underlying model
### Stakeholders
- End users: Need translated documents
- System: Needs a simple HTTP-based integration
## Goals / Non-Goals
### Goals
- Translate documents using DIFY AI API
- Preserve document structure (element positions, formatting)
- Support all three processing tracks with unified logic
- Give users real-time progress feedback
- Keep the API integration simple and maintainable
### Non-Goals
- Local model inference (replaced by DIFY API)
- GPU memory management (not needed)
- Translation memory or glossary support
- Concurrent translation processing
## Decisions
### Decision 1: Translation Provider
**Choice**: DIFY AI Service (theaken.com)
**Configuration**:
- Base URL: `https://dify.theaken.com/v1`
- Endpoint: `POST /chat-messages`
- API Key: `app-YOPrF2ro5fshzMkCZviIuUJd`
- Mode: Chat (Blocking response)
**Rationale**:
- High-quality cloud AI translation
- No local model management required
- No GPU memory concerns
- Easy to maintain and update
### Decision 2: Response Mode
**Choice**: Blocking Mode
**API Request Format**:
```json
{
  "inputs": {},
  "query": "Translate the following text to Chinese:\n\nHello world",
  "response_mode": "blocking",
  "conversation_id": "",
  "user": "tool-ocr-{task_id}"
}
```
**API Response Format**:
```json
{
  "event": "message",
  "answer": "你好世界",
  "conversation_id": "xxx",
  "metadata": {
    "usage": {
      "total_tokens": 54,
      "latency": 1.26
    }
  }
}
```
**Rationale**:
- Simpler implementation than streaming
- Adequate for batch text translation
- Complete response in a single call
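
The request and response shapes above can be sketched as two small helpers. This is a minimal illustration rather than the project's client: the function names `build_chat_payload` and `parse_chat_response` are hypothetical, and falling back to zero when usage fields are missing is an assumption.

```python
import json
from typing import Any


def build_chat_payload(query: str, task_id: str) -> dict[str, Any]:
    """Build a blocking-mode request body with the fields shown above."""
    return {
        "inputs": {},
        "query": query,
        "response_mode": "blocking",
        "conversation_id": "",
        "user": f"tool-ocr-{task_id}",
    }


def parse_chat_response(body: str) -> tuple[str, int, float]:
    """Return (answer, total_tokens, latency) from a blocking response body."""
    data = json.loads(body)
    usage = data.get("metadata", {}).get("usage", {})
    # Assumption: missing usage fields are treated as zero.
    tokens = int(usage.get("total_tokens", 0))
    latency = float(usage.get("latency", 0.0))
    return data["answer"], tokens, latency
```

Keeping payload construction and response parsing free of network calls makes both easy to unit test; the HTTP call itself can wrap them.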
### Decision 3: Translation Batch Format
**Choice**: Single text per request with translation prompt
**Request Format**:
```
Translate the following text to {target_language}.
Return ONLY the translated text, no explanations.
{text_content}
```
**Rationale**:
- Clear instruction for AI
- Predictable response format
- Easy to parse result
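
A minimal sketch of filling in the prompt template, assuming the target language is passed as a code and resolved to a display name. `build_prompt` and `PROMPT_TEMPLATE` are hypothetical names, the mapping is a three-entry subset of the language table in this document, and passing unknown codes through unchanged is an assumption.

```python
# Subset of the language code mapping, repeated here to keep the sketch self-contained.
LANGUAGE_NAMES = {"en": "English", "zh-TW": "Traditional Chinese", "ja": "Japanese"}

PROMPT_TEMPLATE = (
    "Translate the following text to {target_language}.\n"
    "Return ONLY the translated text, no explanations.\n"
    "{text_content}"
)


def build_prompt(text_content: str, target_lang: str) -> str:
    """Fill the prompt template for one piece of text."""
    # Assumption: an unknown code is used verbatim as the language name.
    target_language = LANGUAGE_NAMES.get(target_lang, target_lang)
    return PROMPT_TEMPLATE.format(
        target_language=target_language, text_content=text_content
    )
```

Because `str.format` does not re-scan substituted values, source text that itself contains braces is inserted unchanged.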
### Decision 4: Translation Result Storage
**Choice**: Independent JSON file per language (unchanged from previous design)
```
backend/storage/results/{task_id}/
├── xxx_result.json           # Original
├── xxx_translated_en.json    # English translation
├── xxx_translated_ja.json    # Japanese translation
└── ...
```
**Rationale**:
- Non-destructive (original preserved)
- Multiple languages supported
- Easy to manage and delete
- Clear file naming convention
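
The naming convention can be expressed as a pair of path helpers. This is a sketch under the assumption that every original file ends in `_result.json`; the function names `translated_path` and `list_translations` are hypothetical.

```python
from pathlib import Path


def translated_path(result_path: Path, target_lang: str) -> Path:
    """Map .../xxx_result.json to .../xxx_translated_{lang}.json."""
    suffix = "_result.json"
    name = result_path.name
    if not name.endswith(suffix):
        raise ValueError(f"unexpected result file name: {name}")
    stem = name[: -len(suffix)]
    return result_path.with_name(f"{stem}_translated_{target_lang}.json")


def list_translations(task_dir: Path) -> dict[str, Path]:
    """Return {language code: path} for each translation file in a task directory."""
    found = {}
    for path in sorted(task_dir.glob("*_translated_*.json")):
        lang = path.stem.rsplit("_translated_", 1)[1]
        found[lang] = path
    return found
```

Deleting one language then amounts to removing a single file, which leaves the original result untouched.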
### Decision 5: Element Type Handling
**Translatable types** (content is string):
- `text`, `title`, `header`, `footer`, `paragraph`, `footnote`
**Special handling** (content is dict):
- `table` -> Translate `cells[].content`
**Skip** (non-text content):
- `page_number`, `image`, `chart`, `logo`, `reference`
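
The three categories can be sketched as a single generator over document elements. The field names `element_id`, `type`, and `content`, the `id:row:col` key format for table cells, and the function name `iter_translatable` are assumptions made for illustration; the actual UnifiedDocument layout may differ.

```python
from typing import Any, Iterator

TEXT_TYPES = {"text", "title", "header", "footer", "paragraph", "footnote"}


def iter_translatable(elements: list[dict[str, Any]]) -> Iterator[tuple[str, str]]:
    """Yield (key, text) pairs for every piece of translatable text."""
    for element in elements:
        kind = element.get("type")
        content = element.get("content")
        if kind in TEXT_TYPES and isinstance(content, str) and content.strip():
            yield element["element_id"], content
        elif kind == "table" and isinstance(content, dict):
            for cell in content.get("cells", []):
                text = cell.get("content")
                if isinstance(text, str) and text.strip():
                    # Hypothetical key format: element id plus cell position.
                    yield f'{element["element_id"]}:{cell["row"]}:{cell["col"]}', text
        # Any other type (page_number, image, chart, logo, reference) is skipped.
```

Empty and whitespace-only strings are dropped here so they do not consume API calls; that filter is a design choice of this sketch.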
## Architecture
### Component Diagram
```
┌─────────────────────────────────────────────────────────────┐
│                          Frontend                           │
│  ┌─────────────┐  ┌──────────────┐  ┌─────────────────────┐ │
│  │ TaskDetail  │  │ TranslateBtn │  │   ProgressDisplay   │ │
│  └─────────────┘  └──────────────┘  └─────────────────────┘ │
└────────────────────────────┬────────────────────────────────┘
                             │ HTTP
┌────────────────────────────▼────────────────────────────────┐
│                         Backend API                         │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │                     TranslateRouter                     │ │
│ │  POST /api/v2/translate/{task_id}                       │ │
│ │  GET  /api/v2/translate/{task_id}/status                │ │
│ │  GET  /api/v2/translate/{task_id}/result                │ │
│ └─────────────────────────────────────────────────────────┘ │
└────────────────────────────┬────────────────────────────────┘
┌────────────────────────────▼────────────────────────────────┐
│                     TranslationService                      │
│  ┌───────────────┐  ┌───────────────┐  ┌─────────────────┐  │
│  │ DifyClient    │  │ BatchBuilder  │  │ ResultParser    │  │
│  │ - translate() │  │ - extract()   │  │ - parse()       │  │
│  │ - chat()      │  │ - format()    │  │ - map_ids()     │  │
│  └───────────────┘  └───────────────┘  └─────────────────┘  │
└────────────────────────────┬────────────────────────────────┘
                             │ HTTPS
┌────────────────────────────▼────────────────────────────────┐
│                       DIFY AI Service                       │
│                 https://dify.theaken.com/v1                 │
│                      (Chat - Blocking)                      │
└─────────────────────────────────────────────────────────────┘
```
### Translation JSON Schema
```json
{
  "schema_version": "1.0.0",
  "source_document": "xxx_result.json",
  "source_lang": "auto",
  "target_lang": "en",
  "provider": "dify",
  "translated_at": "2025-12-02T12:00:00Z",
  "statistics": {
    "total_elements": 50,
    "translated_elements": 45,
    "skipped_elements": 5,
    "total_characters": 5000,
    "processing_time_seconds": 30.5,
    "total_tokens": 2500
  },
  "translations": {
    "pp3_0_0": "Company Profile",
    "pp3_0_1": "Founded in 2020...",
    "table_1_0": {
      "cells": [
        {"row": 0, "col": 0, "content": "Technology"},
        {"row": 0, "col": 1, "content": "Epoxy"}
      ]
    }
  }
}
```
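
One way a consumer might use the `translations` map is to overlay it on the original elements. The sketch below assumes each element carries an `element_id` that matches the map keys and that table cells are matched by `(row, col)`; the function name `apply_translations` is hypothetical.

```python
import copy
from typing import Any


def apply_translations(
    elements: list[dict[str, Any]], translations: dict[str, Any]
) -> list[dict[str, Any]]:
    """Return a deep copy of elements with translated content substituted."""
    result = copy.deepcopy(elements)
    for element in result:
        translated = translations.get(element.get("element_id"))
        if translated is None:
            continue  # skipped elements keep their original content
        if isinstance(translated, str):
            element["content"] = translated
        elif isinstance(element.get("content"), dict):
            # Table entry: look up each cell by its (row, col) position.
            by_pos = {
                (c["row"], c["col"]): c["content"]
                for c in translated.get("cells", [])
            }
            for cell in element["content"].get("cells", []):
                pos = (cell["row"], cell["col"])
                if pos in by_pos:
                    cell["content"] = by_pos[pos]
    return result
```

Working on a deep copy keeps the original document unchanged, in line with the non-destructive storage decision.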
### Language Code Mapping
```python
LANGUAGE_NAMES = {
    "en": "English",
    "zh-TW": "Traditional Chinese",
    "zh-CN": "Simplified Chinese",
    "ja": "Japanese",
    "ko": "Korean",
    "de": "German",
    "fr": "French",
    "es": "Spanish",
    "pt": "Portuguese",
    "it": "Italian",
    "ru": "Russian",
    "vi": "Vietnamese",
    "th": "Thai",
    # Additional languages as needed
}
```
## Risks / Trade-offs
### Risk 1: API Availability
- **Risk**: DIFY service downtime affects translation
- **Mitigation**: Add timeout handling, retry logic, graceful error messages
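
A minimal sketch of the retry part of this mitigation, using the tentative figure from Open Questions (up to 3 retries with exponential backoff). The helper name `call_with_retry`, the one-second base delay, and the choice to retry only on `TimeoutError` and `ConnectionError` are assumptions; a real client would map its HTTP library's exceptions onto these.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    call: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(); on failure, retry up to max_retries times with doubling delays."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except (TimeoutError, ConnectionError):
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error to the caller
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s with the defaults
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so tests can record delays instead of waiting.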
### Risk 2: API Cost
- **Risk**: High-volume translation increases cost
- **Mitigation**: Monitor usage via metadata, consider rate limiting
### Risk 3: Network Latency
- **Risk**: Each translation request adds network latency
- **Mitigation**: Batch text when possible, show progress to user
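
The commit message accompanying this document describes batches of up to 5000 characters and 20 items, with numbered markers `[1]`, `[2]`, `[3]`. A sketch of that idea follows; the function names, the greedy grouping strategy, and the marker-parsing regex are assumptions, not the shipped implementation.

```python
import re
from typing import Optional


def build_batches(
    texts: list[str], max_chars: int = 5000, max_items: int = 20
) -> list[list[str]]:
    """Group texts greedily so each batch stays within both limits."""
    batches: list[list[str]] = []
    current: list[str] = []
    size = 0
    for text in texts:
        if current and (len(current) >= max_items or size + len(text) > max_chars):
            batches.append(current)
            current, size = [], 0
        current.append(text)
        size += len(text)
    if current:
        batches.append(current)
    return batches


def format_batch(batch: list[str]) -> str:
    """Prefix each item with a numbered marker: [1], [2], ..."""
    return "\n".join(f"[{i}] {text}" for i, text in enumerate(batch, start=1))


MARKER = re.compile(r"^\[(\d+)\]\s*", re.MULTILINE)


def parse_batch_answer(answer: str, expected: int) -> list[Optional[str]]:
    """Split a marked answer into `expected` strings; missing items are None."""
    parts = MARKER.split(answer)  # [preamble, "1", text1, "2", text2, ...]
    result: list[Optional[str]] = [None] * expected
    for number, text in zip(parts[1::2], parts[2::2]):
        index = int(number) - 1
        if 0 <= index < expected:
            result[index] = text.strip()
    return result
```

Two limitations of this sketch: a single text longer than `max_chars` still gets a batch of its own, and source text containing a line that begins with `[n]` would be mistaken for a marker.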
### Risk 4: Translation Quality Variance
- **Risk**: AI translation quality varies by language pair
- **Mitigation**: Document known limitations, allow user feedback
## Migration Plan
### Phase 1: Core Translation (This Proposal)
1. DIFY client implementation
2. Backend translation service (rewrite)
3. API endpoints (modify)
4. Frontend activation
### Phase 2: Enhanced Features (Future)
1. Translated PDF generation
2. Translation caching
3. Custom terminology support
### Rollback
- Translation is an additive feature
- No schema changes to existing data
- Can disable by removing router registration
## Open Questions
1. **Rate Limiting**: Should we limit requests per minute to DIFY API?
   - Tentative: 10 requests per minute per user
2. **Retry Logic**: How to handle API failures?
   - Tentative: Retry up to 3 times with exponential backoff
3. **Batch Size**: How many elements per API call?
   - Tentative: 1 element per call for simplicity, optimize later if needed
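
For the rate-limiting question, a sliding-window limiter is one possible shape. The sketch below uses the tentative figure of 10 requests per minute per user; the class name `PerUserRateLimiter`, the in-memory storage, and the sliding-window approach are all assumptions, and a multi-process deployment would need shared state instead.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class PerUserRateLimiter:
    """Allow at most `limit` calls per `window` seconds for each user."""

    def __init__(
        self,
        limit: int = 10,
        window: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self._calls = defaultdict(deque)  # user -> timestamps of recent calls

    def allow(self, user: str) -> bool:
        """Record the call and return True if the user is under the limit."""
        now = self.clock()
        calls = self._calls[user]
        # Drop timestamps that have left the window.
        while calls and now - calls[0] >= self.window:
            calls.popleft()
        if len(calls) >= self.limit:
            return False
        calls.append(now)
        return True
```

A caller would check `limiter.allow(user_id)` before starting a translation and return an error such as HTTP 429 when it yields `False`.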