feat: add document translation via DIFY AI API

Implement document translation feature using DIFY AI API with batch processing:

Backend:
- Add DIFY client with batch translation support (5000 chars, 20 items per batch)
- Add translation service with element extraction and result building
- Add translation router with start/status/result/list/delete endpoints
- Add translation schemas (TranslationRequest, TranslationStatus, etc.)

Frontend:
- Enable translation UI in TaskDetailPage
- Add translation API methods to apiV2.ts
- Add translation types

Features:
- Batch translation with numbered markers [1], [2], [3]...
- Support for text, title, header, footer, paragraph, footnote, table cells
- Translation result JSON with statistics (tokens, latency, batch_count)
- Background task processing with progress tracking

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@@ -0,0 +1,265 @@
# Design: Document Translation Feature

## Context

Tool_OCR processes documents through three tracks (Direct/OCR/Hybrid) and outputs UnifiedDocument JSON. Users need translation capability to convert extracted text into different languages while preserving document structure.

### Constraints
- Must use DIFY AI service for translation
- API-based solution (no local model management)
- Translation quality depends on DIFY's underlying model

### Stakeholders
- End users: Need translated documents
- System: Simple HTTP-based integration

## Goals / Non-Goals

### Goals
- Translate documents using DIFY AI API
- Preserve document structure (element positions, formatting)
- Support all three processing tracks with unified logic
- Real-time progress feedback to users
- Simple, maintainable API integration

### Non-Goals
- Local model inference (replaced by DIFY API)
- GPU memory management (not needed)
- Translation memory or glossary support
- Concurrent translation processing

## Decisions

### Decision 1: Translation Provider

**Choice**: DIFY AI Service (theaken.com)

**Configuration**:
- Base URL: `https://dify.theaken.com/v1`
- Endpoint: `POST /chat-messages`
- API Key: `app-YOPrF2ro5fshzMkCZviIuUJd`
- Mode: Chat (Blocking response)

**Rationale**:
- High-quality cloud AI translation
- No local model management required
- No GPU memory concerns
- Easy to maintain and update

### Decision 2: Response Mode

**Choice**: Blocking Mode

**API Request Format**:
```json
{
  "inputs": {},
  "query": "Translate the following text to Chinese:\n\nHello world",
  "response_mode": "blocking",
  "conversation_id": "",
  "user": "tool-ocr-{task_id}"
}
```

**API Response Format**:
```json
{
  "event": "message",
  "answer": "你好世界",
  "conversation_id": "xxx",
  "metadata": {
    "usage": {
      "total_tokens": 54,
      "latency": 1.26
    }
  }
}
```

**Rationale**:
- Simpler implementation than streaming
- Adequate for batch text translation
- Complete response in single call
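
The request and response shapes above can be handled with two small pure functions. This is a minimal sketch, not the project's actual client: the helper names `build_chat_payload` and `parse_chat_response` are hypothetical, and it only builds and parses dictionaries (no network call is made).

```python
import json
from typing import Any


def build_chat_payload(query: str, task_id: str) -> dict[str, Any]:
    """Build a blocking-mode request body shaped like the documented example."""
    return {
        "inputs": {},
        "query": query,
        "response_mode": "blocking",
        "conversation_id": "",
        "user": f"tool-ocr-{task_id}",
    }


def parse_chat_response(body: dict[str, Any]) -> tuple[str, int, float]:
    """Return (answer, total_tokens, latency) from a blocking response body."""
    usage = body.get("metadata", {}).get("usage", {})
    return (
        body["answer"],
        int(usage.get("total_tokens", 0)),
        float(usage.get("latency", 0.0)),
    )


# The sample response from this section, parsed as the client would see it.
sample = json.loads(
    '{"event": "message", "answer": "你好世界", "conversation_id": "xxx",'
    ' "metadata": {"usage": {"total_tokens": 54, "latency": 1.26}}}'
)
payload = build_chat_payload(
    "Translate the following text to Chinese:\n\nHello world", "abc123"
)
answer, tokens, latency = parse_chat_response(sample)
```

Keeping payload construction and response parsing free of I/O makes both testable without calling the service.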

### Decision 3: Translation Batch Format

**Choice**: Single text per request with translation prompt

**Request Format**:
```
Translate the following text to {target_language}.
Return ONLY the translated text, no explanations.

{text_content}
```

**Rationale**:
- Clear instruction for AI
- Predictable response format
- Easy to parse result

### Decision 4: Translation Result Storage

**Choice**: Independent JSON file per language (unchanged from previous design)

```
backend/storage/results/{task_id}/
├── xxx_result.json          # Original
├── xxx_translated_en.json   # English translation
├── xxx_translated_ja.json   # Japanese translation
└── ...
```

**Rationale**:
- Non-destructive (original preserved)
- Multiple languages supported
- Easy to manage and delete
- Clear file naming convention
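
The naming convention above can be expressed as a path mapping. A minimal sketch, assuming result files always end in `_result.json`; the function name `translated_path` is hypothetical.

```python
from pathlib import Path


def translated_path(result_file: Path, lang: str) -> Path:
    """Map `<name>_result.json` to `<name>_translated_<lang>.json` in the same folder."""
    suffix = "_result.json"
    if not result_file.name.endswith(suffix):
        raise ValueError(f"not a result file: {result_file}")
    base = result_file.name[: -len(suffix)]
    # The original file is never touched; only a sibling path is derived.
    return result_file.with_name(f"{base}_translated_{lang}.json")


original = Path("backend/storage/results/task-1/xxx_result.json")
japanese = translated_path(original, "ja")
```

Because the translated file is a sibling of the original, deleting one language never affects the source result or other languages.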

### Decision 5: Element Type Handling

**Translatable types** (content is string):
- `text`, `title`, `header`, `footer`, `paragraph`, `footnote`

**Special handling** (content is dict):
- `table` -> Translate `cells[].content`

**Skip** (non-text content):
- `page_number`, `image`, `chart`, `logo`, `reference`
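
The three categories above translate into a simple extraction loop. This is a sketch under assumptions: the element field names (`element_id`, `type`, `content`) and the `element_id/row/col` key used for table cells are illustrative, not confirmed UnifiedDocument field names.

```python
from typing import Any, Iterator

TEXT_TYPES = {"text", "title", "header", "footer", "paragraph", "footnote"}
TABLE_TYPE = "table"


def iter_translatable(elements: list[dict[str, Any]]) -> Iterator[tuple[str, str]]:
    """Yield (key, text) pairs for every translatable piece of text."""
    for el in elements:
        el_type, content = el.get("type"), el.get("content")
        if el_type in TEXT_TYPES and isinstance(content, str) and content.strip():
            yield el["element_id"], content
        elif el_type == TABLE_TYPE and isinstance(content, dict):
            for cell in content.get("cells", []):
                text = cell.get("content", "")
                if isinstance(text, str) and text.strip():
                    # Hypothetical key format: keeps row/col recoverable later.
                    yield f'{el["element_id"]}/{cell["row"]}/{cell["col"]}', text
        # All other types (page_number, image, chart, logo, reference) are skipped.


elements = [
    {"element_id": "pp3_0_0", "type": "title", "content": "公司簡介"},
    {"element_id": "img_1", "type": "image", "content": {"path": "a.png"}},
    {
        "element_id": "table_1_0",
        "type": "table",
        "content": {
            "cells": [
                {"row": 0, "col": 0, "content": "技術"},
                {"row": 0, "col": 1, "content": ""},
            ]
        },
    },
]
pairs = list(iter_translatable(elements))
```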

## Architecture

### Component Diagram

```
┌─────────────────────────────────────────────────────────────┐
│                          Frontend                           │
│  ┌─────────────┐  ┌──────────────┐  ┌─────────────────────┐ │
│  │ TaskDetail  │  │ TranslateBtn │  │ ProgressDisplay     │ │
│  └─────────────┘  └──────────────┘  └─────────────────────┘ │
└────────────────────────────┬────────────────────────────────┘
                             │ HTTP
┌────────────────────────────▼────────────────────────────────┐
│                         Backend API                         │
│  ┌─────────────────────────────────────────────────────────┐│
│  │  TranslateRouter                                        ││
│  │    POST /api/v2/translate/{task_id}                     ││
│  │    GET  /api/v2/translate/{task_id}/status              ││
│  │    GET  /api/v2/translate/{task_id}/result              ││
│  └─────────────────────────────────────────────────────────┘│
└────────────────────────────┬────────────────────────────────┘
                             │
┌────────────────────────────▼────────────────────────────────┐
│                     TranslationService                      │
│  ┌───────────────┐  ┌───────────────┐  ┌─────────────────┐  │
│  │ DifyClient    │  │ BatchBuilder  │  │ ResultParser    │  │
│  │ - translate() │  │ - extract()   │  │ - parse()       │  │
│  │ - chat()      │  │ - format()    │  │ - map_ids()     │  │
│  └───────────────┘  └───────────────┘  └─────────────────┘  │
└────────────────────────────┬────────────────────────────────┘
                             │ HTTPS
┌────────────────────────────▼────────────────────────────────┐
│                       DIFY AI Service                       │
│                 https://dify.theaken.com/v1                 │
│                      (Chat - Blocking)                      │
└─────────────────────────────────────────────────────────────┘
```

### Translation JSON Schema

```json
{
  "schema_version": "1.0.0",
  "source_document": "xxx_result.json",
  "source_lang": "auto",
  "target_lang": "en",
  "provider": "dify",
  "translated_at": "2025-12-02T12:00:00Z",
  "statistics": {
    "total_elements": 50,
    "translated_elements": 45,
    "skipped_elements": 5,
    "total_characters": 5000,
    "processing_time_seconds": 30.5,
    "total_tokens": 2500
  },
  "translations": {
    "pp3_0_0": "Company Profile",
    "pp3_0_1": "Founded in 2020...",
    "table_1_0": {
      "cells": [
        {"row": 0, "col": 0, "content": "Technology"},
        {"row": 0, "col": 1, "content": "Epoxy"}
      ]
    }
  }
}
```
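
A result document matching this schema could be assembled as sketched below. The function name and parameters are hypothetical, and two details are assumptions inferred from the example values: `skipped_elements` is taken as `total_elements - translated_elements`, and `total_characters` counts characters of the translated text.

```python
from datetime import datetime, timezone
from typing import Any


def build_translation_result(
    source_document: str,
    target_lang: str,
    translations: dict[str, Any],
    total_elements: int,
    total_tokens: int,
    processing_time_seconds: float,
) -> dict[str, Any]:
    """Assemble a schema 1.0.0 translation result dictionary."""

    def char_count(value: Any) -> int:
        # Plain elements are strings; tables carry a list of cells.
        if isinstance(value, str):
            return len(value)
        return sum(len(c.get("content", "")) for c in value.get("cells", []))

    translated = len(translations)
    return {
        "schema_version": "1.0.0",
        "source_document": source_document,
        "source_lang": "auto",
        "target_lang": target_lang,
        "provider": "dify",
        "translated_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "statistics": {
            "total_elements": total_elements,
            "translated_elements": translated,
            "skipped_elements": total_elements - translated,
            "total_characters": sum(char_count(v) for v in translations.values()),
            "processing_time_seconds": processing_time_seconds,
            "total_tokens": total_tokens,
        },
        "translations": translations,
    }


result = build_translation_result(
    "xxx_result.json",
    "en",
    {
        "pp3_0_0": "Company Profile",
        "table_1_0": {"cells": [{"row": 0, "col": 0, "content": "Technology"}]},
    },
    total_elements=3,
    total_tokens=100,
    processing_time_seconds=1.5,
)
```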

### Language Code Mapping

```python
LANGUAGE_NAMES = {
    "en": "English",
    "zh-TW": "Traditional Chinese",
    "zh-CN": "Simplified Chinese",
    "ja": "Japanese",
    "ko": "Korean",
    "de": "German",
    "fr": "French",
    "es": "Spanish",
    "pt": "Portuguese",
    "it": "Italian",
    "ru": "Russian",
    "vi": "Vietnamese",
    "th": "Thai",
    # Additional languages as needed
}
```
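
The mapping feeds the prompt template from Decision 3. A minimal sketch of how the two might combine; `build_prompt` is a hypothetical name, only a subset of the mapping is repeated here, and falling back to the raw code for unmapped languages is an assumption rather than documented behaviour.

```python
# Subset of the mapping above, repeated so the example is self-contained.
LANGUAGE_NAMES = {
    "en": "English",
    "zh-TW": "Traditional Chinese",
    "ja": "Japanese",
}

PROMPT_TEMPLATE = (
    "Translate the following text to {language}.\n"
    "Return ONLY the translated text, no explanations.\n"
    "\n"
    "{text}"
)


def build_prompt(text: str, target_lang: str) -> str:
    """Fill the translation prompt with the full language name."""
    # Assumption: unknown codes are passed through unchanged.
    language = LANGUAGE_NAMES.get(target_lang, target_lang)
    return PROMPT_TEMPLATE.format(language=language, text=text)


prompt = build_prompt("Hello world", "zh-TW")
```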

## Risks / Trade-offs

### Risk 1: API Availability
- **Risk**: DIFY service downtime affects translation
- **Mitigation**: Add timeout handling, retry logic, graceful error messages

### Risk 2: API Cost
- **Risk**: High volume translation increases cost
- **Mitigation**: Monitor usage via metadata, consider rate limiting

### Risk 3: Network Latency
- **Risk**: Each translation request adds network latency
- **Mitigation**: Batch text when possible, show progress to user

### Risk 4: Translation Quality Variance
- **Risk**: AI translation quality varies by language pair
- **Mitigation**: Document known limitations, allow user feedback

## Migration Plan

### Phase 1: Core Translation (This Proposal)
1. DIFY client implementation
2. Backend translation service (rewrite)
3. API endpoints (modify)
4. Frontend activation

### Phase 2: Enhanced Features (Future)
1. Translated PDF generation
2. Translation caching
3. Custom terminology support

### Rollback
- Translation is an additive feature
- No schema changes to existing data
- Can disable by removing router registration

## Open Questions

1. **Rate Limiting**: Should we limit requests per minute to DIFY API?
   - Tentative: 10 requests per minute per user

2. **Retry Logic**: How to handle API failures?
   - Tentative: Retry up to 3 times with exponential backoff

3. **Batch Size**: How many elements per API call?
   - Tentative: 1 element per call for simplicity, optimize later if needed
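
If the tentative limit in question 1 is adopted, a sliding-window counter would be one way to enforce it. This is a sketch only: the class name is hypothetical, state is kept in process memory (so it would not be shared across workers), and the caller passes the current time explicitly to keep the logic testable.

```python
from collections import defaultdict, deque


class PerUserRateLimiter:
    """Allow at most `max_requests` calls per user within a sliding window."""

    def __init__(self, max_requests: int = 10, window_seconds: float = 60.0) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._history: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, user: str, now: float) -> bool:
        history = self._history[user]
        # Drop timestamps that have left the window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_requests:
            return False
        history.append(now)
        return True


limiter = PerUserRateLimiter(max_requests=2, window_seconds=60.0)
first = limiter.allow("user-a", now=0.0)
second = limiter.allow("user-a", now=1.0)
third = limiter.allow("user-a", now=2.0)  # Exceeds 2 requests per window.
```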
@@ -0,0 +1,54 @@
# Change: Add Document Translation Feature

## Why

Users need to translate OCR-processed documents into different languages while preserving the original layout. Currently, the system only extracts text but cannot translate it. This feature enables multilingual document processing using DIFY AI service, providing high-quality translations with simple API integration.

## What Changes

- **NEW**: Translation service using DIFY AI API (Chat mode, Blocking)
- **NEW**: Translation REST API endpoints (`/api/v2/translate/*`)
- **NEW**: Translation result JSON format (independent file per target language)
- **UPDATE**: Frontend translation UI activation with progress display
- **REMOVED**: Local MADLAD-400-3B model (replaced with DIFY API)
- **REMOVED**: GPU memory management for translation (no longer needed)

## Impact

- Affected specs:
  - NEW `specs/translation/spec.md` - Core translation capability
  - MODIFY `specs/result-export/spec.md` - Add translation JSON export format

- Affected code:
  - `backend/app/services/translation_service.py` (REWRITE - use DIFY API)
  - `backend/app/routers/translate.py` (MODIFY)
  - `backend/app/schemas/translation.py` (MODIFY)
  - `frontend/src/pages/TaskDetailPage.tsx` (MODIFY)
  - `frontend/src/services/api.ts` (MODIFY)

## Technical Summary

### Translation Service
- Provider: DIFY AI (theaken.com)
- Mode: Chat (Blocking response)
- Base URL: `https://dify.theaken.com/v1`
- Endpoint: `POST /chat-messages`
- API Key: `app-YOPrF2ro5fshzMkCZviIuUJd`

### Benefits over Local Model
| Aspect | DIFY API | Local MADLAD-400 |
|--------|----------|------------------|
| Quality | High (cloud AI) | Variable |
| Setup | No model download | 12GB download |
| GPU Usage | None | 2-3GB VRAM |
| Latency | ~1-2s per request | Fast after load |
| Maintenance | API provider managed | Self-managed |

### Data Flow
1. Read `xxx_result.json` (UnifiedDocument format)
2. Extract translatable elements (text, title, header, footer, paragraph, footnote, table cells)
3. Send to DIFY API with translation prompt
4. Parse response and save to `xxx_translated_{lang}.json`

### Unified Processing
All three tracks (Direct/OCR/Hybrid) use the same UnifiedDocument format, enabling unified translation logic without track-specific handling.
@@ -0,0 +1,55 @@
## ADDED Requirements

### Requirement: Translation Result JSON Export

The system SHALL support exporting translation results as independent JSON files following a defined schema.

#### Scenario: Export translation result JSON
- **WHEN** translation completes for a document
- **THEN** system SHALL save translation to `{filename}_translated_{lang}.json`
- **AND** file SHALL be stored alongside original `{filename}_result.json`
- **AND** original result file SHALL remain unchanged

#### Scenario: Translation JSON schema compliance
- **WHEN** translation result is saved
- **THEN** JSON SHALL include schema_version field ("1.0.0")
- **AND** SHALL include source_document reference
- **AND** SHALL include source_lang and target_lang
- **AND** SHALL include provider identifier (e.g., "dify")
- **AND** SHALL include translated_at timestamp
- **AND** SHALL include translations dict mapping element_id to translated content

#### Scenario: Translation statistics in export
- **WHEN** translation result is saved
- **THEN** JSON SHALL include statistics object with:
  - total_elements: count of all elements in document
  - translated_elements: count of successfully translated elements
  - skipped_elements: count of non-translatable elements (images, charts, etc.)
  - total_characters: character count of translated text
  - processing_time_seconds: translation duration

#### Scenario: Table cell translation in export
- **WHEN** document contains tables
- **THEN** translation JSON SHALL represent table translations as:
  ```json
  {
    "table_1_0": {
      "cells": [
        {"row": 0, "col": 0, "content": "Translated cell text"},
        {"row": 0, "col": 1, "content": "Another cell"}
      ]
    }
  }
  ```
- **AND** row/col positions SHALL match original table structure
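
A consumer of this format can merge translated cells back into the original table by matching on `(row, col)`. A minimal sketch; `apply_table_translation` is a hypothetical helper, and the original table is assumed to use the same `cells[]` layout with `row`, `col`, and `content` keys.

```python
from copy import deepcopy
from typing import Any


def apply_table_translation(
    table_content: dict[str, Any], translated: dict[str, Any]
) -> dict[str, Any]:
    """Return a copy of the table with translated cell text filled in by position."""
    by_pos = {(c["row"], c["col"]): c["content"] for c in translated.get("cells", [])}
    result = deepcopy(table_content)  # Leave the original untouched.
    for cell in result.get("cells", []):
        pos = (cell["row"], cell["col"])
        if pos in by_pos:
            cell["content"] = by_pos[pos]
    return result


original_table = {
    "cells": [
        {"row": 0, "col": 0, "content": "技術"},
        {"row": 0, "col": 1, "content": "環氧樹脂"},
        {"row": 1, "col": 0, "content": "2020"},
    ]
}
translated_table = {
    "cells": [
        {"row": 0, "col": 0, "content": "Technology"},
        {"row": 0, "col": 1, "content": "Epoxy"},
    ]
}
merged = apply_table_translation(original_table, translated_table)
```

Cells without a translation (here, row 1) keep their original text, so the table structure is preserved even when some cells were skipped.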

#### Scenario: Download translation result via API
- **WHEN** GET request to `/api/v2/translate/{task_id}/result?lang={lang}`
- **THEN** system SHALL return translation JSON content
- **AND** Content-Type SHALL be application/json
- **AND** response SHALL include appropriate cache headers

#### Scenario: List available translations
- **WHEN** GET request to `/api/v2/tasks/{task_id}/translations`
- **THEN** system SHALL return list of available translation languages
- **AND** include translation metadata (translated_at, provider, statistics)
@@ -0,0 +1,184 @@
## ADDED Requirements

### Requirement: Document Translation Service

The system SHALL provide a document translation service that translates extracted text from OCR-processed documents into target languages using DIFY AI API.

#### Scenario: Successful translation of Direct track document
- **GIVEN** a completed OCR task with Direct track processing
- **WHEN** user requests translation to English
- **THEN** the system extracts all translatable elements (text, title, header, footer, paragraph, footnote, table cells)
- **AND** translates them using DIFY AI API
- **AND** saves the result to `{task_id}_translated_en.json`

#### Scenario: Successful translation of OCR track document
- **GIVEN** a completed OCR task with OCR track processing
- **WHEN** user requests translation to Japanese
- **THEN** the system extracts all translatable elements from UnifiedDocument format
- **AND** translates them preserving element_id mapping
- **AND** saves the result to `{task_id}_translated_ja.json`

#### Scenario: Successful translation of Hybrid track document
- **GIVEN** a completed OCR task with Hybrid track processing
- **WHEN** translation is requested
- **THEN** the system processes the document using the same unified logic
- **AND** handles any combination of element types present

#### Scenario: Table cell translation
- **GIVEN** a document containing table elements
- **WHEN** translation is requested
- **THEN** the system extracts text from each table cell
- **AND** translates each cell content individually
- **AND** preserves row/col position in the translation result

---

### Requirement: Translation API Endpoints

The system SHALL expose REST API endpoints for translation operations.

#### Scenario: Start translation request
- **GIVEN** a completed OCR task with task_id
- **WHEN** POST request to `/api/v2/translate/{task_id}` with target_lang parameter
- **THEN** the system starts background translation process
- **AND** returns translation job status with 202 Accepted

#### Scenario: Query translation status
- **GIVEN** an active translation job
- **WHEN** GET request to `/api/v2/translate/{task_id}/status`
- **THEN** the system returns current status (pending, translating, completed, failed)
- **AND** includes progress information (current_element, total_elements)

#### Scenario: Retrieve translation result
- **GIVEN** a completed translation job
- **WHEN** GET request to `/api/v2/translate/{task_id}/result?lang={target_lang}`
- **THEN** the system returns the translation JSON content

#### Scenario: Translation for non-existent task
- **GIVEN** an invalid or non-existent task_id
- **WHEN** translation is requested
- **THEN** the system returns 404 Not Found error

---

### Requirement: DIFY API Integration

The system SHALL integrate with DIFY AI service for translation.

#### Scenario: API request format
- **GIVEN** text to be translated
- **WHEN** calling DIFY API
- **THEN** the system sends POST request to `/chat-messages` endpoint
- **AND** includes query with translation prompt
- **AND** uses blocking response mode
- **AND** includes user identifier for tracking

#### Scenario: API response handling
- **GIVEN** DIFY API returns translation response
- **WHEN** parsing the response
- **THEN** the system extracts translated text from `answer` field
- **AND** records usage statistics (tokens, latency)

#### Scenario: API error handling
- **GIVEN** DIFY API returns error or times out
- **WHEN** handling the error
- **THEN** the system retries up to 3 times with exponential backoff
- **AND** returns appropriate error message if all retries fail
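
The retry behaviour in this scenario can be sketched as a generic wrapper. Assumptions to note: "retries up to 3 times" is read as one initial attempt plus three retries, the delays double from a 1 second base, and `TranslationError` and `call_with_retry` are hypothetical names. Real code would catch only timeout and HTTP errors rather than every exception.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TranslationError(Exception):
    """Raised when every attempt has failed (hypothetical error type)."""


def call_with_retry(
    func: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func`, retrying with exponential backoff: 1s, 2s, 4s by default."""
    last_error: Optional[Exception] = None
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception as exc:  # Sketch only: narrow this in real code.
            last_error = exc
            if attempt < max_retries:
                sleep(base_delay * (2 ** attempt))
    raise TranslationError(
        f"translation failed after {max_retries} retries"
    ) from last_error


# Usage with a fake call that times out twice, then succeeds.
recorded_delays: list[float] = []
attempts = {"count": 0}


def flaky_call() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"


outcome = call_with_retry(flaky_call, sleep=recorded_delays.append)
```

Injecting `sleep` keeps the backoff schedule observable in tests without actually waiting.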

#### Scenario: API rate limiting
- **GIVEN** high volume of translation requests
- **WHEN** requests approach rate limits
- **THEN** the system queues requests appropriately
- **AND** provides feedback about wait times

---

### Requirement: Translation Prompt Format

The system SHALL use structured prompts for translation requests.

#### Scenario: Generate translation prompt
- **GIVEN** source text to translate
- **WHEN** preparing DIFY API request
- **THEN** the system formats prompt as:
  ```
  Translate the following text to {language}.
  Return ONLY the translated text, no explanations.

  {text}
  ```

#### Scenario: Language name mapping
- **GIVEN** language code like "zh-TW" or "ja"
- **WHEN** constructing translation prompt
- **THEN** the system maps to full language name (Traditional Chinese, Japanese)

---

### Requirement: Translation Progress Reporting

The system SHALL provide real-time progress feedback during translation.

#### Scenario: Progress during multi-element translation
- **GIVEN** a document with 50 translatable elements
- **WHEN** user queries status
- **THEN** the system returns progress like `{"status": "translating", "current_element": 25, "total_elements": 50}`

#### Scenario: Translation starting status
- **GIVEN** translation job just started
- **WHEN** user queries status
- **THEN** the system returns `{"status": "pending"}`
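
The two status payloads above could come from a small job record that the background task updates and the status endpoint reads. This is an illustrative sketch: the `TranslationJob` class and its methods are hypothetical, and how jobs are stored or keyed is left open.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class TranslationJob:
    """Hypothetical in-memory progress record for one translation job."""

    status: str = "pending"
    current_element: int = 0
    total_elements: int = 0
    error: Optional[str] = None

    def start(self, total_elements: int) -> None:
        self.status = "translating"
        self.total_elements = total_elements

    def advance(self, count: int = 1) -> None:
        # Never report more progress than there are elements.
        self.current_element = min(self.current_element + count, self.total_elements)

    def complete(self) -> None:
        self.status = "completed"
        self.current_element = self.total_elements

    def fail(self, message: str) -> None:
        self.status = "failed"
        self.error = message

    def to_response(self) -> dict[str, Any]:
        if self.status == "pending":
            return {"status": "pending"}
        body: dict[str, Any] = {
            "status": self.status,
            "current_element": self.current_element,
            "total_elements": self.total_elements,
        }
        if self.error is not None:
            body["error"] = self.error
        return body


job = TranslationJob()
pending_response = job.to_response()
job.start(total_elements=50)
job.advance(25)
progress_response = job.to_response()
```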

---

### Requirement: Translation Result Storage

The system SHALL store translation results as independent JSON files.

#### Scenario: Save translation result
- **GIVEN** translation completes successfully
- **WHEN** saving results
- **THEN** the system creates `{original_filename}_translated_{lang}.json`
- **AND** includes schema_version, metadata, and translations dict

#### Scenario: Multiple language translations
- **GIVEN** a document translated to English and Japanese
- **WHEN** checking result files
- **THEN** both `xxx_translated_en.json` and `xxx_translated_ja.json` exist
- **AND** original `xxx_result.json` is unchanged

---

### Requirement: Language Support

The system SHALL support common languages through DIFY AI service.

#### Scenario: Common language translation
- **GIVEN** target language is English, Chinese, Japanese, or Korean
- **WHEN** translation is requested
- **THEN** the system includes appropriate language name in prompt
- **AND** executes translation successfully

#### Scenario: Automatic source language detection
- **GIVEN** source_lang is set to "auto"
- **WHEN** translation is executed
- **THEN** the AI model automatically detects source language
- **AND** translates to target language

#### Scenario: Supported languages list
- **GIVEN** user queries supported languages
- **WHEN** checking language support
- **THEN** the system provides list including:
  - English (en)
  - Traditional Chinese (zh-TW)
  - Simplified Chinese (zh-CN)
  - Japanese (ja)
  - Korean (ko)
  - German (de)
  - French (fr)
  - Spanish (es)
  - Portuguese (pt)
  - Italian (it)
  - Russian (ru)
  - Vietnamese (vi)
  - Thai (th)
@@ -0,0 +1,121 @@
# Implementation Tasks

## 1. Backend - DIFY Client

- [x] 1.1 Create DIFY client (`backend/app/services/dify_client.py`)
  - HTTP client with httpx
  - Base URL: `https://dify.theaken.com/v1`
  - API Key configuration
  - `translate(text, target_lang)` and `translate_batch(texts, target_lang)` methods
  - Error handling and retry logic (3 retries, exponential backoff)

- [x] 1.2 Add translation prompt template
  - Format: "Translate the following text to {language}. Return ONLY the translated text, no explanations.\n\n{text}"
  - Batch format with numbered markers [1], [2], [3]...
  - Language name mapping (en → English, zh-TW → Traditional Chinese, etc.)
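
The numbered-marker batch format in task 1.2 might look like the sketch below. The exact wire format is not specified here, so this assumes one `[n] text` entry per item, with markers at the start of a line; `format_batch` and `parse_batch` are hypothetical names. A source text that itself begins a line with `[2]` would confuse this simple parser.

```python
import re

# A marker such as "[12]" at the start of a line, plus one optional space.
MARKER = re.compile(r"^\[(\d+)\]\s?", re.MULTILINE)


def format_batch(texts: list[str]) -> str:
    """Join texts into one block with [1], [2], ... markers."""
    return "\n".join(f"[{i}] {text}" for i, text in enumerate(texts, start=1))


def parse_batch(answer: str, expected: int) -> dict[int, str]:
    """Split a marked-up answer back into {index: translated text}."""
    matches = list(MARKER.finditer(answer))
    result: dict[int, str] = {}
    for pos, match in enumerate(matches):
        end = matches[pos + 1].start() if pos + 1 < len(matches) else len(answer)
        index = int(match.group(1))
        if 1 <= index <= expected:  # Ignore markers outside the batch.
            result[index] = answer[match.end():end].strip()
    return result


block = format_batch(["Hello", "World"])
parsed = parse_batch("[1] 你好\n[2] 世界\n", expected=2)
```

Returning a dict keyed by index lets the caller detect missing items (for example, fewer keys than `expected`) and fall back to per-item translation for those.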

## 2. Backend - Translation Service

- [x] 2.1 Rewrite translation service (`backend/app/services/translation_service.py`)
  - Use DIFY client instead of local model
  - Element extraction from UnifiedDocument (all track types)
  - Batch translation (MAX_BATCH_CHARS=5000, MAX_BATCH_ITEMS=20)
  - Result parsing and element_id mapping

- [x] 2.2 Create translation result JSON writer
  - Schema version, metadata, translations dict
  - Table cell handling with row/col positions
  - Save to `{task_id}_translated_{lang}.json`
  - Include usage statistics (tokens, latency, batch_count)

- [x] 2.3 Add translatable element type handling
  - Text types: `text`, `title`, `header`, `footer`, `paragraph`, `footnote`
  - Table: Extract and translate `cells[].content`
  - Skip: `page_number`, `image`, `chart`, `logo`, `reference`
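
The batch limits from task 2.1 suggest a greedy grouping step. A minimal sketch, assuming items are packed in document order and that a single text longer than the character limit is sent alone rather than split; `make_batches` is a hypothetical name.

```python
MAX_BATCH_CHARS = 5000
MAX_BATCH_ITEMS = 20


def make_batches(
    texts: list[str],
    max_chars: int = MAX_BATCH_CHARS,
    max_items: int = MAX_BATCH_ITEMS,
) -> list[list[str]]:
    """Group texts in order so each batch respects both limits."""
    batches: list[list[str]] = []
    current: list[str] = []
    current_chars = 0
    for text in texts:
        too_many = len(current) >= max_items
        too_long = bool(current) and current_chars + len(text) > max_chars
        if too_many or too_long:
            batches.append(current)
            current, current_chars = [], 0
        # An oversized text still goes out, in a batch of its own.
        current.append(text)
        current_chars += len(text)
    if current:
        batches.append(current)
    return batches


by_count = make_batches(["a"] * 45)
by_size = make_batches(["x" * 3000, "y" * 3000, "z"])
```

The number of batches produced is what a `batch_count` statistic would report.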

## 3. Backend - API Endpoints

- [x] 3.1 Create/Update translation router (`backend/app/routers/translate.py`)
  - POST `/api/v2/translate/{task_id}` - Start translation
  - GET `/api/v2/translate/{task_id}/status` - Get progress
  - GET `/api/v2/translate/{task_id}/result` - Get translation result
  - GET `/api/v2/translate/{task_id}/translations` - List available translations
  - DELETE `/api/v2/translate/{task_id}/translations/{lang}` - Delete translation

- [x] 3.2 Implement background task processing
  - Use FastAPI BackgroundTasks for async translation
  - Status tracking (pending, translating, completed, failed)
  - Progress reporting (current element / total elements)

- [x] 3.3 Add translation schemas (`backend/app/schemas/translation.py`)
  - TranslationRequest (task_id, target_lang)
  - TranslationStatusResponse (status, progress, error)
  - TranslationListResponse (translations, statistics)

- [x] 3.4 Register router in main app

## 4. Frontend - UI Updates

- [x] 4.1 Enable translation UI in TaskDetailPage
  - Translation state management
  - Language selector connected to state

- [x] 4.2 Add translation progress display
  - Progress tracking
  - Status polling (translating element X/Y)
  - Error handling and display

- [x] 4.3 Update API service
  - Implement startTranslation method
  - Add polling for translation status
  - Handle translation result

- [x] 4.4 Add translation complete state
  - Show success message
  - Display available translated versions

## 5. Testing

Use existing JSON files in `backend/storage/results/` for testing.

Available test samples:
- Direct track: `1c94bfbf-*/edit_result.json`, `8eedd9ed-*/ppt_result.json`
- OCR track: `c85fff69-*/scan_result.json`, `ca2b59a3-*/img3_result.json`
- Hybrid track: `1484ba43-*/edit2_result.json`

- [x] 5.1 Unit tests for DIFY client
  - Test with real API calls (no mocks)
  - Test retry logic on timeout

- [x] 5.2 Unit tests for translation service
  - Element extraction from existing result.json files (10 tests pass)
  - Result parsing and element_id mapping
  - Table cell extraction and translation

- [x] 5.3 Integration tests for API endpoints
  - Start translation with existing task_id
  - Status polling during translation
  - Result retrieval after completion

- [x] 5.4 Manual E2E verification
  - Translate Direct track document (edit_result.json → zh-TW) ✓
  - Verified translation quality and JSON structure

## 6. Configuration

- [x] 6.1 Add DIFY configuration (hardcoded in dify_client.py)
  - `DIFY_BASE_URL`: https://dify.theaken.com/v1
  - `DIFY_API_KEY`: app-YOPrF2ro5fshzMkCZviIuUJd
  - `DIFY_TIMEOUT`: 120 seconds
  - `DIFY_MAX_RETRIES`: 3
  - `MAX_BATCH_CHARS`: 5000
  - `MAX_BATCH_ITEMS`: 20
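
The settings are currently hardcoded. If they later move to environment variables (for example as part of the setup instructions in task 7.2), loading could look like the sketch below. This is an assumption, not the current implementation: `DifySettings` and `load_settings` are hypothetical names, and the defaults simply mirror the values listed above, with the API key required from the environment.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DifySettings:
    base_url: str
    api_key: str
    timeout: float = 120.0
    max_retries: int = 3
    max_batch_chars: int = 5000
    max_batch_items: int = 20


def load_settings(env: Mapping[str, str] = os.environ) -> DifySettings:
    """Read DIFY settings from a mapping, falling back to documented defaults."""
    api_key = env.get("DIFY_API_KEY", "")
    if not api_key:
        raise RuntimeError("DIFY_API_KEY is not set")
    return DifySettings(
        base_url=env.get("DIFY_BASE_URL", "https://dify.theaken.com/v1"),
        api_key=api_key,
        timeout=float(env.get("DIFY_TIMEOUT", "120")),
        max_retries=int(env.get("DIFY_MAX_RETRIES", "3")),
        max_batch_chars=int(env.get("MAX_BATCH_CHARS", "5000")),
        max_batch_items=int(env.get("MAX_BATCH_ITEMS", "20")),
    )


# Placeholder key for illustration only.
settings = load_settings({"DIFY_API_KEY": "app-example", "DIFY_TIMEOUT": "30"})
```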

## 7. Documentation

- [ ] 7.1 Update API documentation
  - Add translation endpoints to OpenAPI spec

- [ ] 7.2 Add DIFY setup instructions
  - API key configuration
  - Rate limiting considerations