# Change: Add Document Translation Feature

## Why

Users need to translate OCR-processed documents into other languages while preserving the original layout. Currently, the system only extracts text; it cannot translate it. This feature enables multilingual document processing through the DIFY AI service, which provides high-quality translation through a simple API integration.

## What Changes

- **NEW**: Translation service using the DIFY AI API (chat mode, blocking response)
- **NEW**: Translation REST API endpoints (`/api/v2/translate/*`)
- **NEW**: Translation result JSON format (one independent file per target language)
- **UPDATE**: Frontend translation UI enabled, with progress display
- **REMOVED**: Local MADLAD-400-3B model (replaced by the DIFY API)
- **REMOVED**: GPU memory management for translation (no longer needed)
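The per-language result file could be produced roughly as follows. This is a minimal sketch, not the project's implementation: the helpers `translated_output_path` and `write_translation_result` and the JSON field names (`source_file`, `target_lang`, `translations`, `statistics`) are hypothetical. Only the `xxx_result.json` to `xxx_translated_{lang}.json` naming comes from this proposal.

```python
import json
from pathlib import Path


def translated_output_path(result_path: Path, target_lang: str) -> Path:
    """Map `<stem>_result.json` to `<stem>_translated_<lang>.json` in the same folder."""
    suffix = "_result.json"
    if not result_path.name.endswith(suffix):
        raise ValueError(f"expected a file ending in {suffix}, got {result_path.name}")
    stem = result_path.name[: -len(suffix)]
    return result_path.with_name(f"{stem}_translated_{target_lang}.json")


def write_translation_result(result_path: Path, target_lang: str,
                             translations: dict, statistics: dict) -> Path:
    """Write one independent JSON file for a single target language.

    Field names are hypothetical, not the project's actual schema.
    """
    out_path = translated_output_path(result_path, target_lang)
    payload = {
        "source_file": result_path.name,
        "target_lang": target_lang,
        "translations": translations,  # element id -> translated text
        "statistics": statistics,      # e.g. token usage, latency, batch count
    }
    # ensure_ascii=False keeps CJK and other non-ASCII text readable in the file.
    out_path.write_text(json.dumps(payload, ensure_ascii=False, indent=2),
                        encoding="utf-8")
    return out_path
```

Writing one file per language keeps targets independent: translating into a second language never rewrites an existing result.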

## Impact

- Affected specs:
  - NEW `specs/translation/spec.md` - Core translation capability
  - MODIFY `specs/result-export/spec.md` - Add translation JSON export format
- Affected code:
  - `backend/app/services/translation_service.py` (REWRITE - use DIFY API)
  - `backend/app/routers/translate.py` (MODIFY)
  - `backend/app/schemas/translation.py` (MODIFY)
  - `frontend/src/pages/TaskDetailPage.tsx` (MODIFY)
  - `frontend/src/services/api.ts` (MODIFY)

## Technical Summary

### Translation Service

- Provider: DIFY AI (theaken.com)
- Mode: Chat (blocking response)
- Base URL: `https://dify.theaken.com/v1`
- Endpoint: `POST /chat-messages`
- API Key: `app-YOPrF2ro5fshzMkCZviIuUJd`
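A blocking call to the endpoint above might be assembled as follows, using only the Python standard library. The helper names and the `DIFY_API_KEY` environment variable are hypothetical. The request fields (`inputs`, `query`, `response_mode`, `user`) and the `Authorization: Bearer` header follow DIFY's public chat-messages API, but the project's own client may differ.

```python
import json
import os
import urllib.request

DIFY_BASE_URL = "https://dify.theaken.com/v1"


def build_chat_request(query: str, api_key: str,
                       user: str = "ocr-translator") -> urllib.request.Request:
    """Build (but do not send) a blocking-mode POST /chat-messages request."""
    body = {
        "inputs": {},                 # no app variables assumed
        "query": query,               # the translation prompt
        "response_mode": "blocking",  # wait for the complete answer in one response
        "user": user,                 # end-user identifier (hypothetical value)
    }
    return urllib.request.Request(
        url=f"{DIFY_BASE_URL}/chat-messages",
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def chat(query: str, timeout: float = 60.0) -> dict:
    """Send the request and return the parsed JSON body (e.g. `answer`, `metadata`)."""
    api_key = os.environ["DIFY_API_KEY"]  # hypothetical variable name
    request = build_chat_request(query, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating `build_chat_request` from `chat` lets the payload be checked without network access, and reading the key from the environment keeps it out of the source tree.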

### Benefits over Local Model

| Aspect | DIFY API | Local MADLAD-400 |
|--------|----------|------------------|
| Quality | High (cloud AI) | Variable |
| Setup | No model download | 12 GB model download |
| GPU usage | None | 2-3 GB VRAM |
| Latency | ~1-2 s per request | Fast once the model is loaded |
| Maintenance | Managed by the API provider | Self-managed |

### Data Flow

1. Read `xxx_result.json` (UnifiedDocument format).
2. Extract translatable elements (text, title, header, footer, paragraph, footnote, table cells).
3. Send them to the DIFY API with a translation prompt.
4. Parse the response and save the result to `xxx_translated_{lang}.json`.
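Steps 3 and 4 could batch the extracted texts and match answers back through numbered markers (`[1]`, `[2]`, ...). This is a minimal sketch that assumes a budget of 5000 characters and 20 items per batch. The prompt wording and all function names are hypothetical.

```python
from __future__ import annotations

import re
from typing import Optional

MAX_CHARS = 5000  # assumed character budget per batch
MAX_ITEMS = 20    # assumed item budget per batch


def make_batches(texts: list[str], max_chars: int = MAX_CHARS,
                 max_items: int = MAX_ITEMS) -> list[list[str]]:
    """Greedily pack texts into batches that respect both budgets.

    A single text longer than max_chars still gets a batch of its own.
    """
    batches: list[list[str]] = []
    current: list[str] = []
    size = 0
    for text in texts:
        if current and (len(current) >= max_items or size + len(text) > max_chars):
            batches.append(current)
            current, size = [], 0
        current.append(text)
        size += len(text)
    if current:
        batches.append(current)
    return batches


def build_prompt(batch: list[str], target_lang: str) -> str:
    """Prefix each item with a [n] marker so answers can be matched back by number."""
    numbered = "\n".join(f"[{i}] {text}" for i, text in enumerate(batch, start=1))
    return (f"Translate each numbered item into {target_lang}. "
            f"Keep every [n] marker at the start of its line.\n\n{numbered}")


# A marker at the start of a line, then everything up to the next marker or end of text.
_MARKER = re.compile(r"^\[(\d+)\]\s*(.*?)(?=^\[\d+\]|\Z)", re.MULTILINE | re.DOTALL)


def parse_numbered(answer: str, expected: int) -> list[Optional[str]]:
    """Map a numbered answer back to batch positions; None marks a missing item."""
    results: list[Optional[str]] = [None] * expected
    for match in _MARKER.finditer(answer):
        index = int(match.group(1)) - 1
        if 0 <= index < expected:
            results[index] = match.group(2).strip()
    return results
```

Items that come back as `None` (a dropped or renumbered marker) can be retried in a smaller batch or left in the source language.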

### Unified Processing

All three tracks (Direct/OCR/Hybrid) use the same UnifiedDocument format, enabling unified translation logic without track-specific handling.
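Because every track emits the same structure, element extraction (step 2 of the data flow) can be written once. The sketch below assumes a UnifiedDocument shaped as `pages` containing `elements`, each with `element_id`, `type` and `content`, and table elements carrying `cells` with `row`/`col`. These key names and `iter_translatable` are hypothetical and should be checked against the real schema.

```python
from __future__ import annotations

from typing import Any, Iterator

# Element types the proposal lists as translatable (table cells handled separately).
TRANSLATABLE_TYPES = {"text", "title", "header", "footer", "paragraph", "footnote"}


def iter_translatable(doc: dict[str, Any]) -> Iterator[tuple[str, str]]:
    """Yield (key, text) pairs from a UnifiedDocument-like dict.

    The function never checks which track (Direct/OCR/Hybrid) produced the
    document; it relies only on the shared (assumed) element structure.
    """
    for page in doc.get("pages", []):
        for element in page.get("elements", []):
            kind = element.get("type")
            if kind in TRANSLATABLE_TYPES:
                text = (element.get("content") or "").strip()
                if text:
                    yield element["element_id"], text
            elif kind == "table":
                # Each non-empty cell becomes its own item, keyed by row and column.
                for cell in element.get("cells", []):
                    text = (cell.get("content") or "").strip()
                    if text:
                        yield f"{element['element_id']}:{cell['row']}:{cell['col']}", text
```

Keying table cells by element ID plus row and column lets the translated text be written back to the same cell when the result JSON is built.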