OCR/openspec/changes/archive/2025-12-02-add-document-translation/proposal.md

# Change: Add Document Translation Feature

## Why

Users need to translate OCR-processed documents into other languages while preserving the original layout. Currently, the system extracts text but cannot translate it. This feature enables multilingual document processing through the DIFY AI service, which delegates translation to a cloud model behind a single HTTP API instead of a locally hosted model.

## What Changes

- **NEW**: Translation service using DIFY AI API (Chat mode, Blocking)
- **NEW**: Translation REST API endpoints (`/api/v2/translate/*`)
- **NEW**: Translation result JSON format (independent file per target language)
- **UPDATE**: Frontend translation UI enabled, with translation progress display
- **REMOVED**: Local MADLAD-400-3B model (replaced with DIFY API)
- **REMOVED**: GPU memory management for translation (no longer needed)

## Impact

- Affected specs:
  - NEW `specs/translation/spec.md` - Core translation capability
  - MODIFY `specs/result-export/spec.md` - Add translation JSON export format

- Affected code:
  - `backend/app/services/translation_service.py` (REWRITE - use DIFY API)
  - `backend/app/routers/translate.py` (MODIFY)
  - `backend/app/schemas/translation.py` (MODIFY)
  - `frontend/src/pages/TaskDetailPage.tsx` (MODIFY)
  - `frontend/src/services/api.ts` (MODIFY)

## Technical Summary

### Translation Service
- Provider: DIFY AI (theaken.com)
- Mode: Chat (Blocking response)
- Base URL: `https://dify.theaken.com/v1`
- Endpoint: `POST /chat-messages`
- API Key: a DIFY app key (prefix `app-`), stored in backend configuration rather than committed to the repository
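The blocking request described above could be issued with only the standard library. This is a minimal sketch, not the service's actual implementation. The request body uses Dify's documented chat-messages fields (`inputs`, `query`, `response_mode`, `user`), and the reply is read from the `answer` field. The `DIFY_API_KEY` environment variable, the default `user` value and the helper names are hypothetical.

```python
"""Minimal sketch of a blocking call to the DIFY chat-messages endpoint.

Assumptions (not from the proposal): the API key comes from a hypothetical
DIFY_API_KEY environment variable, and `user` is an arbitrary identifier
for the calling service.
"""
import json
import os
import urllib.request

BASE_URL = "https://dify.theaken.com/v1"


def build_chat_request(query: str, api_key: str, user: str = "ocr-translator") -> urllib.request.Request:
    """Build a POST /chat-messages request in blocking response mode."""
    payload = {
        "inputs": {},
        "query": query,
        "response_mode": "blocking",  # wait for the full answer instead of streaming
        "user": user,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat-messages",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_answer(body: bytes) -> str:
    """Extract the model's reply from a blocking-mode response body."""
    return json.loads(body)["answer"]


def translate_text(query: str, timeout: float = 60.0) -> str:
    """Send one request and return the answer (performs network I/O)."""
    api_key = os.environ["DIFY_API_KEY"]  # hypothetical variable name
    req = build_chat_request(query, api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_answer(resp.read())
```

Splitting request construction from parsing keeps both halves testable without network access. Only `translate_text` touches the network.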

### Benefits over Local Model
| Aspect | DIFY API | Local MADLAD-400 |
|--------|----------|------------------|
| Quality | High (cloud AI) | Variable |
| Setup | No model download | 12GB download |
| GPU Usage | None | 2-3GB VRAM |
| Latency | ~1-2s per request | Fast once the model is loaded |
| Maintenance | API provider managed | Self-managed |

### Data Flow
1. Read `xxx_result.json` (UnifiedDocument format)
2. Extract translatable elements (text, title, header, footer, paragraph, footnote, table cells)
3. Send the extracted text to the DIFY API together with a translation prompt
4. Parse response and save to `xxx_translated_{lang}.json`
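Steps 1, 2 and 4 can be sketched as follows. The sketch assumes a hypothetical UnifiedDocument layout: `pages` contain `elements` with `id`, `type` and `content`, and tables carry `cells`. The real schema in `xxx_result.json` may differ. The output payload shape (`target_lang`, `translations`) is likewise an assumption.

```python
"""Sketch of data-flow steps 1, 2 and 4.

Assumption: the UnifiedDocument layout used here (pages -> elements with
"id", "type", "content"; tables carrying "cells") is illustrative only.
"""
import json
from pathlib import Path

# Element types the proposal lists as translatable (table cells handled separately).
TEXT_TYPES = {"text", "title", "header", "footer", "paragraph", "footnote"}


def extract_translatable(doc: dict) -> list[tuple[str, str]]:
    """Return (element_id, text) pairs for every non-empty translatable element."""
    items = []
    for page in doc.get("pages", []):
        for el in page.get("elements", []):
            if el.get("type") in TEXT_TYPES and el.get("content"):
                items.append((el["id"], el["content"]))
            elif el.get("type") == "table":
                # Assumed: each cell has its own "id" and "content".
                for cell in el.get("cells", []):
                    if cell.get("content"):
                        items.append((cell["id"], cell["content"]))
    return items


def translated_path(result_path: Path, lang: str) -> Path:
    """Map xxx_result.json to xxx_translated_{lang}.json in the same folder."""
    stem = result_path.name.removesuffix("_result.json")
    return result_path.with_name(f"{stem}_translated_{lang}.json")


def save_translations(result_path: Path, lang: str, translations: dict[str, str]) -> Path:
    """Write one independent JSON file per target language (hypothetical payload)."""
    out = translated_path(result_path, lang)
    payload = {"target_lang": lang, "translations": translations}
    out.write_text(json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8")
    return out
```

Keying translations by element ID is one way to preserve the original layout. Each translated string can be mapped back to its source element's position, and no track-specific logic is needed.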

### Unified Processing
All three tracks (Direct/OCR/Hybrid) use the same UnifiedDocument format, enabling unified translation logic without track-specific handling.