
egg 8d9b69ba93 feat: add document translation via DIFY AI API

Implement a document translation feature using the DIFY AI API with batch processing:

Backend:
- Add DIFY client with batch translation support (up to 5000 characters and 20 items per batch)
- Add translation service with element extraction and result building
- Add translation router with start/status/result/list/delete endpoints
- Add translation schemas (TranslationRequest, TranslationStatus, etc.)
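The batch limits above (5000 characters, 20 items) can be enforced with a simple greedy grouping pass. The sketch below assumes the limit counts characters of the raw source text, not of the final prompt, and the function name `make_batches` is hypothetical, not the client's actual API.

```python
from typing import List

# Limits from the commit message; counting raw-text characters is an assumption.
MAX_CHARS_PER_BATCH = 5000
MAX_ITEMS_PER_BATCH = 20


def make_batches(texts: List[str],
                 max_chars: int = MAX_CHARS_PER_BATCH,
                 max_items: int = MAX_ITEMS_PER_BATCH) -> List[List[int]]:
    """Greedily group text indices so each batch stays within both limits.

    Returns lists of indices into `texts`, preserving the original order.
    A single text longer than `max_chars` ends up alone in its own batch.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    current_chars = 0
    for i, text in enumerate(texts):
        size = len(text)
        # Close the current batch if adding this text would break either limit.
        if current and (current_chars + size > max_chars or len(current) >= max_items):
            batches.append(current)
            current, current_chars = [], 0
        current.append(i)
        current_chars += size
    if current:
        batches.append(current)
    return batches
```

For example, two 3000-character texts followed by a short one yield `[[0], [1, 2]]`: the second text would push the first batch past 5000 characters, so it starts a new batch that the third text still fits into. Returning indices rather than strings keeps the mapping back to document elements trivial.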

Frontend:
- Enable translation UI in TaskDetailPage
- Add translation API methods to apiV2.ts
- Add translation types

Features:
- Batch translation with numbered markers [1], [2], [3]...
- Support for text, title, header, footer, paragraph, footnote, table cells
- Translation result JSON with statistics (tokens, latency, batch_count)
- Background task processing with progress tracking
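The numbered-marker scheme lets one request carry many segments and still map each translation back to its source. A minimal sketch follows; the prompt wording and the helper names `build_batch_prompt` and `parse_batch_response` are hypothetical, and it assumes the model echoes each `[n]` marker at the start of a line.

```python
import re
from typing import Dict, List


def build_batch_prompt(texts: List[str], target_lang: str) -> str:
    """Prefix each segment with [1], [2], ... so the reply can be split back."""
    # Hypothetical instruction text; the real prompt is not shown in the source.
    lines = [f"Translate the following segments into {target_lang}. "
             "Keep every [n] marker and output one translation per marker."]
    for n, text in enumerate(texts, start=1):
        lines.append(f"[{n}] {text}")
    return "\n".join(lines)


# A marker is "[digits]" at the start of a line, followed by optional spaces.
_MARKER = re.compile(r"^\[(\d+)\]\s*", re.MULTILINE)


def parse_batch_response(answer: str, expected: int) -> Dict[int, str]:
    """Map marker number -> translated text; numbers the model skipped are absent."""
    results: Dict[int, str] = {}
    matches = list(_MARKER.finditer(answer))
    for idx, m in enumerate(matches):
        # Each segment runs until the next marker (so multi-line text survives).
        end = matches[idx + 1].start() if idx + 1 < len(matches) else len(answer)
        n = int(m.group(1))
        if 1 <= n <= expected:  # ignore markers the model invented
            results[n] = answer[m.end():end].strip()
    return results
```

Returning a dictionary rather than a list makes missing markers easy to detect, so a caller could retry only the segments the model dropped.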

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-12-02 11:57:02 +08:00


Change: Add Document Translation Feature

Why

Users need to translate OCR-processed documents into other languages while preserving the original layout. Currently, the system only extracts text and cannot translate it. This feature enables multilingual document processing through the DIFY AI service, which provides cloud-hosted translation behind a simple HTTP API, so no local model has to be downloaded or run.

What Changes

NEW: Translation service using the DIFY AI API (chat mode, blocking response)
NEW: Translation REST API endpoints (/api/v2/translate/*)
NEW: Translation result JSON format (independent file per target language)
UPDATE: Activate the frontend translation UI, with progress display
REMOVED: Local MADLAD-400-3B model (replaced with DIFY API)
REMOVED: GPU memory management for translation (no longer needed)
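The "independent file per target language" result format could be produced along these lines. This is a sketch only: the payload keys (`source_file`, `target_lang`, `translations`, `statistics`) and the helper names are hypothetical, while the file-naming rule and the statistics (tokens, latency, batch count) come from this proposal and its commit message.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Any, Dict


@dataclass
class TranslationStats:
    # Mirrors the "tokens, latency, batch_count" statistics named in the commit
    # message; the exact key names in the real result file are assumed.
    total_tokens: int = 0
    total_latency: float = 0.0
    batch_count: int = 0


def translated_path(result_path: Path, target_lang: str) -> Path:
    """Map xxx_result.json to xxx_translated_{lang}.json in the same directory."""
    suffix = "_result.json"
    name = result_path.name
    base = name[: -len(suffix)] if name.endswith(suffix) else result_path.stem
    return result_path.with_name(f"{base}_translated_{target_lang}.json")


def build_translation_payload(source_name: str, target_lang: str,
                              translations: Dict[str, str],
                              stats: TranslationStats) -> Dict[str, Any]:
    """Assemble the per-language result document (hypothetical field names)."""
    return {
        "source_file": source_name,
        "target_lang": target_lang,
        "translations": translations,  # element key -> translated text
        "statistics": asdict(stats),
    }


def write_translation_result(result_path: Path, target_lang: str,
                             translations: Dict[str, str],
                             stats: TranslationStats) -> Path:
    """Write one JSON file per target language; other languages' files are untouched."""
    out = translated_path(result_path, target_lang)
    payload = build_translation_payload(result_path.name, target_lang, translations, stats)
    out.write_text(json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8")
    return out
```

Keeping each language in its own file means adding or deleting one translation never rewrites the original `xxx_result.json` or any other language's output.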

Impact

Affected specs:
- NEW specs/translation/spec.md - Core translation capability
- MODIFY specs/result-export/spec.md - Add translation JSON export format
Affected code:
- backend/app/services/translation_service.py (REWRITE - use DIFY API)
- backend/app/routers/translate.py (MODIFY)
- backend/app/schemas/translation.py (MODIFY)
- frontend/src/pages/TaskDetailPage.tsx (MODIFY)
- frontend/src/services/api.ts (MODIFY)

Technical Summary

Translation Service

Provider: DIFY AI (theaken.com)
Mode: Chat (Blocking response)
Base URL: https://dify.theaken.com/v1
Endpoint: POST /chat-messages
API Key: app-YOPrF2ro5fshzMkCZviIuUJd
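A blocking call to the endpoint above might look like the following standard-library sketch. The request body (`inputs`, `query`, `response_mode`, `user`) and the Bearer header follow Dify's published chat-messages API as we understand it; reading `answer` and `metadata.usage` from the reply is likewise based on that documentation and should be checked against this instance. The `DIFY_API_KEY` environment variable and the default user id are hypothetical; the sketch reads the key from the environment instead of embedding it in code.

```python
import json
import os
import urllib.request
from typing import Any, Dict, Optional

DIFY_BASE_URL = "https://dify.theaken.com/v1"


def build_chat_request(query: str, user: str = "ocr-translator",
                       api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a blocking POST /chat-messages request for a Dify chat app."""
    # Hypothetical env var name; pass api_key explicitly to override it.
    key = api_key if api_key is not None else os.environ.get("DIFY_API_KEY", "")
    body = {
        "inputs": {},
        "query": query,               # the batched, marker-numbered prompt
        "response_mode": "blocking",  # wait for the full answer in one response
        "user": user,
    }
    return urllib.request.Request(
        f"{DIFY_BASE_URL}/chat-messages",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {key}", "Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(req: urllib.request.Request, timeout: float = 60.0) -> Dict[str, Any]:
    """Send the request and keep only the fields the translation flow needs."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    usage = payload.get("metadata", {}).get("usage", {})
    return {
        "answer": payload.get("answer", ""),
        "total_tokens": usage.get("total_tokens", 0),
        "latency": usage.get("latency", 0.0),
    }
```

Splitting request construction from sending keeps the request shape testable without network access; only `send_chat_request` actually contacts the server.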

Benefits over the Local Model

Aspect	DIFY API	Local MADLAD-400
Quality	High (cloud AI)	Variable
Setup	No model download	12GB download
GPU Usage	None	2-3GB VRAM
Latency	~1-2s per request	Fast after load
Maintenance	API provider managed	Self-managed

Data Flow

1. Read xxx_result.json (UnifiedDocument format).
2. Extract translatable elements (text, title, header, footer, paragraph, footnote, table cells).
3. Send them to the DIFY API with a translation prompt.
4. Parse the response and save it to xxx_translated_{lang}.json.
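Step 2 of the flow above, extracting translatable elements, could be written as a generator over the document. The UnifiedDocument layout assumed here (a `pages` list of `elements` with `element_id`, `type`, and `content`, and tables holding a `cells` list with `row`/`col`) is hypothetical, since the proposal does not show the schema; only the set of translatable element types comes from the text.

```python
from typing import Any, Dict, Iterator, Tuple

# Element types listed as translatable in this proposal (table cells handled separately).
TRANSLATABLE_TYPES = {"text", "title", "header", "footer", "paragraph", "footnote"}


def extract_translatable(doc: Dict[str, Any]) -> Iterator[Tuple[str, str]]:
    """Yield (key, text) pairs for every translatable element, in page order.

    Field names below are hypothetical; adapt them to the real UnifiedDocument schema.
    """
    for page in doc.get("pages", []):
        for el in page.get("elements", []):
            el_type = el.get("type")
            el_id = el.get("element_id")
            if el_type in TRANSLATABLE_TYPES:
                text = el.get("content")
                if isinstance(text, str) and text.strip():  # skip empty blocks
                    yield el_id, text
            elif el_type == "table":
                for cell in el.get("content", {}).get("cells", []):
                    text = cell.get("content")
                    if isinstance(text, str) and text.strip():
                        # Encode the cell position in the key so results can be written back.
                        yield f"{el_id}:{cell.get('row')}:{cell.get('col')}", text
```

Usage would be along the lines of `pairs = list(extract_translatable(json.loads(Path("xxx_result.json").read_text(encoding="utf-8"))))`, after which the texts can be batched and the keys reused when assembling the translated output. Images and other non-text elements are skipped, which is what keeps the layout data untouched.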

Unified Processing

All three processing tracks (Direct, OCR, and Hybrid) produce the same UnifiedDocument format, so a single translation code path handles every track without track-specific logic.
