OCR/openspec/changes/archive/2025-12-02-add-document-translation/specs/result-export/spec.md
egg 8d9b69ba93 feat: add document translation via DIFY AI API
Implement document translation feature using DIFY AI API with batch processing:

Backend:
- Add DIFY client with batch translation support (5000 chars, 20 items per batch)
- Add translation service with element extraction and result building
- Add translation router with start/status/result/list/delete endpoints
- Add translation schemas (TranslationRequest, TranslationStatus, etc.)

Frontend:
- Enable translation UI in TaskDetailPage
- Add translation API methods to apiV2.ts
- Add translation types

Features:
- Batch translation with numbered markers [1], [2], [3]...
- Support for text, title, header, footer, paragraph, footnote, table cells
- Translation result JSON with statistics (tokens, latency, batch_count)
- Background task processing with progress tracking

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-02 11:57:02 +08:00


ADDED Requirements

Requirement: Translation Result JSON Export

The system SHALL support exporting translation results as independent JSON files following a defined schema.

Scenario: Export translation result JSON

  • WHEN translation completes for a document
  • THEN system SHALL save translation to {filename}_translated_{lang}.json
  • AND file SHALL be stored alongside original {filename}_result.json
  • AND original result file SHALL remain unchanged
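
A minimal sketch of the naming rule above, assuming the original result file is always named `{filename}_result.json`. The helper name `translation_export_path` is hypothetical and not part of this spec.

```python
from pathlib import Path

RESULT_SUFFIX = "_result.json"  # assumed naming of the original result file


def translation_export_path(result_path: Path, lang: str) -> Path:
    """Derive the translation file path from the original result file path.

    Hypothetical helper: the result is placed in the same directory as the
    original, and the original file is only read for its name, never modified.
    """
    if not result_path.name.endswith(RESULT_SUFFIX):
        raise ValueError(f"not a result file: {result_path.name}")
    filename = result_path.name[: -len(RESULT_SUFFIX)]
    return result_path.with_name(f"{filename}_translated_{lang}.json")
```

For example, `translation_export_path(Path("out/report_result.json"), "en")` yields `out/report_translated_en.json`, next to the untouched `out/report_result.json`.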

Scenario: Translation JSON schema compliance

  • WHEN translation result is saved
  • THEN JSON SHALL include schema_version field ("1.0.0")
  • AND SHALL include source_document reference
  • AND SHALL include source_lang and target_lang
  • AND SHALL include provider identifier (e.g., "dify")
  • AND SHALL include translated_at timestamp
  • AND SHALL include translations dict mapping element_id to translated content
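
The required fields can be illustrated with a small builder and a presence check. This is a sketch only: the function names are hypothetical, and both the ISO 8601 UTC timestamp format and the use of a plain string for `source_document` are assumptions, since the spec does not fix either.

```python
from datetime import datetime, timezone

# Top-level fields required by the scenario above.
REQUIRED_FIELDS = (
    "schema_version",
    "source_document",
    "source_lang",
    "target_lang",
    "provider",
    "translated_at",
    "translations",
)


def build_translation_payload(source_document, source_lang, target_lang,
                              translations, provider="dify"):
    """Assemble a translation result dict with every required field."""
    return {
        "schema_version": "1.0.0",
        "source_document": source_document,  # assumed: a file name string
        "source_lang": source_lang,
        "target_lang": target_lang,
        "provider": provider,
        # Assumed format: ISO 8601 timestamp in UTC.
        "translated_at": datetime.now(timezone.utc).isoformat(),
        "translations": dict(translations),  # element_id -> translated content
    }


def missing_fields(payload):
    """Return the required fields that are absent from a payload."""
    return [name for name in REQUIRED_FIELDS if name not in payload]
```

A check such as `missing_fields(payload) == []` only verifies presence; it does not validate field types or values.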

Scenario: Translation statistics in export

  • WHEN translation result is saved
  • THEN JSON SHALL include statistics object with:
    • total_elements: count of all elements in document
    • translated_elements: count of successfully translated elements
    • skipped_elements: count of non-translatable elements (images, charts, etc.)
    • total_characters: character count of translated text
    • processing_time_seconds: translation duration
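
One way the statistics object might be computed is sketched below. The element shape (`element_id`, `type`), the set of translatable type names, and the choice to count characters of the translated output are all assumptions for illustration; the spec does not define them.

```python
# Assumed type names, based on the element kinds the feature supports.
TRANSLATABLE_TYPES = {
    "text", "title", "header", "footer", "paragraph", "footnote", "table",
}


def build_statistics(elements, translations, started_at, finished_at):
    """Compute the statistics object for a finished translation.

    elements:     list of dicts with "element_id" and "type" (assumed shape)
    translations: element_id -> str, or {"cells": [...]} for tables
    started_at / finished_at: timestamps in seconds
    """
    translatable = [e for e in elements if e["type"] in TRANSLATABLE_TYPES]
    translated = [e for e in translatable if e["element_id"] in translations]

    total_characters = 0
    for value in translations.values():
        if isinstance(value, str):
            total_characters += len(value)
        else:
            # Table entry: sum the length of every translated cell.
            total_characters += sum(len(c["content"]) for c in value["cells"])

    return {
        "total_elements": len(elements),
        "translated_elements": len(translated),
        "skipped_elements": len(elements) - len(translatable),
        "total_characters": total_characters,
        "processing_time_seconds": round(finished_at - started_at, 3),
    }
```

Under these assumptions, an element that is translatable but failed to translate is counted in neither `translated_elements` nor `skipped_elements`, so the two need not sum to `total_elements`.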

Scenario: Table cell translation in export

  • WHEN document contains tables
  • THEN translation JSON SHALL represent table translations as:
    {
      "table_1_0": {
        "cells": [
          {"row": 0, "col": 0, "content": "Translated cell text"},
          {"row": 0, "col": 1, "content": "Another cell"}
        ]
      }
    }
    
  • AND row/col positions SHALL match original table structure
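
The row/col requirement can be checked with a small validator. This sketch assumes zero-based indices (as in the example above) and a rectangular table whose dimensions are known; the function name is hypothetical.

```python
def cells_match_structure(translated_cells, n_rows, n_cols):
    """Check that translated cells fit the original table layout.

    Returns False if any cell lies outside the n_rows x n_cols grid or if
    two cells claim the same (row, col) position.
    """
    seen = set()
    for cell in translated_cells:
        position = (cell["row"], cell["col"])
        in_bounds = 0 <= cell["row"] < n_rows and 0 <= cell["col"] < n_cols
        if not in_bounds or position in seen:
            return False
        seen.add(position)
    return True
```

Merged cells and tables with missing cells would need a richer description of the original structure than a row and column count.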

Scenario: Download translation result via API

  • WHEN a GET request is sent to /api/v2/translate/{task_id}/result?lang={lang}
  • THEN system SHALL return translation JSON content
  • AND Content-Type SHALL be application/json
  • AND response SHALL include appropriate cache headers
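
A client-side sketch of this endpoint using only the standard library. The base URL is a placeholder, the function names are hypothetical, and authentication is omitted even though a real deployment may require it.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def translation_result_url(base_url, task_id, lang):
    """Build the download URL for a task's translation in one language."""
    path = f"/api/v2/translate/{quote(task_id, safe='')}/result"
    return f"{base_url.rstrip('/')}{path}?{urlencode({'lang': lang})}"


def fetch_translation(base_url, task_id, lang):
    """Download and parse the translation JSON (performs a network call)."""
    with urlopen(translation_result_url(base_url, task_id, lang)) as resp:
        # The scenario requires an application/json response.
        content_type = resp.headers.get_content_type()
        if content_type != "application/json":
            raise ValueError(f"unexpected Content-Type: {content_type}")
        return json.load(resp)
```

For example, `translation_result_url("http://localhost:8000", "abc-123", "zh-TW")` produces `http://localhost:8000/api/v2/translate/abc-123/result?lang=zh-TW`.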

Scenario: List available translations

  • WHEN a GET request is sent to /api/v2/tasks/{task_id}/translations
  • THEN system SHALL return list of available translation languages
  • AND SHALL include translation metadata for each language (translated_at, provider, statistics)
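
A possible way to assemble this listing is to scan the result directory for files that follow the `{filename}_translated_{lang}.json` naming rule. The sketch below assumes the language code is taken from the file name and that each file carries the metadata fields from the schema; all function names and the `target_lang` response key are hypothetical.

```python
import json
from pathlib import Path


def lang_from_name(file_name, filename):
    """Extract {lang} from "{filename}_translated_{lang}.json", else None."""
    prefix, suffix = f"{filename}_translated_", ".json"
    if file_name.startswith(prefix) and file_name.endswith(suffix):
        return file_name[len(prefix):-len(suffix)] or None
    return None


def summarize_translation(lang, payload):
    """Keep only the metadata fields, dropping the translations themselves."""
    return {
        "target_lang": lang,
        "translated_at": payload.get("translated_at"),
        "provider": payload.get("provider"),
        "statistics": payload.get("statistics"),
    }


def list_translations(result_dir: Path, filename: str):
    """List metadata for every translation file found next to the result."""
    items = []
    for path in sorted(result_dir.iterdir()):
        lang = lang_from_name(path.name, filename)
        if lang is None:
            continue  # not a translation file for this document
        payload = json.loads(path.read_text(encoding="utf-8"))
        items.append(summarize_translation(lang, payload))
    return items
```

Reading every file on each request is the simplest approach; an index or database record per translation would avoid the repeated parsing if the listing is called often.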