egg/OCR

egg 24253ac15e feat: unify Direct Track PDF rendering and simplify export options

Backend changes:
- Apply background image + invisible text layer to all Direct Track PDFs
- Add CHART to regions_to_avoid for text extraction
- Improve visual fidelity for native PDFs and Office documents

Frontend changes:
- Remove JSON, UnifiedDocument, Markdown download buttons
- Simplify to 2-column layout with only Layout PDF and Reflow PDF
- Remove translation JSON download and Layout PDF option
- Keep only Reflow PDF for translated document downloads
- Clean up unused imports (FileJson, Database, FileOutput)

Archives two OpenSpec proposals:
- unify-direct-track-pdf-rendering
- simplify-frontend-export-options

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2025-12-12 07:50:43 +08:00

10 KiB

result-export Specification

Purpose

TBD - created by archiving change fix-v2-api-ui-issues. Update Purpose after archive.

Requirements

Requirement: Export Interface

The Export interface in TaskDetailPage SHALL provide streamlined download options focusing on PDF formats.

Scenario: Download options for completed tasks

WHEN viewing a completed task in TaskDetailPage
THEN the download section SHALL display only two buttons: "版面 PDF" (Layout PDF) and "流式 PDF" (Reflow PDF)
AND JSON, UnifiedDocument, and Markdown download buttons SHALL NOT be displayed
AND the download grid SHALL use a 2-column layout

Scenario: Translation download options

WHEN viewing completed translations in TaskDetailPage
THEN each translation item SHALL display only a "流式 PDF" (Reflow PDF) download button
AND translation JSON download button SHALL NOT be displayed
AND Layout PDF option for translations SHALL NOT be displayed
AND delete translation button SHALL remain available

Scenario: Backend API remains unchanged

WHEN external clients call download endpoints directly
THEN JSON, Markdown, and UnifiedDocument endpoints SHALL still function
AND translated Layout PDF endpoint SHALL still function
AND no backend changes are required for this frontend simplification

Requirement: Multi-Task Export Selection

The Export page SHALL allow users to select and export multiple tasks.

Scenario: Select multiple tasks for export

WHEN Export page loads
THEN the page SHALL display a list of the user's completed tasks
AND the page SHALL provide checkboxes to select multiple tasks
AND the page SHALL NOT require a batch ID from the upload store (legacy V1 behavior)

Scenario: Export selected tasks

WHEN user selects multiple tasks and clicks export
THEN system SHALL download each selected task's results in chosen format
AND downloaded files SHALL be named distinctly (e.g., {task_id}_result.{ext})
AND system MAY provide option to download as ZIP archive for multiple files
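Below is a minimal sketch of the optional ZIP download. It assumes the backend already holds each selected task's result as bytes. The function name build_export_zip and its task_id-to-bytes input are hypothetical. Only the {task_id}_result.{ext} naming comes from this scenario.

```python
import io
import zipfile


def build_export_zip(results: dict, ext: str) -> bytes:
    """Bundle several task results into one in-memory ZIP archive.

    results maps task_id -> file content (hypothetical input shape).
    Entries are named {task_id}_result.{ext}, so names stay distinct
    as long as task IDs are unique.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for task_id, content in results.items():
            archive.writestr(f"{task_id}_result.{ext}", content)
    return buffer.getvalue()


if __name__ == "__main__":
    data = build_export_zip({"task-a": b"%PDF-1.7", "task-b": b"%PDF-1.7"}, "pdf")
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        print(archive.namelist())  # ['task-a_result.pdf', 'task-b_result.pdf']
```

Building the archive in memory keeps the sketch short. For many large PDFs, writing to a temporary file instead would avoid holding every result in RAM at once.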

Requirement: Export Configuration Persistence

Export settings (format, thresholds, templates) SHALL apply consistently to V2 task downloads.

Scenario: Apply confidence threshold to export

WHEN user sets confidence threshold to 0.7 and exports
THEN downloaded results SHALL only include OCR text with confidence >= 0.7
AND threshold SHALL apply via V2 download endpoint query parameters

Scenario: Apply CSS template to PDF export

WHEN user selects CSS template for PDF format
THEN downloaded PDF SHALL use selected styling
AND template SHALL be passed to V2 /tasks/{id}/download/pdf endpoint
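The sketch below shows one way to carry these settings on the V2 download endpoint. Several details are assumptions: the query parameter names confidence_threshold and css_template, and the /api/v2 prefix. Check the actual router before relying on any of them. The filter demonstrates the inclusive >= comparison that the threshold scenario requires.

```python
from typing import Optional
from urllib.parse import urlencode


def build_download_url(base: str, task_id: str, fmt: str,
                       confidence_threshold: Optional[float] = None,
                       css_template: Optional[str] = None) -> str:
    """Build a V2 download URL that carries export settings as query parameters.

    Parameter names are hypothetical. The CSS template is only sent for PDF.
    """
    params = {}
    if confidence_threshold is not None:
        params["confidence_threshold"] = confidence_threshold
    if css_template is not None and fmt == "pdf":
        params["css_template"] = css_template
    url = f"{base}/tasks/{task_id}/download/{fmt}"
    return f"{url}?{urlencode(params)}" if params else url


def filter_by_confidence(lines: list, threshold: float) -> list:
    """Keep OCR lines whose confidence is >= threshold (boundary included)."""
    return [line for line in lines if line["confidence"] >= threshold]
```

The sketch drops the template silently for non-PDF formats, which keeps the template setting scoped to PDF downloads.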

Requirement: Enhanced PDF Export with Layout Preservation

The PDF export SHALL accurately preserve document layout from both OCR and direct extraction tracks with correct coordinate transformation and multi-page support. For Direct Track, a background image rendering approach SHALL be used for visual fidelity.

Scenario: Export PDF from direct extraction track

WHEN exporting PDF from a direct-extraction processed document
THEN the system SHALL render source PDF pages as full-page background images at 2x resolution
AND overlay invisible text elements using PDF Text Rendering Mode 3
AND text SHALL remain selectable and searchable despite being invisible
AND visual output SHALL match source document exactly
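The layering can be illustrated with raw PDF content-stream operators. This is a sketch, not the project's renderer, and it rests on three assumptions:

- The rasterized page is registered as image resource /Im0 and a font as /F1. Both names are hypothetical.
- Coordinates are already in bottom-left PDF space.
- Text is ASCII. Non-Latin scripts such as Chinese need an embedded CID font.

The operator 3 Tr selects Text Rendering Mode 3, which neither fills nor strokes glyphs. The text therefore stays extractable and searchable, while only the image is visible.

```python
def escape_pdf_string(text: str) -> str:
    """Escape backslashes and parentheses for a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def build_page_content(page_w: float, page_h: float, spans: list) -> str:
    """Return a PDF content stream: full-page image first, then invisible text.

    spans holds (text, x, y, font_size) tuples in PDF user space
    (bottom-left origin). /Im0 and /F1 are hypothetical resource names
    that must be declared in the page's /Resources dictionary.
    """
    ops = [
        "q",
        f"{page_w:g} 0 0 {page_h:g} 0 0 cm",  # scale the unit-square image to the page
        "/Im0 Do",                              # paint the rasterized source page
        "Q",
        "BT",
        "3 Tr",                                 # Text Rendering Mode 3: no fill, no stroke
    ]
    for text, x, y, size in spans:
        ops.append(f"/F1 {size:g} Tf")
        ops.append(f"1 0 0 1 {x:g} {y:g} Tm")  # absolute position via the text matrix
        ops.append(f"({escape_pdf_string(text)}) Tj")
    ops.append("ET")
    return "\n".join(ops)
```

The image is painted before the text object, so the invisible glyphs sit above it in drawing order. Selection and search therefore hit the text layer.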

Scenario: Export PDF from OCR track with full structure

WHEN exporting PDF from OCR-processed document
THEN the PDF SHALL use all 23 PP-StructureV3 element types
AND render tables with proper cell boundaries
AND maintain reading order from parsing_res_list

Scenario: Handle coordinate transformations correctly

WHEN generating PDF from UnifiedDocument
THEN system SHALL use explicit page dimensions from OCR results (not inferred from bounding boxes)
AND correctly transform Y-axis coordinates from top-left (OCR) to bottom-left (PDF/ReportLab) origin
AND prevent vertical flipping or position misalignment errors
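Here is a minimal sketch of the Y-axis flip. It assumes OCR boxes are (x0, y0, x1, y1) with y growing downward, already expressed in the same units as the PDF page. Any pixel-to-point scaling is left out. BBox and ocr_to_pdf_bbox are illustrative names.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    x0: float
    y0: float
    x1: float
    y1: float


def ocr_to_pdf_bbox(bbox: BBox, page_height: float) -> BBox:
    """Convert a top-left-origin OCR box to ReportLab's bottom-left origin.

    page_height must be the explicit page height from the OCR result.
    """
    if page_height <= 0:
        raise ValueError("page_height must be positive")
    # The OCR bottom edge (y1) becomes the PDF bottom edge, and the
    # OCR top edge (y0) becomes the PDF top edge.
    return BBox(bbox.x0, page_height - bbox.y1, bbox.x1, page_height - bbox.y0)
```

Suppose the page height were inferred as max(y1) over the boxes instead. Every element would then shift whenever the lowest box does not reach the page bottom. This is why the scenario requires explicit page dimensions.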

Scenario: Direct Track PDF file size increase

WHEN generating Layout PDF for Direct Track documents
THEN the system SHALL accept increased file size due to embedded page images
AND a size of approximately 1-2 MB per page at 2x resolution is expected
AND this trade-off is accepted for improved visual fidelity
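The expected size can be checked with simple arithmetic. This sketch assumes that zoom 1.0 maps one PDF point to one pixel (72 dpi), as common renderers such as PyMuPDF do, so 2x means 144 dpi. It computes only the uncompressed RGB size, which is an upper bound rather than the embedded size.

```python
def render_size_estimate(width_pt: float, height_pt: float, zoom: float = 2.0) -> tuple:
    """Return (pixel_width, pixel_height, raw_rgb_bytes) for one rendered page.

    Assumes 1 pt -> 1 px at zoom 1.0 and 3 bytes per pixel (8-bit RGB, no alpha).
    """
    px_w = round(width_pt * zoom)
    px_h = round(height_pt * zoom)
    return px_w, px_h, px_w * px_h * 3


if __name__ == "__main__":
    # A4 portrait, rounded to whole points
    w, h, raw = render_size_estimate(595, 842)
    print(w, h, raw)  # 1190 1684 6011880
```

An A4 page comes to roughly 5.7 MiB raw. The figure of 1-2 MB per page therefore implies that the embedded images are compressed (for example with JPEG or Flate). The actual ratio depends on page content.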

Scenario: Chart elements excluded from text layer

WHEN generating Layout PDF containing charts
THEN the system SHALL NOT include chart-internal text in the invisible text layer
AND chart visuals SHALL be preserved in the background image
AND chart text SHALL NOT be available for text selection or translation
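The sketch below shows how chart regions might be kept out of the invisible text layer. It assumes each element is a dict with a type and an (x0, y0, x1, y1) bbox. Several details are assumptions: the element shape, the "CHART"/"TEXT" type strings, and the center-point rule. Only the idea of adding CHART to regions_to_avoid comes from this change.

```python
def center_inside(box: tuple, region: tuple) -> bool:
    """True if the center of box lies within region (both (x0, y0, x1, y1))."""
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    return region[0] <= cx <= region[2] and region[1] <= cy <= region[3]


def text_layer_elements(elements: list) -> list:
    """Return elements eligible for the invisible text layer.

    CHART elements are dropped, and so is any element whose center falls
    inside a chart region, so chart-internal labels are not selectable.
    """
    regions_to_avoid = [e["bbox"] for e in elements if e["type"] == "CHART"]
    return [
        e for e in elements
        if e["type"] != "CHART"
        and not any(center_inside(e["bbox"], r) for r in regions_to_avoid)
    ]
```

The sketch tests the element's center rather than any overlap. This keeps captions that merely touch the chart border in the text layer.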

Requirement: Structure Data Export

The system SHALL provide export formats that preserve document structure for downstream processing.

Scenario: Export structured JSON with hierarchy

WHEN user selects structured JSON format
THEN export SHALL include element hierarchy and relationships
AND preserve parent-child relationships (sections, lists)
AND include style and formatting information

Scenario: Export for translation preparation

WHEN user exports with translation_ready=true parameter
THEN export SHALL include translatable text segments
AND maintain coordinate mappings for each segment
AND mark non-translatable regions
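Here is a minimal sketch of a translation-ready export. It assumes elements carry element_id, page, type, bbox, and text fields. The field names and the set of non-translatable types are hypothetical. The point is that every segment keeps its coordinates, and non-translatable regions are flagged rather than dropped.

```python
NON_TRANSLATABLE_TYPES = {"image", "chart"}  # hypothetical type names


def translation_segments(elements: list) -> list:
    """Build one segment per element, keeping its bbox and a translatable flag."""
    segments = []
    for element in elements:
        text = element.get("text") or ""
        translatable = element["type"] not in NON_TRANSLATABLE_TYPES and bool(text.strip())
        segments.append({
            "element_id": element["element_id"],
            "page": element["page"],
            "bbox": list(element["bbox"]),  # coordinate mapping back to the source page
            "translatable": translatable,
            "text": text if translatable else None,
        })
    return segments
```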

Scenario: Export with layout analysis

WHEN user requests layout analysis export
THEN system SHALL include reading order indices
AND identify layout regions (header, body, footer, sidebar)
AND provide confidence scores for layout detection

Requirement: Translation Result JSON Export

The system SHALL support exporting translation results as independent JSON files following a defined schema.

Scenario: Export translation result JSON

WHEN translation completes for a document
THEN system SHALL save translation to {filename}_translated_{lang}.json
AND file SHALL be stored alongside original {filename}_result.json
AND original result file SHALL remain unchanged

Scenario: Translation JSON schema compliance

WHEN translation result is saved
THEN JSON SHALL include schema_version field ("1.0.0")
AND SHALL include source_document reference
AND SHALL include source_lang and target_lang
AND SHALL include provider identifier (e.g., "dify")
AND SHALL include translated_at timestamp
AND SHALL include translations dict mapping element_id to translated content
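The following sketch assembles and saves a file that satisfies this schema. The function names are hypothetical. Deriving {filename} by stripping the _result.json suffix is an assumption about how result files are named. Writing to a new path leaves the original result file untouched.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

RESULT_SUFFIX = "_result.json"


def translation_path_for(result_json: Path, lang: str) -> Path:
    """Map .../{filename}_result.json to .../{filename}_translated_{lang}.json."""
    if not result_json.name.endswith(RESULT_SUFFIX):
        raise ValueError(f"not a result file: {result_json.name}")
    filename = result_json.name[: -len(RESULT_SUFFIX)]
    return result_json.with_name(f"{filename}_translated_{lang}.json")


def build_translation_result(source_document: str, source_lang: str, target_lang: str,
                             provider: str, translations: dict, statistics: dict) -> dict:
    """Assemble the required top-level fields of a translation result."""
    return {
        "schema_version": "1.0.0",
        "source_document": source_document,
        "source_lang": source_lang,
        "target_lang": target_lang,
        "provider": provider,
        "translated_at": datetime.now(timezone.utc).isoformat(),
        "statistics": statistics,
        "translations": translations,  # element_id -> translated content
    }


def save_translation(result_json: Path, payload: dict) -> Path:
    """Write the payload next to the original result; the original file is never opened."""
    target = translation_path_for(result_json, payload["target_lang"])
    target.write_text(json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8")
    return target
```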

Scenario: Translation statistics in export

WHEN translation result is saved
THEN JSON SHALL include statistics object with:
- total_elements: count of all elements in document
- translated_elements: count of successfully translated elements
- skipped_elements: count of non-translatable elements (images, charts, etc.)
- total_characters: character count of translated text
- processing_time_seconds: translation duration
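The following sketch computes these counters. It assumes each element has element_id and type, and that translated content is either a string or a table entry holding a cells list. Which types count as skipped is a hypothetical choice here.

```python
SKIPPED_TYPES = {"image", "chart"}  # hypothetical non-translatable types


def _translated_text(value) -> str:
    """Flatten a translation value (plain string or table with cells) to text."""
    if isinstance(value, str):
        return value
    return "".join(cell["content"] for cell in value.get("cells", []))


def translation_statistics(elements: list, translations: dict,
                           processing_time_seconds: float) -> dict:
    """Compute the statistics object stored in a translation result."""
    return {
        "total_elements": len(elements),
        "translated_elements": sum(1 for e in elements if e["element_id"] in translations),
        "skipped_elements": sum(1 for e in elements if e["type"] in SKIPPED_TYPES),
        "total_characters": sum(len(_translated_text(v)) for v in translations.values()),
        "processing_time_seconds": round(processing_time_seconds, 2),
    }
```

Measuring the duration with time.monotonic() around the provider calls avoids jumps caused by wall-clock adjustments.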

Scenario: Table cell translation in export

WHEN document contains tables
THEN translation JSON SHALL represent table translations as:

{
  "table_1_0": {
    "cells": [
      {"row": 0, "col": 0, "content": "Translated cell text"},
      {"row": 0, "col": 1, "content": "Another cell"}
    ]
  }
}

AND row/col positions SHALL match original table structure
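This sketch satisfies the row/col requirement by construction. Each translated cell copies its position from the original cell rather than re-deriving it from a grid, which also avoids guessing positions when cells are merged. The translate callable stands in for the translation provider and is hypothetical.

```python
from typing import Callable


def translate_table(original_cells: list, translate: Callable[[str], str]) -> dict:
    """Return {"cells": [...]} with positions copied from the original table."""
    return {
        "cells": [
            {"row": cell["row"], "col": cell["col"], "content": translate(cell["content"])}
            for cell in original_cells
        ]
    }


def same_positions(original_cells: list, translated: dict) -> bool:
    """Check that the translated table covers exactly the original (row, col) set."""
    original = {(c["row"], c["col"]) for c in original_cells}
    return original == {(c["row"], c["col"]) for c in translated["cells"]}
```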

Scenario: Download translation result via API

WHEN a GET request is sent to /api/v2/translate/{task_id}/result?lang={lang}
THEN system SHALL return translation JSON content
AND Content-Type SHALL be application/json
AND response SHALL include appropriate cache headers
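The scenario leaves "appropriate cache headers" open. One possible choice, sketched here as a framework-independent helper, combines two headers:

- a content-hash ETag
- Cache-Control: private, no-cache

Together these let clients cache the JSON but require them to revalidate before reuse, which matters because a translation can be regenerated. The header policy and the function names are assumptions.

```python
import hashlib
import json
from typing import Optional


def translation_response(payload: dict) -> tuple:
    """Serialize a translation result and derive response headers."""
    body = json.dumps(payload, ensure_ascii=False, sort_keys=True).encode("utf-8")
    etag = '"' + hashlib.sha256(body).hexdigest()[:32] + '"'
    headers = {
        "Content-Type": "application/json",
        "Cache-Control": "private, no-cache",  # may be stored, must revalidate before reuse
        "ETag": etag,
    }
    return body, headers


def is_not_modified(if_none_match: Optional[str], etag: str) -> bool:
    """True if the client's If-None-Match matches, so a 304 can be sent.

    Weak (W/) validators are not handled in this sketch.
    """
    if if_none_match is None:
        return False
    candidates = [tag.strip() for tag in if_none_match.split(",")]
    return "*" in candidates or etag in candidates
```

Serializing with sort_keys=True makes the ETag depend only on content, not on dict insertion order.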

Scenario: List available translations

WHEN a GET request is sent to /api/v2/tasks/{task_id}/translations
THEN system SHALL return list of available translation languages
AND include translation metadata (translated_at, provider, statistics)
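The following sketch discovers translations from the {filename}_translated_{lang}.json files described in the export scenario. It assumes all of a task's files live in one directory. The language-code pattern and the helper names are assumptions.

```python
import json
import re
from pathlib import Path


def languages_from_names(names, filename: str) -> list:
    """Extract language codes from {filename}_translated_{lang}.json names."""
    pattern = re.compile(
        rf"^{re.escape(filename)}_translated_(?P<lang>[A-Za-z]{{2,3}}(?:-[A-Za-z0-9]+)*)\.json$"
    )
    return sorted(m.group("lang") for m in map(pattern.match, names) if m)


def list_translations(result_dir: Path, filename: str) -> list:
    """Return [{lang, translated_at, provider, statistics}, ...] for one document."""
    items = []
    for lang in languages_from_names((p.name for p in result_dir.iterdir()), filename):
        path = result_dir / f"{filename}_translated_{lang}.json"
        data = json.loads(path.read_text(encoding="utf-8"))
        items.append({
            "lang": lang,
            "translated_at": data.get("translated_at"),
            "provider": data.get("provider"),
            "statistics": data.get("statistics"),
        })
    return items
```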

Requirement: Translated PDF Export API

The system SHALL expose an API endpoint for downloading translated documents as PDF files.

Scenario: Download translated PDF via API

GIVEN a task with completed translation to English
WHEN a POST request is sent to /api/v2/translate/{task_id}/pdf?lang=en
THEN system returns PDF file with translated content
AND Content-Type is application/pdf
AND Content-Disposition suggests filename like {task_id}_translated_en.pdf

Scenario: Download translated PDF with layout preservation

WHEN user downloads translated PDF
THEN the PDF maintains original document layout
AND text positions match original document coordinates
AND images and tables appear at original positions

Scenario: Invalid language parameter

GIVEN a task with translation only to English
WHEN user requests PDF with lang=ja (Japanese)
THEN system returns 404 Not Found
AND response includes available languages in error message

Scenario: Task not found

GIVEN non-existent task_id
WHEN user requests translated PDF
THEN system returns 404 Not Found
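The following is a framework-independent sketch of the lookup behind these scenarios. Two stand-ins are assumptions:

- NotFoundError represents the web framework's 404 exception.
- translations_by_task, which maps a task_id to its set of completed languages, represents the task store.

The function returns the headers named in the download scenario.

```python
class NotFoundError(Exception):
    """Stand-in for the web framework's 404 exception (hypothetical)."""

    def __init__(self, detail: str):
        super().__init__(detail)
        self.status_code = 404
        self.detail = detail


def resolve_translated_pdf(task_id: str, lang: str, translations_by_task: dict) -> dict:
    """Validate task and language, then return download headers for the PDF."""
    if task_id not in translations_by_task:
        raise NotFoundError(f"Task {task_id} not found")
    available = sorted(translations_by_task[task_id])
    if lang not in available:
        # List the available languages so the client can retry with a valid one.
        raise NotFoundError(
            f"No translation for lang={lang}; available: {', '.join(available) or 'none'}"
        )
    filename = f"{task_id}_translated_{lang}.pdf"
    return {
        "Content-Type": "application/pdf",
        "Content-Disposition": f'attachment; filename="{filename}"',
    }
```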

Requirement: Frontend Translated PDF Download

The frontend SHALL provide UI controls for downloading translated PDFs.

Scenario: Show download button when translation complete

GIVEN a task with translation status "completed"
WHEN user views TaskDetailPage
THEN page displays "Download Translated PDF" button
AND button shows target language (e.g., "Download Translated PDF (English)")

Scenario: Hide download button when no translation

GIVEN a task without any completed translations
WHEN user views TaskDetailPage
THEN "Download Translated PDF" button is not shown

Scenario: Download progress indication

GIVEN user clicks "Download Translated PDF" button
WHEN PDF generation is in progress
THEN button shows loading state
AND prevents double-click
WHEN download completes
THEN browser downloads PDF file
AND button returns to normal state
