OCR/spec.md at 9437387ef1eb40319bf32b2dd92dc41d9d8e20df

egg 08adf3d01d feat: add translated PDF format selection (layout/reflow)

- Add generate_translated_layout_pdf() method for layout-preserving translated PDFs
- Add generate_translated_pdf() method for reflow translated PDFs
- Update translate router to accept format parameter (layout/reflow)
- Update frontend with dropdown to select translated PDF format
- Fix reflow PDF table cell extraction from content dict
- Add embedded images handling in reflow PDF tables
- Archive improve-translated-text-fitting openspec proposal

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-12-03 10:10:28 +08:00

ADDED Requirements

Requirement: Dual PDF Generation Modes

The system SHALL support two distinct PDF generation modes to serve different use cases for both OCR and Direct tracks.

Scenario: Download layout preservation PDF

WHEN user requests PDF via /api/v2/tasks/{task_id}/download/pdf
THEN PDF SHALL use layout preservation mode
AND text positions SHALL match original document coordinates
AND this option SHALL be available for both OCR and Direct tracks
AND existing behavior SHALL remain unchanged

Scenario: Download reflow layout PDF without translation

WHEN user requests PDF via /api/v2/tasks/{task_id}/download/pdf?format=reflow
THEN PDF SHALL use reflow layout mode
AND text SHALL flow naturally with consistent font sizes
AND body text SHALL use approximately 12pt font size
AND headings SHALL use larger font sizes (14-18pt)
AND this option SHALL be available for both OCR and Direct tracks
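
The two download scenarios above differ only in the optional format query parameter. A minimal sketch of that selection follows, assuming an absent parameter means layout preservation (the existing behavior) and that unknown values are rejected. The names PdfFormat and resolve_pdf_format are hypothetical and do not come from the codebase.

```python
from enum import Enum
from typing import Optional


class PdfFormat(str, Enum):
    """Hypothetical names for the two generation modes in this requirement."""
    LAYOUT = "layout"
    REFLOW = "reflow"


def resolve_pdf_format(format_param: Optional[str]) -> PdfFormat:
    """Map the optional ?format= query value to a generation mode.

    Assumption: a missing or empty value keeps the existing behavior
    (layout preservation); anything unrecognized raises ValueError.
    """
    if not format_param:
        return PdfFormat.LAYOUT
    try:
        return PdfFormat(format_param.lower())
    except ValueError:
        raise ValueError(f"unsupported PDF format: {format_param!r}") from None
```

How a rejected value maps to an HTTP status is not stated in this spec, so the sketch leaves that to the caller.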

Scenario: OCR track reading order in reflow mode

GIVEN document processed via OCR track
WHEN generating reflow PDF
THEN system SHALL use explicit reading_order array from JSON
AND elements SHALL appear in order specified by reading_order indices
AND if reading_order is missing, system SHALL fall back to a spatial sort by (y, x) position
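
The OCR ordering rule can be sketched as below. This is an illustration under stated assumptions, not the project's implementation: each element is assumed to be a dict with a "bbox" of [x0, y0, x1, y1], reading_order is assumed to hold indices into the element list, and out-of-range indices are simply skipped. The function name order_elements is hypothetical.

```python
from typing import Any, Dict, List, Optional


def order_elements(
    elements: List[Dict[str, Any]],
    reading_order: Optional[List[int]] = None,
) -> List[Dict[str, Any]]:
    """Return elements in reading order for reflow output.

    Assumptions: element["bbox"] is [x0, y0, x1, y1] and reading_order
    contains indices into `elements`.
    """
    if reading_order:
        # Explicit order from the OCR JSON; ignore indices that do not exist.
        return [elements[i] for i in reading_order if 0 <= i < len(elements)]
    # Fallback: top-to-bottom, then left-to-right, using the top-left corner.
    return sorted(elements, key=lambda e: (e["bbox"][1], e["bbox"][0]))
```

An empty reading_order is treated the same as a missing one here, which is a design choice the spec does not settle.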

Scenario: Direct track reading order in reflow mode

GIVEN document processed via Direct track
WHEN generating reflow PDF
THEN system SHALL use implicit element order from extraction
AND elements SHALL appear in list iteration order
AND PyMuPDF's sort=True ordering SHALL be trusted

Requirement: Reflow PDF Semantic Structure

The reflow PDF generation SHALL preserve document semantic structure.

Scenario: Headings in reflow mode

WHEN original document contains headings (title, h1, h2, etc.)
THEN headings SHALL be rendered with larger font sizes
AND headings SHALL be visually distinguished from body text
AND heading hierarchy SHALL be preserved
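
One way to keep the heading hierarchy visible is a fixed size table that stays within the 14-18pt range given for reflow headings, with body text at 12pt. The specific size per level and the element type names beyond "title", "h1", and "h2" are assumptions made for illustration.

```python
BODY_FONT_PT = 12.0

# Assumed mapping: the spec only gives the 14-18pt range, not per-level sizes.
HEADING_FONT_PT = {"title": 18.0, "h1": 18.0, "h2": 16.0, "h3": 14.0}


def font_size_for(element_type: str) -> float:
    """Pick a reflow font size for an element type such as 'h2' or 'text'."""
    kind = element_type.lower()
    if kind in HEADING_FONT_PT:
        return HEADING_FONT_PT[kind]
    # Deeper headings (h4, h5, ...) are clamped to the smallest heading size.
    if kind.startswith("h") and kind[1:].isdigit():
        return 14.0
    return BODY_FONT_PT
```

Because levels below h3 share one size in this sketch, a renderer would need another cue, such as weight or spacing, to distinguish them.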

Scenario: Tables in reflow mode

WHEN original document contains tables
THEN tables SHALL render with visible cell borders
AND column widths SHALL auto-adjust to content
AND table content SHALL be fully visible
AND tables SHALL use appropriate cell padding
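
Auto-adjusting column widths can be approximated by sharing the available width in proportion to the longest cell in each column. The sketch below assumes cells are plain strings and uses character count as a stand-in for rendered text width; a real renderer would measure text with the actual font. The function name column_widths is hypothetical.

```python
from typing import List


def column_widths(rows: List[List[str]], available_width: float) -> List[float]:
    """Split available_width across columns by longest cell length."""
    n_cols = max((len(row) for row in rows), default=0)
    if n_cols == 0:
        return []
    # Start at 1 so an empty column still receives some width.
    longest = [1] * n_cols
    for row in rows:
        for col, cell in enumerate(row):
            longest[col] = max(longest[col], len(cell))
    total = sum(longest)
    return [available_width * n / total for n in longest]
```

The widths always sum to available_width, so the table never exceeds the page; long cells then have to wrap inside their column for the content to stay fully visible.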

Scenario: Images in reflow mode

WHEN original document contains images
THEN images SHALL be embedded inline in flowing content
AND images SHALL be scaled to fit page width if necessary
AND images SHALL maintain aspect ratio
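
The image rule reduces to a single scale factor applied to both dimensions. A minimal sketch, assuming sizes are in points and that images narrower than the page are left at their original size (the spec only requires scaling "if necessary"); fit_image is a hypothetical name:

```python
from typing import Tuple


def fit_image(width: float, height: float, max_width: float) -> Tuple[float, float]:
    """Scale an image down to max_width while keeping its aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    if width <= max_width:
        # Already fits: never upscale.
        return (width, height)
    scale = max_width / width
    return (max_width, height * scale)
```

For example, a 1000 x 500 image on a page with 500 points of usable width becomes 500 x 250.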

Scenario: Lists in reflow mode

WHEN original document contains numbered or bulleted lists
THEN lists SHALL preserve their formatting
AND list items SHALL flow naturally

MODIFIED Requirements

Requirement: Translated PDF Export API

The system SHALL expose an API endpoint for downloading translated documents as PDF files using reflow layout mode only.

Scenario: Download translated PDF via API

GIVEN a task with completed translation
WHEN a POST request is sent to /api/v2/translate/{task_id}/pdf?lang={lang}
THEN system returns PDF file with translated content
AND PDF SHALL use reflow layout mode (not layout preservation)
AND Content-Type is application/pdf
AND Content-Disposition suggests filename like {task_id}_translated_{lang}.pdf
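
The response headers for this scenario can be sketched as follows. The filename pattern comes from the scenario; the "attachment" disposition type and the helper name translated_pdf_headers are assumptions.

```python
from typing import Dict


def translated_pdf_headers(task_id: str, lang: str) -> Dict[str, str]:
    """Build the headers a translated-PDF response is expected to carry."""
    filename = f"{task_id}_translated_{lang}.pdf"
    return {
        "Content-Type": "application/pdf",
        # "attachment" is assumed; the spec only says a filename is suggested.
        "Content-Disposition": f'attachment; filename="{filename}"',
    }
```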

Scenario: Translated PDF uses reflow layout

WHEN user downloads translated PDF
THEN the PDF SHALL use reflow layout mode
AND text SHALL flow naturally with consistent font sizes
AND body text SHALL use approximately 12pt font size
AND headings SHALL use larger font sizes (14-18pt)
AND content SHALL be readable without magnification

Scenario: Translated PDF for OCR track

GIVEN document processed via OCR track with translation
WHEN generating translated PDF
THEN reading order SHALL follow reading_order array
AND translated text SHALL replace original in correct positions

Scenario: Translated PDF for Direct track

GIVEN document processed via Direct track with translation
WHEN generating translated PDF
THEN reading order SHALL follow implicit element order
AND translated text SHALL replace original in correct positions

Scenario: Invalid language parameter

GIVEN a task with translation only to English
WHEN user requests PDF with lang=ja (Japanese)
THEN system returns 404 Not Found
AND response includes available languages in error message

Scenario: Task not found

GIVEN non-existent task_id
WHEN user requests translated PDF
THEN system returns 404 Not Found
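
Both 404 scenarios above can be covered by one lookup that runs before any PDF is generated. The sketch assumes a hypothetical in-memory mapping from task_id to its completed translation languages; the real storage layer and the exact error wording are not specified here.

```python
from typing import Dict, List, Tuple


def check_translated_pdf_request(
    translations: Dict[str, List[str]], task_id: str, lang: str
) -> Tuple[int, str]:
    """Return (status_code, message) for a translated-PDF request.

    `translations` maps task_id -> completed languages (hypothetical store).
    """
    if task_id not in translations:
        return 404, f"Task {task_id} not found"
    available = translations[task_id]
    if lang not in available:
        # Include the available languages, as the scenario requires.
        listed = ", ".join(sorted(available)) or "none"
        return 404, f"No '{lang}' translation. Available languages: {listed}"
    return 200, "ok"
```

With translations = {"t1": ["en"]}, a request for lang "ja" on task "t1" yields a 404 whose message names "en" as the available language.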

Requirement: Frontend Download Options

The frontend SHALL provide appropriate download options based on translation status.

Scenario: Download options without translation

GIVEN a task without any completed translations
WHEN user views TaskDetailPage
THEN page SHALL display "Download Layout PDF" button (original coordinates)
AND page SHALL display "Download Reflow PDF" button (flowing layout)
AND both options SHALL be available in the download section

Scenario: Download options with translation

GIVEN a task with completed translation
WHEN user views TaskDetailPage
THEN page SHALL display "Download Translated PDF" button for each language
AND translated PDF button SHALL remain as single option (no Layout/Reflow choice)
AND translated PDF SHALL automatically use reflow layout

Scenario: Remove outdated MADLAD-400 references

WHEN displaying translation section
THEN page SHALL NOT display "MADLAD-400" badge
AND description text SHALL reflect cloud translation service (Dify)
AND description SHALL NOT mention local model loading time
