## ADDED Requirements

### Requirement: Dual PDF Generation Modes

The system SHALL support two distinct PDF generation modes to serve different use cases for both OCR and Direct tracks.

#### Scenario: Download layout preservation PDF

- **WHEN** user requests PDF via `/api/v2/tasks/{task_id}/download/pdf`
- **THEN** PDF SHALL use layout preservation mode
- **AND** text positions SHALL match original document coordinates
- **AND** this option SHALL be available for both OCR and Direct tracks
- **AND** existing behavior SHALL remain unchanged

#### Scenario: Download reflow layout PDF without translation

- **WHEN** user requests PDF via `/api/v2/tasks/{task_id}/download/pdf?format=reflow`
- **THEN** PDF SHALL use reflow layout mode
- **AND** text SHALL flow naturally with consistent font sizes
- **AND** body text SHALL use approximately 12pt font size
- **AND** headings SHALL use larger font sizes (14-18pt)
- **AND** this option SHALL be available for both OCR and Direct tracks

#### Scenario: OCR track reading order in reflow mode

- **GIVEN** document processed via OCR track
- **WHEN** generating reflow PDF
- **THEN** system SHALL use explicit `reading_order` array from JSON
- **AND** elements SHALL appear in order specified by `reading_order` indices
- **AND** if `reading_order` is missing, system SHALL fall back to a spatial sort by (y, x)

#### Scenario: Direct track reading order in reflow mode

- **GIVEN** document processed via Direct track
- **WHEN** generating reflow PDF
- **THEN** system SHALL use implicit element order from extraction
- **AND** elements SHALL appear in list iteration order
- **AND** PyMuPDF's `sort=True` ordering SHALL be trusted

---

### Requirement: Reflow PDF Semantic Structure

The reflow PDF generation SHALL preserve document semantic structure.

#### Scenario: Headings in reflow mode

- **WHEN** original document contains headings (title, h1, h2, etc.)
- **THEN** headings SHALL be rendered with larger font sizes
- **AND** headings SHALL be visually distinguished from body text
- **AND** heading hierarchy SHALL be preserved

#### Scenario: Tables in reflow mode

- **WHEN** original document contains tables
- **THEN** tables SHALL render with visible cell borders
- **AND** column widths SHALL auto-adjust to content
- **AND** table content SHALL be fully visible
- **AND** tables SHALL use appropriate cell padding

#### Scenario: Images in reflow mode

- **WHEN** original document contains images
- **THEN** images SHALL be embedded inline in flowing content
- **AND** images SHALL be scaled to fit page width if necessary
- **AND** images SHALL maintain aspect ratio

#### Scenario: Lists in reflow mode

- **WHEN** original document contains numbered or bulleted lists
- **THEN** lists SHALL preserve their formatting
- **AND** list items SHALL flow naturally

---

## MODIFIED Requirements

### Requirement: Translated PDF Export API

The system SHALL expose an API endpoint for downloading translated documents as PDF files using reflow layout mode only.

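
As a rough illustration of the response rules in the scenarios for this requirement, the following sketch decides the status code, headers, and error text for a translated-PDF request. It is not the actual handler: `PdfResponsePlan`, `plan_translated_pdf_response`, and the `task_exists` flag are hypothetical names, and the `attachment` disposition type is an assumption, since the spec only says the header suggests a filename.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PdfResponsePlan:
    """Hypothetical description of the HTTP response to send."""
    status: int
    headers: dict = field(default_factory=dict)
    detail: str = ""


def plan_translated_pdf_response(task_id, lang, available_langs, task_exists=True):
    """Decide how to answer a translated-PDF request (illustrative only)."""
    # Scenario "Task not found": unknown task_id yields 404.
    if not task_exists:
        return PdfResponsePlan(404, detail=f"Task {task_id} not found")

    # Scenario "Invalid language parameter": 404 plus the available languages.
    langs = sorted(set(available_langs))
    if lang not in langs:
        listing = ", ".join(langs) or "none"
        return PdfResponsePlan(
            404,
            detail=f"No translation for '{lang}'. Available languages: {listing}",
        )

    # Scenario "Download translated PDF via API": PDF content type and a
    # suggested filename of the form {task_id}_translated_{lang}.pdf.
    filename = f"{task_id}_translated_{lang}.pdf"
    return PdfResponsePlan(
        200,
        headers={
            "Content-Type": "application/pdf",
            # "attachment" is an assumption; the spec only fixes the filename pattern.
            "Content-Disposition": f'attachment; filename="{filename}"',
        },
    )
```

Keeping the decision separate from PDF rendering makes the 404 cases checkable without generating a document.
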
#### Scenario: Download translated PDF via API

- **GIVEN** a task with completed translation
- **WHEN** POST request to `/api/v2/translate/{task_id}/pdf?lang={lang}`
- **THEN** system returns PDF file with translated content
- **AND** PDF SHALL use reflow layout mode (not layout preservation)
- **AND** Content-Type is `application/pdf`
- **AND** Content-Disposition suggests filename like `{task_id}_translated_{lang}.pdf`

#### Scenario: Translated PDF uses reflow layout

- **WHEN** user downloads translated PDF
- **THEN** the PDF SHALL use reflow layout mode
- **AND** text SHALL flow naturally with consistent font sizes
- **AND** body text SHALL use approximately 12pt font size
- **AND** headings SHALL use larger font sizes (14-18pt)
- **AND** content SHALL be readable without magnification

#### Scenario: Translated PDF for OCR track

- **GIVEN** document processed via OCR track with translation
- **WHEN** generating translated PDF
- **THEN** reading order SHALL follow `reading_order` array
- **AND** translated text SHALL replace original in correct positions

#### Scenario: Translated PDF for Direct track

- **GIVEN** document processed via Direct track with translation
- **WHEN** generating translated PDF
- **THEN** reading order SHALL follow implicit element order
- **AND** translated text SHALL replace original in correct positions

#### Scenario: Invalid language parameter

- **GIVEN** a task with translation only to English
- **WHEN** user requests PDF with `lang=ja` (Japanese)
- **THEN** system returns 404 Not Found
- **AND** response includes available languages in error message

#### Scenario: Task not found

- **GIVEN** non-existent task_id
- **WHEN** user requests translated PDF
- **THEN** system returns 404 Not Found

---

### Requirement: Frontend Download Options

The frontend SHALL provide appropriate download options based on translation status.

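
The button rules in the scenarios for this requirement can be summarized as a small lookup. The sketch below is language-neutral logic written in Python for illustration only; it is not frontend code. The function names and the `(lang)` suffix on translated button labels are assumptions, while the endpoint paths and base labels come from this spec. Whether the two original-document buttons stay visible once translations exist is not stated here, so the two lists are kept separate.

```python
def original_pdf_options(task_id):
    """Buttons for the untranslated document: layout and reflow variants."""
    base = f"/api/v2/tasks/{task_id}/download/pdf"
    return [
        {"label": "Download Layout PDF", "url": base},
        {"label": "Download Reflow PDF", "url": f"{base}?format=reflow"},
    ]


def translated_pdf_options(task_id, translated_langs):
    """One button per completed translation; always reflow, so no mode choice."""
    return [
        {
            # The "(lang)" suffix is a hypothetical way to tell buttons apart.
            "label": f"Download Translated PDF ({lang})",
            "url": f"/api/v2/translate/{task_id}/pdf?lang={lang}",
        }
        for lang in translated_langs
    ]
```

For a task with no completed translations, `translated_pdf_options` returns an empty list, so only the layout and reflow buttons would be rendered.
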
#### Scenario: Download options without translation

- **GIVEN** a task without any completed translations
- **WHEN** user views TaskDetailPage
- **THEN** page SHALL display "Download Layout PDF" button (original coordinates)
- **AND** page SHALL display "Download Reflow PDF" button (flowing layout)
- **AND** both options SHALL be available in the download section

#### Scenario: Download options with translation

- **GIVEN** a task with completed translation
- **WHEN** user views TaskDetailPage
- **THEN** page SHALL display "Download Translated PDF" button for each language
- **AND** translated PDF button SHALL remain as single option (no Layout/Reflow choice)
- **AND** translated PDF SHALL automatically use reflow layout

#### Scenario: Remove outdated MADLAD-400 references

- **WHEN** displaying translation section
- **THEN** page SHALL NOT display "MADLAD-400" badge
- **AND** description text SHALL reflect cloud translation service (Dify)
- **AND** description SHALL NOT mention local model loading time
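
---

For reference, the per-track reading-order rules used by reflow and translated PDFs (explicit `reading_order` for the OCR track with a spatial fallback, extraction order for the Direct track) can be sketched as below. This is a minimal sketch under stated assumptions, not the actual generator: `order_for_reflow` is a hypothetical name, the track labels `"ocr"` and `"direct"` are illustrative, `reading_order` is assumed to hold indices into the element list, and each element is assumed to carry a `bbox` of `(x0, y0, x1, y1)` with y increasing down the page.

```python
def order_for_reflow(elements, track, reading_order=None):
    """Return page elements in the order they should flow in a reflow PDF."""
    if track == "direct":
        # Direct track: trust the extraction order (list iteration order).
        return list(elements)

    if track == "ocr":
        if reading_order:
            # OCR track: follow the explicit reading_order indices.
            # Assumption: an empty array is treated the same as a missing one.
            return [elements[i] for i in reading_order]
        # Fallback: spatial sort, top-to-bottom (y) then left-to-right (x).
        return sorted(elements, key=lambda el: (el["bbox"][1], el["bbox"][0]))

    raise ValueError(f"unknown track: {track!r}")


# Usage with three hypothetical elements on one page.
page = [
    {"id": "a", "bbox": (50, 200, 100, 220)},
    {"id": "b", "bbox": (50, 100, 100, 120)},
    {"id": "c", "bbox": (10, 100, 40, 120)},
]
explicit = order_for_reflow(page, "ocr", reading_order=[1, 0, 2])
fallback = order_for_reflow(page, "ocr")
```

A plain (y, x) sort is only a fallback: it can interleave columns on multi-column pages, which is why the explicit `reading_order` array takes priority when present.
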