# Change: Reflow Layout PDF Export for All Tracks

## Why

When generating translated PDFs, the translated text often does not fit within the original bounding boxes, because translation expands or contracts text length relative to the source language. Users may also want a readable, flowing document format even when no translation is involved.

**Example from task c79df0ad-f9a6-4c04-8139-13eaef25fa83:**
- Original Chinese: "华天科技(宝鸡)有限公司设备版块报价单" (19 characters)
- Translated English: "Huatian Technology (Baoji) Co., Ltd. Equipment Division Quotation" (65 characters)
- Same bounding box: 703×111 pixels
- Current result: the font is reduced to the minimum size (3pt) and the text is unreadable
## What Changes

- **NEW**: Add reflow layout PDF generation for both OCR and Direct tracks
- Preserve semantic structure (headings, tables, lists) in reflow mode
- Use consistent, readable font sizes (12pt body, 16pt headings)
- Embed images inline within the flowing content
- **IMPORTANT**: Original layout-preserving PDF generation remains unchanged
- Support both tracks with proper reading order:
  - **OCR track**: Use the existing `reading_order` array from PP-StructureV3
  - **Direct track**: Use PyMuPDF's implicit order (with an option for column detection)
- **FIX**: Remove outdated MADLAD-400 references from the frontend (translation is now handled by Dify cloud translation)
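The reading-order rules above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the service's actual code: the `Element` dataclass, `order_elements()`, and `to_flow()` are hypothetical names; `reading_order` is assumed to be a list of indices into the page's element list; Direct-track elements are assumed to arrive already in PyMuPDF's extraction order; and the optional column detection is a naive two-column heuristic (elements whose horizontal centre lies left of the page midpoint come first). Font sizes follow the 12pt body / 16pt heading rule listed above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Sequence


# Hypothetical element model; the real OCR/Direct track structures may differ.
@dataclass
class Element:
    kind: str  # e.g. "title", "text", "table", "image"
    bbox: tuple  # (x0, y0, x1, y1) in page coordinates, y grows downward
    content: str = ""


HEADING_KINDS = {"title", "heading"}
HEADING_SIZE = 16  # pt, per the proposal
BODY_SIZE = 12     # pt, per the proposal


def order_elements(
    elements: Sequence[Element],
    reading_order: Optional[Sequence[int]] = None,
    detect_columns: bool = False,
    page_width: Optional[float] = None,
) -> list[Element]:
    if not elements:
        return []
    if reading_order is not None:
        # OCR track: follow the precomputed reading order (assumed: indices).
        return [elements[i] for i in reading_order]
    if not detect_columns:
        # Direct track default: keep the order the elements were extracted in.
        return list(elements)
    # Optional naive two-column heuristic for the Direct track.
    width = page_width if page_width is not None else max(e.bbox[2] for e in elements)
    midpoint = width / 2

    def sort_key(e: Element) -> tuple:
        x_center = (e.bbox[0] + e.bbox[2]) / 2
        column = 0 if x_center < midpoint else 1
        # Left column first, then top-to-bottom, then left-to-right.
        return (column, e.bbox[1], e.bbox[0])

    return sorted(elements, key=sort_key)


def to_flow(elements: Sequence[Element]) -> list[dict]:
    """Map ordered elements to flowing blocks with a fixed, readable font size."""
    blocks = []
    for e in elements:
        if e.kind == "image":
            # Images are placed inline; no font size applies.
            blocks.append({"kind": "image", "content": e.content, "font_size": None})
        else:
            size = HEADING_SIZE if e.kind in HEADING_KINDS else BODY_SIZE
            blocks.append({"kind": e.kind, "content": e.content, "font_size": size})
    return blocks
```

The midpoint heuristic misorders full-width elements whose centre falls right of the midpoint, so a production version would likely need a more robust column detector.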
## Download Options

| Scenario | Layout PDF | Reflow PDF |
|----------|------------|------------|
| **Without Translation** | Available | Available (NEW) |
| **With Translation** | - | Available (single option, unchanged) |
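The availability matrix in the Download Options table can be encoded as a small lookup that a download handler might consult before choosing a generator. This is a hypothetical sketch: `PdfFormat`, `AVAILABLE_FORMATS`, `DEFAULT_FORMAT`, `resolve_format()`, and the assumed defaults (layout for originals, reflow for translations) are illustrative names and choices, not the routers' actual parameters.

```python
from enum import Enum
from typing import Optional


class PdfFormat(str, Enum):
    LAYOUT = "layout"
    REFLOW = "reflow"


# Encodes the "Download Options" table; key: is the document translated?
AVAILABLE_FORMATS = {
    False: {PdfFormat.LAYOUT, PdfFormat.REFLOW},  # without translation
    True: {PdfFormat.REFLOW},                     # with translation: reflow only
}

# Assumed defaults when the client does not request a format.
DEFAULT_FORMAT = {False: PdfFormat.LAYOUT, True: PdfFormat.REFLOW}


def resolve_format(translated: bool, requested: Optional[str] = None) -> PdfFormat:
    """Return the PDF format to generate, rejecting combinations the table does not offer."""
    if requested is None:
        return DEFAULT_FORMAT[translated]
    fmt = PdfFormat(requested)  # raises ValueError for unknown format strings
    if fmt not in AVAILABLE_FORMATS[translated]:
        kind = "translated" if translated else "original"
        raise ValueError(f"{fmt.value} PDF is not available for {kind} documents")
    return fmt
```

Keeping the matrix as data means any later change to which formats are offered touches one dictionary entry rather than branching logic in each endpoint.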
## Impact

- Affected specs: `specs/result-export/spec.md`
- Affected code:
  - `backend/app/services/pdf_generator_service.py` - add reflow generation method
  - `backend/app/routers/tasks.py` - add reflow PDF download endpoint
  - `backend/app/routers/translate.py` - use reflow mode for translated PDFs
  - `frontend/src/pages/TaskDetailPage.tsx`:
    - Add "Download Reflow PDF" button for original documents
    - Remove MADLAD-400 badge and outdated description text