Change: Reflow Layout PDF Export for All Tracks
Why
When generating translated PDFs, text often doesn't fit within original bounding boxes due to language expansion/contraction differences. Additionally, users may want a readable flowing document format even without translation.
Example from task c79df0ad-f9a6-4c04-8139-13eaef25fa83:
- Original Chinese: "华天科技(宝鸡)有限公司设备版块报价单" (19 characters)
- Translated English: "Huatian Technology (Baoji) Co., Ltd. Equipment Division Quotation" (65+ characters)
- Same bounding box: 703×111 pixels
- Current result: Font reduced to minimum (3pt), text unreadable
What Changes
- NEW: Add reflow layout PDF generation for both OCR and Direct tracks
- Preserve semantic structure (headings, tables, lists) in reflow mode
- Use consistent, readable font sizes (12pt body, 16pt headings)
- Embed images inline within flowing content
- IMPORTANT: Original layout preservation PDF generation remains unchanged
- Support both tracks with proper reading order:
  - OCR track: Use existing `reading_order` array from PP-StructureV3
  - Direct track: Use PyMuPDF's implicit order (with option for column detection)
- FIX: Remove outdated MADLAD-400 references from frontend (now uses Dify cloud translation)
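
The reading-order and font-size rules in the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `Block`, `order_blocks`, and `font_size_for` are hypothetical names, and `reading_order` is assumed to be a list of indices into the block list. For the Direct track the sketch simply keeps extraction order, standing in for PyMuPDF's implicit order.

```python
from dataclasses import dataclass
from typing import List, Optional

# Font sizes taken from the proposal: 12pt body, 16pt headings.
BODY_PT = 12
HEADING_PT = 16


@dataclass
class Block:
    """Hypothetical content block; the real element model may differ."""
    kind: str  # e.g. "heading", "text", "table", "image"
    text: str


def order_blocks(blocks: List[Block],
                 reading_order: Optional[List[int]] = None) -> List[Block]:
    """Return blocks in the order they should flow in the reflow PDF.

    OCR track: `reading_order` is assumed to hold indices into `blocks`.
    Direct track: pass None to keep the extraction order unchanged.
    """
    if reading_order is None:
        return list(blocks)
    seen = set()
    ordered = []
    for idx in reading_order:
        # Skip out-of-range or repeated indices rather than failing.
        if 0 <= idx < len(blocks) and idx not in seen:
            seen.add(idx)
            ordered.append(blocks[idx])
    # Append blocks the reading order did not mention, so no content is lost.
    ordered.extend(b for i, b in enumerate(blocks) if i not in seen)
    return ordered


def font_size_for(block: Block) -> int:
    """Use fixed, readable sizes instead of shrinking text to fit a box."""
    return HEADING_PT if block.kind == "heading" else BODY_PT
```

Falling back to extraction order for unlisted blocks is a design choice of this sketch: it favors keeping all content in the output over strict adherence to an incomplete reading order.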
Download Options
| Scenario | Layout PDF | Reflow PDF |
|---|---|---|
| Without Translation | Available | Available (NEW) |
| With Translation | - | Available (single option, unchanged) |
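
The table above can also be read as a small lookup. A sketch, assuming the hypothetical labels `"layout"` and `"reflow"` (the actual endpoint parameters and format names are not specified in this proposal):

```python
from typing import List


def available_pdf_formats(translated: bool) -> List[str]:
    """Return the PDF download options for a task, per the table above.

    The strings "layout" and "reflow" are illustrative labels only.
    """
    if translated:
        # Translated documents offer a single option: the reflow PDF.
        return ["reflow"]
    # Original documents offer both the layout-preserving and the reflow PDF.
    return ["layout", "reflow"]
```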
Impact
- Affected specs: `specs/result-export/spec.md`
- Affected code:
  - `backend/app/services/pdf_generator_service.py` - add reflow generation method
  - `backend/app/routers/tasks.py` - add reflow PDF download endpoint
  - `backend/app/routers/translate.py` - use reflow mode for translated PDFs
  - `frontend/src/pages/TaskDetailPage.tsx`:
    - Add "Download Reflow PDF" button for original documents
    - Remove MADLAD-400 badge and outdated description text