Backend changes:
- Apply background image + invisible text layer to all Direct Track PDFs
- Add CHART to regions_to_avoid for text extraction
- Improve visual fidelity for native PDFs and Office documents

Frontend changes:
- Remove JSON, UnifiedDocument, Markdown download buttons
- Simplify to 2-column layout with only Layout PDF and Reflow PDF
- Remove translation JSON download and Layout PDF option
- Keep only Reflow PDF for translated document downloads
- Clean up unused imports (FileJson, Database, FileOutput)

Archives two OpenSpec proposals:
- unify-direct-track-pdf-rendering
- simplify-frontend-export-options

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
# Change: Unify Direct Track PDF Rendering with Background Image + Invisible Text Layer

## Why

Direct Track PDF generation currently has visual rendering issues:
1. **Chart text overlap**: Text elements extracted from the PDF text layer (e.g., "Temperature, °C") overlap with chart images
2. **Z-order problems**: White text on dark backgrounds becomes invisible when rendered in the wrong order
3. **Office document issues**: PDFs converted from PPT/DOC/XLS lose visual fidelity (vector graphics, gradients)

The root cause is that Direct Track renders individual elements (text, images, tables) separately, which leads to z-order conflicts and missing visual content.

## What Changes

### Backend Changes

1. **Unified Background Image Rendering for All Direct Track**
   - Render each source PDF page as a full-page background image (2x resolution)
   - Draw an invisible text layer on top (PDF Text Rendering Mode 3)
   - Text remains searchable/extractable but doesn't visually overlap

2. **Chart Region Exclusion**
   - Add the `CHART` element type to `regions_to_avoid`
   - Chart-internal text (axis labels, legends) will NOT be in the invisible text layer
   - This text is already visible in the background image and doesn't need translation

3. **Skip Element Rendering When Background Exists**
   - When the background image is rendered, skip individual image/table rendering
   - Only draw the invisible text layer, for searchability and translation extraction
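The invisible text layer described above relies on the PDF `Tr` operator: rendering mode 3 neither fills nor strokes glyphs, so the text is positioned on the page but not painted. The following is a minimal sketch of the content-stream operators involved, not the project's implementation. The function names, the `F1` font resource name, and the default size are hypothetical, and the escaping only covers simple literal strings (non-ASCII text such as "°C" would need proper font encoding handling).

```python
def escape_pdf_string(text: str) -> str:
    """Escape backslashes and parentheses for a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def invisible_text_ops(text: str, x: float, y: float,
                       font: str = "F1", size: float = 10.0) -> str:
    """Build content-stream operators that place `text` invisibly at (x, y).

    `font` is a hypothetical font resource name that the page would
    have to declare in its resource dictionary.
    """
    return "\n".join([
        "BT",                                  # begin text object
        f"/{font} {size:g} Tf",                # select font resource and size
        "3 Tr",                                # rendering mode 3: no fill, no stroke
        f"{x:g} {y:g} Td",                     # move to the text position
        f"({escape_pdf_string(text)}) Tj",     # show the string
        "ET",                                  # end text object
    ])


if __name__ == "__main__":
    print(invisible_text_ops("Temperature (C)", 72, 700))
```

The rendering mode is part of the graphics state and stays in effect after `ET`, so a real content stream would typically wrap these operators in `q` / `Q` or reset the mode with `0 Tr` before drawing visible text.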

### Frontend Considerations

1. **No UI Changes Required for Layout PDF**
   - Layout PDF generation is automatic; no user options are needed
   - Visual output will match the source PDF exactly

2. **Translation Flow Clarification**
   - Layout PDF: Background image + invisible text (for preview)
   - Translated PDF: Reflow layout with real visible text (page-by-page)
   - Chart text is excluded from translation (already in the background image)
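The chart exclusion can be pictured as a bounding-box filter: any text block that lies mostly inside a region in `regions_to_avoid` is dropped before the text layer or translation input is built. The sketch below is illustrative only. `BBox`, `filter_text_blocks`, and the 0.5 overlap threshold are assumptions introduced here, not names or values taken from `direct_extraction_engine.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Axis-aligned box in page coordinates (hypothetical helper type)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def overlap_ratio(self, other: "BBox") -> float:
        """Fraction of this box's area that lies inside `other`."""
        w = min(self.x1, other.x1) - max(self.x0, other.x0)
        h = min(self.y1, other.y1) - max(self.y0, other.y0)
        area = (self.x1 - self.x0) * (self.y1 - self.y0)
        if w <= 0 or h <= 0 or area <= 0:
            return 0.0
        return (w * h) / area


def filter_text_blocks(text_blocks, regions_to_avoid, threshold=0.5):
    """Keep only text blocks that are not mostly covered by an avoided region.

    `text_blocks` is a list of (text, BBox) pairs; `regions_to_avoid` is a
    list of BBox values, e.g. the boxes of detected CHART elements.
    """
    kept = []
    for text, box in text_blocks:
        if any(box.overlap_ratio(r) >= threshold for r in regions_to_avoid):
            continue  # text sits inside a chart: already visible in the background
        kept.append((text, box))
    return kept


chart_regions = [BBox(100, 100, 400, 300)]
blocks = [
    ("Temperature, °C", BBox(110, 180, 130, 260)),              # axis label inside the chart
    ("Figure 1 shows the results.", BBox(100, 320, 400, 340)),  # body text below the chart
]
kept = filter_text_blocks(blocks, chart_regions)
```

Here only the body text survives; the axis label is left to the background image.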

## Impact

- **Affected specs**: document-processing, result-export, translation
- **Affected code**:
  - `backend/app/services/pdf_generator_service.py` (main changes)
  - `backend/app/services/direct_extraction_engine.py` (chart detection)
- **File size**: Output PDFs will be larger due to embedded page images (~2 MB per page at 2x resolution)
- **Processing time**: Slight increase for page rendering
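To put the file-size figure in context, the uncompressed size of a page image can be estimated from the page dimensions. This is a rough sketch under stated assumptions: that "2x resolution" means twice the default 72 points per inch of PDF user space (144 DPI), that the page is A4 (595 x 842 pt), and that the image is 8-bit RGB. The function name is hypothetical.

```python
def raw_page_image_bytes(width_pt: float, height_pt: float,
                         scale: float = 2.0, channels: int = 3) -> int:
    """Uncompressed size in bytes of a page rendered at `scale` pixels per point."""
    width_px = round(width_pt * scale)    # 595 pt -> 1190 px at 2x
    height_px = round(height_pt * scale)  # 842 pt -> 1684 px at 2x
    return width_px * height_px * channels


a4_raw = raw_page_image_bytes(595, 842)
print(f"{a4_raw} bytes, about {a4_raw / 1_000_000:.1f} MB uncompressed")
```

Under these assumptions the raw bitmap is about 6.0 MB per page. The size actually embedded in the PDF depends on the image format and compression, which the proposal does not specify.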

## Migration

- No database changes required
- No API changes required
- Existing tasks can be re-exported with the new PDF generation logic