# Design: Unify Direct Track PDF Rendering

## Context

The Tool_OCR system generates "Layout PDF" files that preserve the original document appearance while maintaining extractable text. Currently, Direct Track (editable PDFs and Office documents) uses element-by-element rendering, which causes:

- Z-order conflicts (text behind images)
- Missing vector graphics (chart bars, gradients)
- White text becoming invisible on dark backgrounds

## Goals / Non-Goals

### Goals

- Visual fidelity: the Layout PDF matches the appearance of the source document
- Text extractability: all text remains searchable/selectable for translation
- Unified logic: the same rendering approach for all Direct Track documents
- Chart handling: chart-internal text is excluded from the translation layer

### Non-Goals

- Editable text in the Layout PDF (translation creates a separate reflow PDF)
- Reducing file size (trade-off for visual fidelity)
- OCR Track changes (this design only affects Direct Track)

## Decisions

### Decision 1: Use Background Image + Invisible Text Layer

**What**: Render each source PDF page as a full-page background image, then overlay invisible text.

**Why**:

- Preserves ALL visual content (vector graphics, gradients, complex layouts)
- Invisible text (PDF text rendering mode 3) allows text selection without visual overlap
- Simplifies z-order handling (just one image layer + one text layer)

**Implementation**:

```python
import io
import fitz  # PyMuPDF
from reportlab.lib.utils import ImageReader

# Render the source page as a full-page background image
mat = fitz.Matrix(2.0, 2.0)  # 2x resolution
pix = source_page.get_pixmap(matrix=mat, alpha=False)
# Assumed conversion (pdf_canvas is taken to be a ReportLab canvas):
# encode the pixmap as PNG so drawImage can read it
bg_img = ImageReader(io.BytesIO(pix.tobytes("png")))
pdf_canvas.drawImage(bg_img, 0, 0, width=page_width, height=page_height)

# Set invisible text mode
pdf_canvas._code.append('3 Tr')  # Text render mode: invisible

# Draw text elements (invisible but selectable), skipping chart-internal text
for elem in text_elements:
    if not is_inside_chart_region(elem):
        draw_text_element(elem)

pdf_canvas._code.append('0 Tr')  # Reset to normal (fill) mode
```

### Decision 2: Add CHART to regions_to_avoid

**What**: Chart-internal text elements are excluded from the invisible text layer.

**Why**:

- Chart axis labels and legends are already visible in the background image
- This text typically doesn't need translation
- Prevents duplicate text extraction for translation

**Implementation**:

```python
# In element classification loop
if element.type == ElementType.CHART:
    image_elements.append(element)
    regions_to_avoid.append(element)  # Exclude chart region from text layer
```

### Decision 3: Apply to ALL Direct Track Documents

**What**: Use background image rendering for both Office documents and native PDFs.

**Why**:

- Consistent handling eliminates edge cases
- Chart text overlap affects both document types
- Office detection (LibreOffice producer) is unreliable for some PDFs

**Detection logic removed**:

```python
# OLD: Only for Office documents
is_office_document = 'LibreOffice' in producer or filename.endswith('.pptx')

# NEW: All Direct Track uses background rendering
if self.current_processing_track == ProcessingTrack.DIRECT:
    render_background_image()
```

## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                     PDF Generation Flow                     │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Source PDF ──► PyMuPDF ──► Page Pixmap (2x) ──► Background │
│                    │                                        │
│                    ▼                                        │
│              Extract Text ──► Filter Chart Regions          │
│                                        │                    │
│                                        ▼                    │
│              Invisible Text Layer (Mode 3) ──► Overlay      │
│                                                             │
│  Result: Background Image + Invisible Searchable Text       │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

## Risks / Trade-offs

| Risk | Impact | Mitigation |
|------|--------|------------|
| Larger file size (~2 MB/page) | Storage, download time | Accept trade-off for visual fidelity |
| Slightly slower generation | User wait time | Acceptable for quality improvement |
| Chart text not translatable | Feature limitation | Document as expected behavior |
| Source PDF required | Can't regenerate without source | Store source PDF reference in task |

## File Size Estimation

| Document | Pages | Current Size | New Size (est.) |
|----------|-------|--------------|-----------------|
| PPT (25 pages) | 25 | ~1.5 MB | ~43 MB |
| PDF (3 pages) | 3 | ~68 KB | ~6 MB |

## Open Questions

1. Should we provide a "lightweight" option that skips background rendering for simple PDFs?
   - **Decision**: No, keep the unified approach for consistency
2. Should chart text be optionally included in translation?
   - **Decision**: No, chart labels rarely need translation and including them would require complex masking
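
The chart filtering in Decisions 1 and 2 depends on a containment check, `is_inside_chart_region`, that this document references but does not define. The following is a minimal sketch of one way it could work, not the project's actual implementation. It assumes each element carries a bounding box `(x0, y0, x1, y1)` in page coordinates and treats a text element as chart-internal when its center point falls inside any region in `regions_to_avoid`. The `BBox` type, the center-point rule, and passing `regions_to_avoid` as an explicit argument are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class BBox:
    """Axis-aligned rectangle in page coordinates (hypothetical helper type)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def center(self) -> Tuple[float, float]:
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)

    def contains_point(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


def is_inside_chart_region(text_bbox: BBox, regions_to_avoid: Iterable[BBox]) -> bool:
    """Return True if the center of the text box lies inside any avoided region."""
    cx, cy = text_bbox.center()
    return any(region.contains_point(cx, cy) for region in regions_to_avoid)


# Example: an axis label inside a chart is skipped, body text is kept.
chart = BBox(100, 200, 400, 500)
axis_label = BBox(120, 480, 180, 495)
body_text = BBox(72, 72, 300, 90)

text_layer = [b for b in (axis_label, body_text)
              if not is_inside_chart_region(b, [chart])]
print(text_layer)
```

A center-point test keeps text that merely touches the edge of a chart region, such as a caption placed directly below it. A stricter full-containment or overlap-ratio test would be an alternative if captions turn out to be misclassified.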