OCR/openspec/changes/archive/2025-12-11-unify-direct-track-pdf-rendering/design.md
egg 24253ac15e feat: unify Direct Track PDF rendering and simplify export options
Backend changes:
- Apply background image + invisible text layer to all Direct Track PDFs
- Add CHART to regions_to_avoid for text extraction
- Improve visual fidelity for native PDFs and Office documents

Frontend changes:
- Remove JSON, UnifiedDocument, Markdown download buttons
- Simplify to 2-column layout with only Layout PDF and Reflow PDF
- Remove translation JSON download and Layout PDF option
- Keep only Reflow PDF for translated document downloads
- Clean up unused imports (FileJson, Database, FileOutput)

Archives two OpenSpec proposals:
- unify-direct-track-pdf-rendering
- simplify-frontend-export-options

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 07:50:43 +08:00


Design: Unify Direct Track PDF Rendering

Context

The Tool_OCR system generates "Layout PDF" files that preserve the original document appearance while maintaining extractable text. Currently, Direct Track (editable PDFs and Office documents) uses element-by-element rendering, which causes:

  • Z-order conflicts (text behind images)
  • Missing vector graphics (chart bars, gradients)
  • White text becoming invisible on dark backgrounds

Goals / Non-Goals

Goals

  • Visual fidelity: Layout PDF matches source document exactly
  • Text extractability: All text remains searchable/selectable for translation
  • Unified logic: Same rendering approach for all Direct Track documents
  • Chart handling: Chart-internal text excluded from translation layer

Non-Goals

  • Editable text in Layout PDF (translation creates separate reflow PDF)
  • Reducing file size (trade-off for visual fidelity)
  • OCR Track changes (only affects Direct Track)

Decisions

Decision 1: Use Background Image + Invisible Text Layer

What: Render each source PDF page as a full-page background image, then overlay invisible text.

Why:

  • Preserves ALL visual content (vector graphics, gradients, complex layouts)
  • Invisible text (PDF text rendering mode 3) allows text selection without visual overlap
  • Simplifies z-order handling (just one image layer + one text layer)

Implementation:

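import io
import fitz  # PyMuPDF
from reportlab.lib.utils import ImageReader

# Render the source page as a 2x-resolution background image
mat = fitz.Matrix(2.0, 2.0)
pix = source_page.get_pixmap(matrix=mat, alpha=False)
# Wrap the PNG bytes for ReportLab (one possible way; assumed, not confirmed by the source)
bg_img = ImageReader(io.BytesIO(pix.tobytes("png")))
pdf_canvas.drawImage(bg_img, 0, 0, width=page_width, height=page_height)

# Switch to invisible text (PDF text rendering mode 3)
pdf_canvas._code.append('3 Tr')

# Draw text elements (invisible but selectable), skipping chart regions
for elem in text_elements:
    if not is_inside_chart_region(elem):
        draw_text_element(elem)

pdf_canvas._code.append('0 Tr')  # Reset to normal (fill) mode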
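
To get a feel for what the 2x matrix implies, the sketch below estimates the raster dimensions, effective DPI, and raw (uncompressed) RGB size of one page. It assumes a US Letter page (612 × 792 pt) purely for illustration. `PixmapStats` and `pixmap_stats` are hypothetical helpers, not part of PyMuPDF or Tool_OCR, and the real pixmap may differ by a pixel due to rounding.

```python
from dataclasses import dataclass

POINTS_PER_INCH = 72  # PDF user space: 1 pt = 1/72 inch


@dataclass(frozen=True)
class PixmapStats:
    width_px: int
    height_px: int
    dpi: float
    raw_rgb_bytes: int


def pixmap_stats(page_width_pt: float, page_height_pt: float,
                 zoom: float = 2.0) -> PixmapStats:
    """Estimate the raster produced by scaling a page by `zoom` (hypothetical helper)."""
    width_px = round(page_width_pt * zoom)
    height_px = round(page_height_pt * zoom)
    return PixmapStats(
        width_px=width_px,
        height_px=height_px,
        dpi=POINTS_PER_INCH * zoom,
        raw_rgb_bytes=width_px * height_px * 3,  # 3 bytes per pixel, no alpha
    )


stats = pixmap_stats(612, 792)  # US Letter, assumed for illustration
print(stats)
```

The raw size is what the renderer holds in memory before image compression; the size that ends up in the PDF depends on how the image is encoded.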

Decision 2: Add CHART to regions_to_avoid

What: Chart-internal text elements are excluded from the invisible text layer.

Why:

  • Chart axis labels and legends are already visible in the background image
  • These texts typically don't need translation
  • Prevents duplicate text extraction for translation

Implementation:

# In element classification loop
if element.type == ElementType.CHART:
    image_elements.append(element)
    regions_to_avoid.append(element)  # Exclude chart region from text layer
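
The design does not show how text is tested against `regions_to_avoid`. One plausible approach is to drop any text element whose bounding-box center falls inside a chart region. The sketch below assumes `(x0, y0, x1, y1)` bounding boxes; `Element`, `center_inside`, and `filter_text_elements` are hypothetical names for illustration, not the project's actual API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


@dataclass
class Element:
    """Hypothetical stand-in for a document text element."""
    text: str
    bbox: BBox


def center_inside(inner: BBox, outer: BBox) -> bool:
    """Return True if the center point of `inner` lies within `outer`."""
    cx = (inner[0] + inner[2]) / 2
    cy = (inner[1] + inner[3]) / 2
    return outer[0] <= cx <= outer[2] and outer[1] <= cy <= outer[3]


def filter_text_elements(text_elements: Iterable[Element],
                         regions_to_avoid: Iterable[BBox]) -> List[Element]:
    """Keep only text elements that are not centered inside any avoided region."""
    regions = list(regions_to_avoid)
    return [
        elem for elem in text_elements
        if not any(center_inside(elem.bbox, region) for region in regions)
    ]


chart_region = (100.0, 100.0, 400.0, 300.0)
elements = [
    Element("Quarterly revenue", (100.0, 60.0, 300.0, 80.0)),  # heading above the chart
    Element("Q1", (120.0, 280.0, 140.0, 295.0)),               # axis label inside the chart
]
kept = filter_text_elements(elements, [chart_region])
print([elem.text for elem in kept])
```

A center-point test tolerates labels that slightly overhang the chart border; a stricter full-containment or overlap-ratio test would be a reasonable alternative.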

Decision 3: Apply to ALL Direct Track Documents

What: Use background image rendering for both Office documents and native PDFs.

Why:

  • Consistent handling eliminates edge cases
  • Chart text overlap affects both document types
  • Office detection (LibreOffice producer) is unreliable for some PDFs

Detection logic removed:

# OLD: Only for Office documents
is_office_document = 'LibreOffice' in producer or filename.endswith('.pptx')

# NEW: All Direct Track uses background rendering
if self.current_processing_track == ProcessingTrack.DIRECT:
    render_background_image()
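
As a rough illustration of why the producer heuristic was dropped, the sketch below compares the old check with the track-based rule. The metadata values and the enum values are made up for the example, and `is_office_document_old` and `use_background_rendering` are hypothetical function names.

```python
from enum import Enum


class ProcessingTrack(Enum):
    # Enum values are assumed; only the member names appear in the design.
    DIRECT = "direct"
    OCR = "ocr"


def is_office_document_old(producer: str, filename: str) -> bool:
    """Old heuristic: depends on PDF producer metadata or the file extension."""
    return "LibreOffice" in producer or filename.endswith(".pptx")


def use_background_rendering(track: ProcessingTrack) -> bool:
    """New rule: every Direct Track document gets a background image."""
    return track is ProcessingTrack.DIRECT


# A slide deck exported to PDF by a tool other than LibreOffice (illustrative values).
producer = "Microsoft PowerPoint"
filename = "slides.pdf"

old_result = is_office_document_old(producer, filename)
new_result = use_background_rendering(ProcessingTrack.DIRECT)
print(old_result, new_result)
```

Under the old check this deck would have fallen back to element-by-element rendering despite being Office-generated; the track-based rule does not depend on metadata at all.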

Architecture

┌─────────────────────────────────────────────────────────────┐
│                    PDF Generation Flow                       │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  Source PDF ──► PyMuPDF ──► Page Pixmap (2x) ──► Background │
│                    │                                         │
│                    ▼                                         │
│              Extract Text ──► Filter Chart Regions           │
│                    │                                         │
│                    ▼                                         │
│         Invisible Text Layer (Mode 3) ──► Overlay            │
│                                                              │
│  Result: Background Image + Invisible Searchable Text        │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Risks / Trade-offs

| Risk | Impact | Mitigation |
| --- | --- | --- |
| Larger file size (~2 MB/page) | Storage, download time | Accept trade-off for visual fidelity |
| Slightly slower generation | User wait time | Acceptable for quality improvement |
| Chart text not translatable | Feature limitation | Document as expected behavior |
| Source PDF required | Can't regenerate without source | Store source PDF reference in task |

File Size Estimation

| Document | Pages | Current Size | New Size (est.) |
| --- | --- | --- | --- |
| PPT (25 pages) | 25 | ~1.5 MB | ~43 MB |
| PDF (3 pages) | 3 | ~68 KB | ~6 MB |
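
The per-page cost and growth factor implied by these estimates can be checked with a few lines of arithmetic. The sketch below takes the table values as given and assumes 1 MB = 1024 KB; `summarize` is a hypothetical helper.

```python
KB_PER_MB = 1024  # assumed conversion; the table does not specify

# Values copied from the estimation table above.
estimates = [
    {"name": "PPT", "pages": 25, "current_kb": 1.5 * KB_PER_MB, "new_mb": 43},
    {"name": "PDF", "pages": 3, "current_kb": 68, "new_mb": 6},
]


def summarize(row: dict) -> tuple:
    """Return (new MB per page, growth factor versus the current size)."""
    per_page_mb = row["new_mb"] / row["pages"]
    growth = row["new_mb"] * KB_PER_MB / row["current_kb"]
    return per_page_mb, growth


summary = {row["name"]: summarize(row) for row in estimates}
for name, (per_page_mb, growth) in summary.items():
    print(f"{name}: {per_page_mb:.2f} MB/page, about {growth:.0f}x larger")
```

Both rows land at or just under the ~2 MB/page figure quoted in the risk table, while the growth factor varies widely because the current sizes differ so much.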

Open Questions

  1. Should we provide a "lightweight" option that skips background rendering for simple PDFs?
     • Decision: No, keep unified approach for consistency
  2. Should chart text be optionally included in translation?
     • Decision: No, chart labels rarely need translation and would require complex masking