feat: unify Direct Track PDF rendering and simplify export options

Backend changes:
- Apply background image + invisible text layer to all Direct Track PDFs
- Add CHART to regions_to_avoid for text extraction
- Improve visual fidelity for native PDFs and Office documents

Frontend changes:
- Remove JSON, UnifiedDocument, Markdown download buttons
- Simplify to 2-column layout with only Layout PDF and Reflow PDF
- Remove translation JSON download and Layout PDF option
- Keep only Reflow PDF for translated document downloads
- Clean up unused imports (FileJson, Database, FileOutput)

Archives two OpenSpec proposals:
- unify-direct-track-pdf-rendering
- simplify-frontend-export-options

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 07:50:43 +08:00
parent 53bfa88773
commit 24253ac15e
15 changed files with 891 additions and 195 deletions
--- a/openspec/changes/archive/2025-12-11-unify-direct-track-pdf-rendering/specs/result-export/spec.md
+++ b/openspec/changes/archive/2025-12-11-unify-direct-track-pdf-rendering/specs/result-export/spec.md
@@ -0,0 +1,36 @@
+## MODIFIED Requirements
+
+### Requirement: Enhanced PDF Export with Layout Preservation
+
+The PDF export SHALL accurately preserve document layout from both OCR and direct extraction tracks with correct coordinate transformation and multi-page support. For Direct Track, a background image rendering approach SHALL be used for visual fidelity.
+
+#### Scenario: Export PDF from direct extraction track
+- **WHEN** exporting PDF from a direct-extraction processed document
+- **THEN** the system SHALL render source PDF pages as full-page background images at 2x resolution
+- **AND** overlay invisible text elements using PDF Text Rendering Mode 3
+- **AND** text SHALL remain selectable and searchable despite being invisible
+- **AND** visual output SHALL match the source document, within the fidelity of the 2x rasterization
+
+#### Scenario: Export PDF from OCR track with full structure
+- **WHEN** exporting PDF from OCR-processed document
+- **THEN** the PDF SHALL use all 23 PP-StructureV3 element types
+- **AND** render tables with proper cell boundaries
+- **AND** maintain reading order from `parsing_res_list`
+
+#### Scenario: Handle coordinate transformations correctly
+- **WHEN** generating PDF from UnifiedDocument
+- **THEN** the system SHALL use explicit page dimensions from the OCR results (not dimensions inferred from bounding boxes)
+- **AND** correctly transform Y-axis coordinates from top-left (OCR) to bottom-left (PDF/ReportLab) origin
+- **AND** prevent vertical flipping or position misalignment errors
+
+#### Scenario: Direct Track PDF file size increase
+- **WHEN** generating Layout PDF for Direct Track documents
+- **THEN** the output file size SHALL be allowed to increase due to the embedded page images
+- **AND** approximately 1-2 MB per page at 2x resolution is expected
+- **AND** this trade-off is accepted for improved visual fidelity
+
+#### Scenario: Chart elements excluded from text layer
+- **WHEN** generating Layout PDF containing charts
+- **THEN** the system SHALL NOT include chart-internal text in the invisible text layer
+- **AND** chart visuals SHALL be preserved in the background image
+- **AND** chart text SHALL NOT be available for text selection or translation
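The Direct Track text-layer scenarios in the spec above (invisible text via Text Rendering Mode 3, the top-left to bottom-left Y flip using explicit page dimensions, and excluding chart-internal text) can be sketched in plain Python by emitting raw PDF content-stream operators. This is a minimal illustration under stated assumptions, not the project's implementation: `TextElement`, `to_pdf_rect`, `invisible_text_ops` and the `F1` font resource are hypothetical names, text is assumed to be ASCII, and each baseline is approximated by the bottom edge of its box. The spec names ReportLab as the PDF backend; its text objects expose the same setting through `setTextRenderMode(3)`.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical element type: bbox is (x0, y0, x1, y1) in page-image pixels,
# with a top-left origin as produced by OCR/extraction.
@dataclass
class TextElement:
    text: str
    bbox: tuple[float, float, float, float]


def overlaps(a, b) -> bool:
    """True if two (x0, y0, x1, y1) boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def to_pdf_rect(bbox, img_size, page_size):
    """Map a top-left-origin pixel bbox to a bottom-left-origin PDF rect in points.

    img_size  = (width_px, height_px) of the rendered background image.
    page_size = (width_pt, height_pt), taken explicitly from page metadata
                rather than inferred from the bounding boxes.
    """
    sx = page_size[0] / img_size[0]
    sy = page_size[1] / img_size[1]
    x0, y0, x1, y1 = bbox
    # Flip Y: the box's bottom edge (y1) becomes the PDF rect's lower y.
    return (x0 * sx, page_size[1] - y1 * sy, x1 * sx, page_size[1] - y0 * sy)


def escape_pdf_string(s: str) -> str:
    """Escape a PDF literal string (backslash first, then parentheses)."""
    return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def invisible_text_ops(elements, chart_boxes, img_size, page_size, font="F1") -> str:
    """Build content-stream operators for an invisible (3 Tr) text layer.

    Assumes a font resource named `font` is declared in the page's /Resources.
    """
    ops = []
    for el in elements:
        # Chart text stays only in the background image, never in the text layer.
        if any(overlaps(el.bbox, c) for c in chart_boxes):
            continue
        x0, y0, x1, y1 = to_pdf_rect(el.bbox, img_size, page_size)
        size = y1 - y0  # rough font size from the box height
        # 3 Tr = neither fill nor stroke: selectable/searchable but not drawn.
        ops.append(
            f"BT /{font} {size:.2f} Tf 3 Tr {x0:.2f} {y0:.2f} Td "
            f"({escape_pdf_string(el.text)}) Tj ET"
        )
    return "\n".join(ops)


if __name__ == "__main__":
    page = (612, 792)    # US Letter in points
    image = (1224, 1584)  # the same page rendered at 2x
    elements = [
        TextElement("Hello (world)", (100, 200, 300, 240)),
        TextElement("42%", (600, 600, 700, 650)),  # sits inside a chart
    ]
    charts = [(500, 500, 900, 900)]
    print(invisible_text_ops(elements, charts, image, page))
```

With a 2x page image (1224 x 1584 px for a 612 x 792 pt page), each pixel maps to half a point, so a box spanning y = 200..240 px lands at y = 672..692 pt after the flip, and the "42%" label inside the chart region is dropped from the text layer.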