# Result Export Specification

## ADDED Requirements

### Requirement: Layout-Preserving PDF Generation
The system MUST generate PDF files that preserve the original document layout including images, tables, and text formatting.

#### Scenario: Generate PDF with images
- GIVEN a document processed through OCR or Direct track
- WHEN images are detected and extracted
- THEN the generated PDF MUST include all images at their original positions
- AND images MUST maintain their aspect ratios
- AND images MUST be saved to an `imgs/` subdirectory

#### Scenario: Generate PDF with tables
- GIVEN a document containing tables
- WHEN tables are detected and extracted
- THEN the generated PDF MUST render tables with proper structure
- AND tables MUST use their own bbox coordinates for positioning
- AND tables MUST NOT depend on fake image references

#### Scenario: Generate PDF with styled text
- GIVEN a document processed through Direct track with `StyleInfo`
- WHEN text elements have style information
- THEN the generated PDF MUST apply font families (with mapping)
- AND the PDF MUST apply font sizes
- AND the PDF MUST apply text colors
- AND the PDF MUST apply bold/italic formatting

### Requirement: Track-Specific Rendering
The system MUST provide different rendering approaches based on the processing track.

#### Scenario: Direct track rendering
- GIVEN a document processed through Direct extraction
- WHEN generating a PDF
- THEN the system MUST use rich formatting preservation
- AND maintain precise positioning from the original
- AND apply all available `StyleInfo`

#### Scenario: OCR track rendering
- GIVEN a document processed through OCR
- WHEN generating a PDF
- THEN the system MUST use simplified rendering
- AND apply best-effort positioning based on bbox
- AND use estimated font sizes

### Requirement: Image Path Resolution
The system MUST correctly resolve image paths with fallback logic.

#### Scenario: Resolve saved image paths
- GIVEN an element with image content
- WHEN looking for the image path
- THEN the system MUST check `content["saved_path"]` first
- AND fall back to `content["path"]` if not found
- AND fall back to `content["image_path"]` if not found
- AND finally check `metadata["path"]`

## MODIFIED Requirements

### Requirement: PDF Generation Pipeline
The PDF generation pipeline MUST be enhanced to support layout preservation.

#### Scenario: Enhanced PDF generation
- GIVEN a `UnifiedDocument` from either track
- WHEN generating a PDF
- THEN the system MUST detect the processing track
- AND route to the appropriate rendering method
- AND preserve as much layout information as is available

### Requirement: Image Handling in PP-Structure
The PP-Structure enhanced module MUST actually save extracted images.

#### Scenario: Save PP-Structure images
- GIVEN PP-Structure extracts an image with `img_path`
- WHEN processing the image element
- THEN the `_save_image` method MUST save the image to disk
- AND return a relative path for reference
- AND handle both file paths and numpy arrays

### Requirement: Table Rendering Logic
Table rendering MUST use the element's own bbox directly instead of an image lookup.

#### Scenario: Render table with direct bbox
- GIVEN a table element with bbox coordinates
- WHEN rendering the table in PDF
- THEN the system MUST use the element's own bbox
- AND MUST NOT look for non-existent table image files
- AND position the table accurately based on coordinates
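
As an illustration of the lookup order in the "Resolve saved image paths" scenario, the fallback chain could be sketched roughly as follows. This is a minimal sketch, not the project's implementation: the function name `resolve_image_path` is hypothetical, `content` and `metadata` are assumed to be plain dictionaries, and treating a missing key, `None`, or an empty string as "not found" is also an assumption.

```python
from typing import Any, Mapping, Optional

# Lookup order within `content`, taken from the scenario:
# saved_path first, then path, then image_path.
CONTENT_KEYS = ("saved_path", "path", "image_path")


def resolve_image_path(
    content: Any,
    metadata: Optional[Mapping[str, Any]] = None,
) -> Optional[str]:
    """Return the first usable image path, or None if nothing is set.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Assumption: content may not be a mapping (for example raw bytes),
    # in which case only the metadata fallback applies.
    if isinstance(content, Mapping):
        for key in CONTENT_KEYS:
            value = content.get(key)
            if value:  # assumption: None and "" both count as "not found"
                return str(value)

    # Final fallback from the scenario: metadata["path"].
    if metadata:
        value = metadata.get("path")
        if value:
            return str(value)

    return None


# Example usage with made-up paths.
first = resolve_image_path({"path": "imgs/a.png", "image_path": "imgs/b.png"})
last = resolve_image_path({}, {"path": "imgs/c.png"})
```

Keeping the key order in a single tuple makes the precedence easy to audit against the scenario and easy to cover with a test per fallback step.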