# Result Export - Delta Changes

## ADDED Requirements

### Requirement: Image Extraction and Persistence
The OCR system SHALL save extracted images to disk during layout analysis for later use in PDF generation.

#### Scenario: Images extracted by PP-StructureV3 are saved to disk
- **WHEN** OCR processes a document containing images (charts, tables, figures)
- **THEN** system SHALL extract image objects from `markdown_images` dictionary
- **AND** system SHALL create `imgs/` subdirectory in result folder
- **AND** system SHALL save each image object to disk using PIL `Image.save()`
- **AND** saved file paths SHALL match paths recorded in JSON `images_metadata`
- **AND** system SHALL log warnings for failed image saves but continue processing

#### Scenario: Multi-page documents with images on different pages
- **WHEN** OCR processes multi-page PDF with images on multiple pages
- **THEN** system SHALL save images from all pages to same `imgs/` folder
- **AND** image filenames SHALL include bbox coordinates for uniqueness
- **AND** images SHALL be available for PDF generation after OCR completes

### Requirement: Layout-Preserving PDF Generation
The system SHALL generate PDF files that preserve the original document layout using OCR JSON data.
#### Scenario: PDF generated from JSON with accurate layout
- **WHEN** user requests PDF download for a completed task
- **THEN** system SHALL parse OCR JSON result file
- **AND** system SHALL extract bounding box coordinates for each text region
- **AND** system SHALL determine page dimensions from source file or bbox maximum values
- **AND** system SHALL generate PDF with text positioned at precise coordinates
- **AND** system SHALL use Chinese-compatible font (e.g., Noto Sans CJK)
- **AND** system SHALL embed images from `imgs/` folder using paths in `images_metadata`
- **AND** generated PDF SHALL visually resemble original document layout with images

#### Scenario: PDF download works correctly
- **WHEN** user clicks PDF download button
- **THEN** system SHALL return cached PDF if already generated
- **OR** system SHALL generate new PDF from JSON on first request
- **AND** system SHALL NOT return 403 Forbidden error
- **AND** downloaded PDF SHALL contain task OCR results with layout preserved

#### Scenario: Multi-page PDF generation
- **WHEN** OCR JSON contains results for multiple pages
- **THEN** generated PDF SHALL contain same number of pages
- **AND** each page SHALL display text regions for that page only
- **AND** page dimensions SHALL match original document pages

## MODIFIED Requirements

### Requirement: Export Interface
The Export page SHALL support downloading OCR results in multiple formats using V2 task APIs.

#### Scenario: PDF caching improves performance
- **WHEN** user downloads same PDF multiple times
- **THEN** system SHALL serve cached PDF file on subsequent requests
- **AND** system SHALL NOT regenerate PDF unless JSON changes
- **AND** download response time SHALL be faster than initial generation
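
#### Illustrative sketch: cache check for the PDF caching scenario

The caching scenario above does not say how "JSON changes" is detected. The sketch below assumes one possible approach: compare file modification times and regenerate only when the cached PDF is missing or older than the JSON result. The names `pdf_is_fresh`, `get_or_generate_pdf`, and the `generate` callback are hypothetical and are not part of the existing codebase; a content hash would be an equally valid change detector.

```python
from pathlib import Path
from typing import Callable


def pdf_is_fresh(json_path: Path, pdf_path: Path) -> bool:
    """Return True when a cached PDF exists and is not older than the OCR JSON.

    Assumption: a newer JSON modification time means the JSON changed.
    """
    if not pdf_path.is_file():
        return False
    return pdf_path.stat().st_mtime_ns >= json_path.stat().st_mtime_ns


def get_or_generate_pdf(
    json_path: Path,
    pdf_path: Path,
    generate: Callable[[Path, Path], None],
) -> Path:
    """Serve the cached PDF, calling `generate` only when the cache is stale.

    `generate` is a hypothetical callback that writes a PDF at `pdf_path`
    from the OCR JSON at `json_path`.
    """
    if not pdf_is_fresh(json_path, pdf_path):
        generate(json_path, pdf_path)
    return pdf_path
```

With this shape, the first download triggers generation and later downloads return the same file without calling `generate` again, until the JSON result is rewritten.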