# Change: Fix PDF Layout Restoration Coordinate System and Dimension Calculation

## Why

During OCR track validation, the generated PDF (img1_layout.pdf) exhibits significant layout discrepancies compared to the original image (img1.png). Specific issues include:

- **Element position misalignment**: Text elements appear at incorrect vertical positions
- **Abnormal vertical flipping**: Coordinate transformation errors cause content to be inverted
- **Incorrect scaling**: Content is stretched or compressed due to wrong page dimension calculations

Code review identified two critical logic defects in `backend/app/services/pdf_generator_service.py`:

1. **Page dimension calculation error**: The system ignores the explicit page dimensions in the OCR results and instead infers them from bounding box boundaries, which causes coordinate transformation errors
2. **Missing multi-page support**: The PDF generator applies the first page's dimensions to every page, so it cannot handle mixed orientation (portrait/landscape) or pages of different sizes

These issues violate the requirement "Enhanced PDF Export with Layout Preservation" in the result-export specification, making PDF exports unreliable for production use.

## What Changes

### 1. Fix calculate_page_dimensions Logic

- **MODIFIED**: `backend/app/services/pdf_generator_service.py::calculate_page_dimensions()`
- Change priority order: Check the explicit `dimensions` field first, and fall back to bbox calculation only when it is unavailable
- Ensure Y-axis coordinate transformation uses the correct page height

### 2. Implement Dynamic Per-Page Sizing

- **MODIFIED**: `backend/app/services/pdf_generator_service.py::_generate_direct_track_pdf()`
- **MODIFIED**: `backend/app/services/pdf_generator_service.py::_generate_ocr_track_pdf()`
- Call `pdf_canvas.setPageSize()` for each page to support varying page dimensions
- Pass the current page height to coordinate transformation functions

### 3. Update OCR Data Converter

- **MODIFIED**: `backend/app/services/ocr_to_unified_converter.py::convert_unified_document_to_ocr_data()`
- Add a `page_dimensions` mapping to the output: `{page_index: {width, height}}`
- Ensure the OCR track has per-page dimension information

## Impact

**Affected specs**: result-export (MODIFIED requirement: "Enhanced PDF Export with Layout Preservation")

**Affected code**:

- `backend/app/services/pdf_generator_service.py` (core fix)
- `backend/app/services/ocr_to_unified_converter.py` (data structure enhancement)

**Breaking changes**: None. This is a bug fix that makes existing functionality work correctly.

**Benefits**:

- Accurate layout restoration for single-page documents
- Support for mixed-orientation multi-page documents
- Correct coordinate transformation without vertical flipping errors
- Improved reliability of the PDF export feature
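
The priority order and Y-axis handling described under "What Changes" can be illustrated with a minimal sketch. It is not the actual service code: the `text_regions` key, the `(x0, y0, x1, y1)` bbox layout with a top-left origin, and the helper `to_pdf_bbox` are assumptions made for illustration. Only `calculate_page_dimensions`, `dimensions`, and `page_dimensions` are names taken from this proposal, and the signature shown here is hypothetical.

```python
from typing import Any, Dict, Tuple

# Assumed bbox layout: (x0, y0, x1, y1) in image space, origin at top-left.
BBox = Tuple[float, float, float, float]


def calculate_page_dimensions(
    ocr_data: Dict[str, Any], page_index: int = 0
) -> Tuple[float, float]:
    """Return (width, height) for one page, preferring explicit dimensions."""
    # 1. Per-page mapping {page_index: {width, height}}, if present.
    # 2. Otherwise the document-level `dimensions` field.
    per_page = ocr_data.get("page_dimensions") or {}
    dims = per_page.get(page_index) or ocr_data.get("dimensions")
    if dims and dims.get("width") and dims.get("height"):
        return float(dims["width"]), float(dims["height"])

    # 3. Last resort: infer the size from the bbox extents on this page.
    #    This underestimates the page whenever content stops short of the
    #    right or bottom edge, so it is used only when nothing else exists.
    boxes = [
        region["bbox"]
        for region in ocr_data.get("text_regions", [])  # hypothetical key
        if region.get("page", 0) == page_index
    ]
    if not boxes:
        raise ValueError(f"no dimensions or bboxes for page {page_index}")
    return float(max(b[2] for b in boxes)), float(max(b[3] for b in boxes))


def to_pdf_bbox(bbox: BBox, page_height: float) -> BBox:
    """Map a top-left-origin bbox to PDF space (origin at bottom-left)."""
    x0, y0, x1, y1 = bbox
    # The flip must use the height of the page being drawn, not a global one.
    return x0, page_height - y1, x1, page_height - y0


if __name__ == "__main__":
    ocr_data = {
        "page_dimensions": {
            0: {"width": 1000, "height": 1400},  # portrait
            1: {"width": 1400, "height": 1000},  # landscape
        },
        "text_regions": [{"page": 0, "bbox": (100, 200, 500, 260)}],
    }
    for page_index in (0, 1):
        width, height = calculate_page_dimensions(ocr_data, page_index)
        print(page_index, width, height)
    print(to_pdf_bbox((100, 200, 500, 260), 1400.0))
```

In this example the text region maps to y = 1140..1200 when the explicit page height of 1400 is used. If the height were inferred from the bbox extents instead (260), the same region would land at y = 0..60, which is the kind of vertical misplacement described in the "Why" section. In the generator, the per-page `(width, height)` would presumably be passed to `pdf_canvas.setPageSize()` before each page is drawn.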