Resolve issues where multi-page PDFs with varying page sizes had incorrect
element positioning and scaling. Each page now maintains its own dimensions
and scale factors throughout the generation process.
Key improvements:
Direct Track Processing:
- Store per-page dimensions in a page_dimensions mapping keyed by 0-based page index
- Set correct page size for each page using setPageSize()
- Pass current page height to all drawing methods for accurate Y-axis conversion
- Each page uses its own dimensions instead of the first page's dimensions
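
A minimal sketch of the per-page Y-axis conversion described above, assuming
source coordinates use a top-left origin and the PDF uses a bottom-left origin.
The helper name to_pdf_y and the sample sizes are hypothetical, not taken from
the codebase:

```python
from typing import Dict, Tuple

# Hypothetical per-page sizes in PDF points, keyed by 0-based page index.
page_dimensions: Dict[int, Tuple[float, float]] = {
    0: (595.0, 842.0),   # roughly A4 portrait
    1: (842.0, 1191.0),  # roughly A3 portrait
}


def to_pdf_y(y_top: float, element_height: float, page_height: float) -> float:
    """Convert a top-left-origin Y coordinate to a bottom-left-origin one.

    The result is the Y of the element's bottom edge on a page of the
    given height, so the same input lands differently on each page size.
    """
    return page_height - y_top - element_height


# Convert the same element position on every page using that page's own height.
converted = {
    page_idx: to_pdf_y(100.0, 20.0, height)
    for page_idx, (_width, height) in page_dimensions.items()
}
```

Using the first page's height for page 1 here would place the element 349
points too low, which is the kind of misplacement this change addresses.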
OCR Track Processing:
- Calculate per-page scale factors with 3-tier priority:
  1. Original file dimensions (highest priority)
  2. OCR/UnifiedDocument dimensions
  3. Fallback to first page dimensions
- Apply correct scaling factors for each page's coordinate transformation
- Handle mixed-size pages correctly (e.g., A4 + A3 in same document)
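
One way the 3-tier priority could be sketched. This assumes the priority picks
the target page size and that scale factors are the ratio of that target to the
OCR coordinate space. All function and parameter names below are hypothetical
and may not match the actual implementation:

```python
from typing import Dict, Tuple

Size = Tuple[float, float]  # (width, height)


def resolve_target_size(
    page_idx: int,
    original_sizes: Dict[int, Size],
    ocr_sizes: Dict[int, Size],
    first_page_size: Size,
) -> Size:
    """Pick the target size for a page using the 3-tier priority."""
    if page_idx in original_sizes:   # 1. original file dimensions
        return original_sizes[page_idx]
    if page_idx in ocr_sizes:        # 2. OCR/UnifiedDocument dimensions
        return ocr_sizes[page_idx]
    return first_page_size           # 3. fallback to first page dimensions


def scale_factors(target: Size, ocr_size: Size) -> Tuple[float, float]:
    """Return (scale_x, scale_y) mapping OCR coordinates onto the target page."""
    return target[0] / ocr_size[0], target[1] / ocr_size[1]


# Example: page 0 has original dimensions, page 1 only has OCR dimensions.
original_sizes = {0: (595.0, 842.0)}
ocr_sizes = {0: (1190.0, 1684.0), 1: (2000.0, 1000.0)}
first_page_size = (595.0, 842.0)

scales = {
    idx: scale_factors(
        resolve_target_size(idx, original_sizes, ocr_sizes, first_page_size),
        ocr_sizes[idx],
    )
    for idx in ocr_sizes
}
```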
Dimension Extraction:
- Add get_all_page_sizes() method to extract dimensions for all PDF pages
- Return Dict[int, Tuple[float, float]] mapping page index to (width, height)
- Maintain backward compatibility with get_original_page_size() for the first page
- Support both images (single page) and multi-page PDFs
Coordinate System:
- Add ocr_dimensions priority check in calculate_page_dimensions()
- Priority order: ocr_dimensions > dimensions > bbox inference
- Ensure consistent coordinate space across processing tracks
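
A rough illustration of the priority order above. The dict-based page shape,
the (x0, y0, x1, y1) bbox layout, and the function name are assumptions made
for this sketch; the real calculate_page_dimensions() may differ:

```python
from typing import Any, Dict, Tuple


def calculate_page_dimensions_sketch(page: Dict[str, Any]) -> Tuple[float, float]:
    """Resolve (width, height) as ocr_dimensions > dimensions > bbox inference."""
    # Tiers 1 and 2: explicit dimensions, with ocr_dimensions checked first.
    for key in ("ocr_dimensions", "dimensions"):
        dims = page.get(key)
        if dims:
            return float(dims["width"]), float(dims["height"])

    # Tier 3: infer the extent from element bounding boxes (x0, y0, x1, y1).
    boxes = [element["bbox"] for element in page.get("elements", [])]
    if not boxes:
        raise ValueError("page has no dimensions and no elements to infer from")
    return float(max(b[2] for b in boxes)), float(max(b[3] for b in boxes))
```

Checking ocr_dimensions first keeps element coordinates and page size in the
same coordinate space when both fields are present.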
Benefits:
- Correct rendering for documents with mixed page sizes
- Accurate element positioning on all pages
- Proper scaling for scanned documents with varying DPI per page
- Better handling of landscape/portrait mixed documents
Related to archived proposal: fix-pdf-coordinate-system
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>