This commit fixes the critical table overlap issue in Direct track PDF layout
restoration where generated tables exceeded their bounding boxes and overlapped
with surrounding text.
Root Cause:
ReportLab's Table component auto-calculates row heights from cell content, so
tables often rendered larger than their specified bbox. The rowHeights parameter
was ignored during actual rendering, and reducing the font size did not shrink
the table height proportionally.
Solution - Canvas Transform Scaling:
Implemented canvas transform scaling in _draw_table_element_direct():
1. Wrap the table with generous available space to measure its natural rendered dimensions
2. Calculate the scale factor: min(bbox_width/actual_width, bbox_height/actual_height, 1.0)
3. Apply the canvas transform: saveState → translate → scale → drawOn → restoreState
4. Drop all buffers and draw at the exact bbox position
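The four steps above can be sketched roughly as follows. This is a minimal illustration, not the code in _draw_table_element_direct(): the function name, the (x0, y0, x1, y1) bbox tuple with a top-left origin, and the "10x bbox height" wrap budget are assumptions. The canvas and table arguments are expected to behave like a ReportLab Canvas and Table (wrap, drawOn, saveState, translate, scale, restoreState).

```python
def draw_table_scaled(canvas, table, bbox, page_height):
    """Draw `table` inside `bbox`, shrinking it uniformly if it is too large.

    `bbox` is (x0, y0, x1, y1) with a top-left origin (an assumption here).
    Returns the scale factor that was applied.
    """
    x0, y0, x1, y1 = bbox
    bbox_w, bbox_h = x1 - x0, y1 - y0

    # Step 1: wrap with generous height to learn the natural rendered size.
    actual_w, actual_h = table.wrap(bbox_w, bbox_h * 10)
    if actual_w <= 0 or actual_h <= 0:
        return 1.0  # nothing measurable to draw

    # Step 2: one factor for both axes, capped at 1.0 so tables never grow.
    scale = min(bbox_w / actual_w, bbox_h / actual_h, 1.0)

    # Step 3: isolate the transform so later drawing is unaffected.
    canvas.saveState()
    # Step 4: exact bbox position, no buffer (pdf_y = page_height - bbox.y1).
    canvas.translate(x0, page_height - y1)
    canvas.scale(scale, scale)
    table.drawOn(canvas, 0, 0)
    canvas.restoreState()
    return scale
```

Drawing at (0, 0) after the translate keeps the scaled table anchored to the bbox's bottom-left corner in PDF coordinates.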
Key Changes:
- backend/app/services/pdf_generator_service.py (_draw_table_element_direct):
* Added canvas scaling logic (lines 2180-2208)
* Removed buffer adjustments (earlier attempts ranged from 2pt to 18pt)
* Switched to the exact bbox position: pdf_y = page_height - bbox.y1
* Added support for column widths from metadata to preserve original ratios
- backend/app/services/direct_extraction_engine.py (_process_native_table):
* Extracted column widths from PyMuPDF table.cells data (lines 691-761)
* Calculated the original column width ratios (e.g., 40:60)
* Stored the ratios in element metadata for use during PDF generation
* This prevents unnecessary text wrapping, which would increase table height
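As an illustration of the column-ratio idea, the sketch below derives ratios from cell rectangles and rescales them to a target width. It is not the code in _process_native_table(): both helper names are hypothetical, and it assumes each cell is an (x0, y0, x1, y1) tuple or None, which is how the table.cells data is treated here.

```python
def column_width_ratios(cells):
    """Return each column's share of the total table width.

    `cells` is assumed to be a list of (x0, y0, x1, y1) rectangles;
    None entries (empty or merged cells) are skipped.
    """
    # Distinct vertical edges mark column boundaries; rounding absorbs
    # small floating-point differences between rows.
    edges = sorted({round(x, 1) for c in cells if c for x in (c[0], c[2])})
    widths = [right - left for left, right in zip(edges, edges[1:])]
    total = sum(widths)
    return [w / total for w in widths] if total else []


def widths_for_bbox(ratios, bbox_width):
    """Turn stored ratios back into absolute column widths for a bbox."""
    return [r * bbox_width for r in ratios]


# A 2x2 table whose columns span 0-40 and 40-100 gives a 40:60 split.
cells = [(0, 0, 40, 10), (40, 0, 100, 10), (0, 10, 40, 20), (40, 10, 100, 20)]
ratios = column_width_ratios(cells)
col_widths = widths_for_bbox(ratios, 246.8)
```

The resulting list could be passed as colWidths when building the ReportLab Table, so that columns keep their original proportions and text wraps no more than it did in the source PDF.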
Results:
In the test case, the table's natural size of 246.8×108.0pt was scaled to
246.8×89.6pt (factor 0.830), fitting within the bbox without overlap.
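The reported factor can be reproduced from the scale formula in the Solution section. The bbox dimensions used below are an assumption inferred from the scaled size; the message does not state them directly.

```python
natural_w, natural_h = 246.8, 108.0  # natural table size reported above
bbox_w, bbox_h = 246.8, 89.6         # assumed bbox, inferred from the scaled size

# Height is the limiting dimension, so it determines the factor.
scale = min(bbox_w / natural_w, bbox_h / natural_h, 1.0)
scaled_h = natural_h * scale

print(f"{scale:.3f}")  # 0.830
```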
Cleanup:
- Removed test/debug scripts: check_tables.py, verify_chart_recognition.py
- Removed demo files from demo_docs/ (basic/, layout/, mixed/, tables/)
User Confirmed: "The result in FINAL_SCALING_FIX.pdf is acceptable. Congratulations on completing the Direct PDF fix."
Next: Other document formats require layout verification and fixes.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>