Critical Fix for Overlapping Content:
After the scale-factor fix, overlaps became visible: text was being
drawn on top of both tables and images. The previous code only filtered
out text inside tables, not text inside images.
Problem:
1. Text regions overlapped with table regions → duplicated content
2. Text regions overlapped with image regions → text on top of images
3. Old filter only checked tables from images_metadata
4. Old filter used a simple point-in-bbox check and could not handle
   polygons
Solution:
1. Add _get_bbox_coords() helper:
   - Handles both polygon [[x,y],...] and rect [x1,y1,x2,y2] formats
   - Returns normalized [x_min, y_min, x_max, y_max]
2. Add _is_bbox_inside() with tolerance:
   - Uses _get_bbox_coords() for both inner and outer bbox
   - Checks if inner bbox is completely inside outer bbox
   - Supports 5px tolerance for edge cases
3. Add _filter_text_in_regions() (replaces old logic):
   - Filters text regions against ANY list of regions to avoid
   - Works with tables, images, or any other region type
   - Logs how many regions were filtered
4. Update generate_layout_pdf():
   - Collect both table_regions and image_regions
   - Combine into regions_to_avoid list
   - Use new filter function instead of old inline logic
Changes:
- backend/app/services/pdf_generator_service.py:
  - Add Union to imports
  - Add _get_bbox_coords() helper (polygon + rect support)
  - Add _is_bbox_inside() (tolerance-based containment check)
  - Add _filter_text_in_regions() (generic region filter)
  - Replace old table-only filter with new multi-region filter
  - Filter text against both tables AND images
Expected Results:
✓ No text drawn inside table regions
✓ No text drawn inside image regions
✓ Tables rendered as proper ReportLab tables
✓ Images rendered as embedded images
✓ No duplicate or overlapping content
Additional:
- Cleaned all Python cache files (__pycache__, *.pyc)
- Cleaned test output directories
- Cleaned uploads and results directories
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>