OCR/backend/test_chinese_font.py
egg fa1abcd8e6 feat: implement layout-preserving PDF generation with table reconstruction
Major Features:
- Add PDF generation service with Chinese font support
- Parse HTML tables from PP-StructureV3 and rebuild with ReportLab
- Extract table text for translation purposes
- Auto-filter text regions inside tables to avoid overlaps
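
As a rough illustration of the table parsing and text extraction listed above, here is a minimal sketch built only on the standard library's html.parser. The names TableRowsParser and parse_table are hypothetical and are not the commit's HTMLTableParser; the sketch also assumes simple tables and ignores rowspan/colspan.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TableRowsParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row.

    Hypothetical helper for illustration; rowspan/colspan are ignored.
    """

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._cell: Optional[List[str]] = None  # text pieces of the open cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:  # tolerate a cell that appears before any <tr>
                self.rows.append([])
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None


def parse_table(html: str) -> List[List[str]]:
    """Return the table as a list of rows, each a list of cell strings."""
    parser = TableRowsParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

A row list in this shape can be handed to a table-drawing routine, and joining the cells yields plain text suitable for translation.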

Backend Changes:
1. pdf_generator_service.py (NEW)
   - HTMLTableParser: Parse HTML tables to extract structure
   - PDFGeneratorService: Generate layout-preserving PDFs
   - Coordinate transformation: OCR (top-left) → PDF (bottom-left)
   - Font size heuristics: 75% of bbox height with width checking
   - Table reconstruction: Parse HTML → ReportLab Table
   - Image embedding: Extract bbox from filenames
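
The coordinate flip and the font-size heuristic described above can be sketched as follows. This is an assumption-laden illustration, not the service's actual code: it assumes OCR boxes are (x1, y1, x2, y2) already in PDF units with no scaling, and it takes the text-width function as a parameter. The names ocr_to_pdf_bbox, fit_font_size and naive_width are hypothetical.

```python
from typing import Callable, Tuple

BBox = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def ocr_to_pdf_bbox(bbox: BBox, page_height: float) -> BBox:
    """Flip a top-left-origin box into a bottom-left-origin box."""
    x1, y1, x2, y2 = bbox
    # The OCR bottom edge (y2) becomes the lower PDF y, the top edge (y1) the upper.
    return (x1, page_height - y2, x2, page_height - y1)


def fit_font_size(
    text: str,
    bbox: BBox,
    text_width: Callable[[str, float], float],
    height_ratio: float = 0.75,
    min_size: float = 1.0,
) -> float:
    """Start at 75% of the box height, then shrink if the text is too wide."""
    x1, y1, x2, y2 = bbox
    box_w, box_h = x2 - x1, y2 - y1
    size = box_h * height_ratio
    width = text_width(text, size)
    if width > box_w:
        # Assumes rendered width scales linearly with font size.
        size *= box_w / width
    return max(size, min_size)


def naive_width(text: str, size: float) -> float:
    """Crude stand-in for a real metric: every glyph is taken as 0.5 em wide."""
    return 0.5 * size * len(text)
```

In a ReportLab-based generator, the width callback could wrap pdfmetrics.stringWidth with the registered font name; naive_width only keeps the sketch free of dependencies.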

2. ocr_service.py
   - Add _extract_table_text() for translation support
   - Add output_dir parameter to save images to result directory
   - Extract bbox from image filenames (img_in_table_box_x1_y1_x2_y2.jpg)
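
Recovering a bounding box from a filename of the form img_in_table_box_x1_y1_x2_y2.jpg might look like the sketch below. The function name bbox_from_filename and the choice to read the last four underscore-separated integers before the extension are assumptions made for illustration.

```python
import re
from typing import Optional, Tuple

# Four trailing integer fields followed by a file extension.
_BBOX_RE = re.compile(r"_(\d+)_(\d+)_(\d+)_(\d+)\.[A-Za-z0-9]+$")


def bbox_from_filename(name: str) -> Optional[Tuple[int, int, int, int]]:
    """Return (x1, y1, x2, y2) parsed from the filename, or None if absent."""
    match = _BBOX_RE.search(name)
    if match is None:
        return None
    x1, y1, x2, y2 = (int(group) for group in match.groups())
    return (x1, y1, x2, y2)
```

Returning None instead of raising lets a caller skip images whose names carry no coordinates.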

3. tasks.py
   - Update process_task_ocr to use save_results() with PDF generation
   - Fix download_pdf endpoint to use database-stored PDF paths
   - Support on-demand PDF generation from JSON

4. config.py
   - Add chinese_font_path configuration
   - Add pdf_enable_bbox_debug flag

Frontend Changes:
1. PDFViewer.tsx (NEW)
   - React PDF viewer with zoom and pagination
   - Memoized file config to prevent unnecessary reloads

2. TaskDetailPage.tsx & ResultsPage.tsx
   - Integrate PDF preview and download

3. main.tsx
   - Configure PDF.js worker via CDN

4. vite.config.ts
   - Add host: '0.0.0.0' for network access
   - Use VITE_API_URL environment variable for backend proxy

Dependencies:
- reportlab: PDF generation library
- Noto Sans SC font: Chinese character support

🤖 Generated with Claude Code
https://claude.com/claude-code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 20:21:56 +08:00

"""
Test script to verify ReportLab and Chinese font rendering
"""
from reportlab.pdfgen import canvas
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont
from pathlib import Path
import sys


def test_chinese_rendering():
    """Test if Chinese characters can be rendered in PDF"""
    # Font path
    font_path = "/home/egg/project/Tool_OCR/backend/fonts/NotoSansSC-Regular.ttf"

    # Check if font file exists
    if not Path(font_path).exists():
        print(f"❌ Font file not found: {font_path}")
        return False
    print(f"✓ Font file found: {font_path}")

    try:
        # Register Chinese font
        pdfmetrics.registerFont(TTFont('NotoSansSC', font_path))
        print("✓ Font registered successfully")

        # Create test PDF
        test_pdf = "/tmp/test_chinese.pdf"
        c = canvas.Canvas(test_pdf)

        # Set Chinese font
        c.setFont('NotoSansSC', 14)

        # Draw test text
        c.drawString(100, 750, "測試中文字符渲染 - Test Chinese Character Rendering")
        c.drawString(100, 730, "HTD-S1 技術數據表")
        c.drawString(100, 710, "這是一個 PDF 生成測試")
        c.save()
        print(f"✓ Test PDF created: {test_pdf}")

        # Check file size
        file_size = Path(test_pdf).stat().st_size
        print(f"✓ PDF file size: {file_size} bytes")

        if file_size > 0:
            print("\n✅ Chinese font rendering test PASSED")
            return True
        else:
            print("\n❌ PDF file is empty")
            return False
    except Exception as e:
        print(f"❌ Error during testing: {e}")
        import traceback
        traceback.print_exc()
        return False


if __name__ == "__main__":
    success = test_chinese_rendering()
    sys.exit(0 if success else 1)
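
The script above treats any non-empty output file as a pass. A slightly stricter check, sketched here as an optional addition rather than part of the original script, is to confirm that the file starts with the PDF header bytes. The helper name looks_like_pdf is hypothetical.

```python
from pathlib import Path


def looks_like_pdf(path: str) -> bool:
    """Return True if the file exists and begins with the '%PDF-' header."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as fh:
        return fh.read(5) == b"%PDF-"
```

This still does not prove that the glyphs rendered correctly; it only rules out an empty or non-PDF file.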