Major Features:
- Add PDF generation service with Chinese font support
- Parse HTML tables from PP-StructureV3 and rebuild with ReportLab
- Extract table text for translation purposes
- Auto-filter text regions inside tables to avoid overlaps

Backend Changes:
1. pdf_generator_service.py (NEW)
   - HTMLTableParser: Parse HTML tables to extract structure
   - PDFGeneratorService: Generate layout-preserving PDFs
   - Coordinate transformation: OCR (top-left) → PDF (bottom-left)
   - Font size heuristics: 75% of bbox height with width checking
   - Table reconstruction: Parse HTML → ReportLab Table
   - Image embedding: Extract bbox from filenames
2. ocr_service.py
   - Add _extract_table_text() for translation support
   - Add output_dir parameter to save images to result directory
   - Extract bbox from image filenames (img_in_table_box_x1_y1_x2_y2.jpg)
3. tasks.py
   - Update process_task_ocr to use save_results() with PDF generation
   - Fix download_pdf endpoint to use database-stored PDF paths
   - Support on-demand PDF generation from JSON
4. config.py
   - Add chinese_font_path configuration
   - Add pdf_enable_bbox_debug flag

Frontend Changes:
1. PDFViewer.tsx (NEW)
   - React PDF viewer with zoom and pagination
   - Memoized file config to prevent unnecessary reloads
2. TaskDetailPage.tsx & ResultsPage.tsx
   - Integrate PDF preview and download
3. main.tsx
   - Configure PDF.js worker via CDN
4. vite.config.ts
   - Add host: '0.0.0.0' for network access
   - Use VITE_API_URL environment variable for backend proxy

Dependencies:
- reportlab: PDF generation library
- Noto Sans SC font: Chinese character support

🤖 Generated with Claude Code
https://claude.com/claude-code

Co-Authored-By: Claude <noreply@anthropic.com>
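The commit message names two geometric steps: reading a bounding box out of image filenames such as img_in_table_box_x1_y1_x2_y2.jpg, and converting OCR coordinates (origin top-left, y pointing down) into PDF coordinates (origin bottom-left, y pointing up). The real logic lives in the Python services listed above and is not shown here. The shell sketch below only illustrates the arithmetic, assuming integer coordinates and a page height already expressed in the same units as the bbox (no DPI or pixel-to-point scaling). The helper names parse_bbox, ocr_to_pdf_rect and font_size_for_height are hypothetical, and the 75% font-size rule omits the width check, whose details the commit does not give.

```bash
#!/bin/bash
# Sketch only: hypothetical helpers illustrating steps named in the commit message.
# Assumes integer coordinates in filenames like img_in_table_box_x1_y1_x2_y2.jpg.

# Print "x1 y1 x2 y2" parsed from an image filename; return 1 if it does not match.
parse_bbox() {
    local name re='_box_([0-9]+)_([0-9]+)_([0-9]+)_([0-9]+)\.[A-Za-z]+$'
    name=$(basename "$1")
    if [[ $name =~ $re ]]; then
        echo "${BASH_REMATCH[1]} ${BASH_REMATCH[2]} ${BASH_REMATCH[3]} ${BASH_REMATCH[4]}"
    else
        return 1
    fi
}

# Convert an OCR bbox (origin top-left, y grows downward) into a PDF rectangle
# (origin bottom-left, y grows upward). Prints "x y_bottom width height".
ocr_to_pdf_rect() {
    local page_height=$1 x1=$2 y1=$3 x2=$4 y2=$5
    # The box's lower edge in OCR space is y2, which becomes the PDF lower-left y.
    echo "$x1 $(( page_height - y2 )) $(( x2 - x1 )) $(( y2 - y1 ))"
}

# Illustrative font-size heuristic: 75% of the bbox height (integer result).
# The width check mentioned in the commit message is not modeled here.
font_size_for_height() {
    echo $(( $1 * 3 / 4 ))
}

read -r x1 y1 x2 y2 < <(parse_bbox "img_in_table_box_100_200_300_260.jpg")
ocr_to_pdf_rect 1000 "$x1" "$y1" "$x2" "$y2"   # prints: 100 740 200 60
font_size_for_height $(( y2 - y1 ))            # prints: 45
```

Using y2 rather than y1 is the key detail: a PDF rectangle is anchored at its lower-left corner, and the lower edge of an OCR box is its larger y value, so the flip is page_height - y2 while width and height stay unchanged.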
32 lines
841 B
Bash
Executable File
#!/bin/bash
# Download Noto Sans SC TrueType font for layout-preserving PDF generation

set -e

FONT_DIR="backend/fonts"
FONT_URL="https://github.com/notofonts/noto-cjk/raw/main/Sans/Variable/TTF/Subset/NotoSansSC-VF.ttf"
# The source is the variable-weight (VF) subset font; it is saved under a "Regular" name
FONT_FILE="NotoSansSC-Regular.ttf"

echo "🔤 Downloading Chinese font for PDF generation..."

# Create font directory
mkdir -p "$FONT_DIR"

# Download the font only if a non-empty copy is not already present
if [ -s "$FONT_DIR/$FONT_FILE" ]; then
    echo "✓ Font already exists: $FONT_DIR/$FONT_FILE"
else
    echo "Downloading from GitHub..."

    # Download to a temporary file first: wget -O creates the output file even
    # when the download fails, and an empty leftover would pass the check above
    if wget "$FONT_URL" -O "$FONT_DIR/$FONT_FILE.part" && [ -s "$FONT_DIR/$FONT_FILE.part" ]; then
        mv "$FONT_DIR/$FONT_FILE.part" "$FONT_DIR/$FONT_FILE"
        SIZE=$(du -h "$FONT_DIR/$FONT_FILE" | cut -f1)
        echo "✓ Font downloaded successfully: $SIZE"
    else
        rm -f "$FONT_DIR/$FONT_FILE.part"
        echo "✗ Font download failed"
        exit 1
    fi
fi

echo "✅ Font setup complete!"
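The script accepts any existing or newly downloaded non-empty file. A possible extra guard, not part of the original commit, is to confirm the file really is a TrueType font before pointing chinese_font_path at it, for example to catch an HTML error page saved under the .ttf name. Per the OpenType specification, a font file begins with a 4-byte sfntVersion: 0x00010000 (or the Apple tag 'true') for TrueType outlines, and 'OTTO' for CFF outlines. The hypothetical check_ttf_header function below accepts only the TrueType values, since the script's stated goal is a TrueType font.

```bash
#!/bin/bash
# Hypothetical post-download check (not part of the original script):
# confirm that a file starts with a TrueType sfnt version tag.

check_ttf_header() {
    local file=$1 magic
    [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }
    # First 4 bytes as lowercase hex without spaces, e.g. "00010000"
    magic=$(head -c 4 "$file" | od -An -tx1 | tr -d ' \n')
    case "$magic" in
        00010000|74727565)  # 0x00010000 or "true": TrueType outlines
            return 0 ;;
        *)
            echo "not a TrueType font (header: $magic): $file" >&2
            return 1 ;;
    esac
}

# Usage: check_ttf_header "backend/fonts/NotoSansSC-Regular.ttf" && echo "✓ TrueType header OK"
```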