Problem:
- OCR processes images at smaller resolutions but coordinates were being used directly on larger PDF canvases
- This caused all text/tables/images to be drawn at wrong scale in bottom-left corner
Solution:
- Track OCR image dimensions in JSON output (ocr_dimensions)
- Calculate proper scale factors: scale_w = pdf_width/ocr_width, scale_h = pdf_height/ocr_height
- Apply scaling to all coordinates before drawing on PDF canvas
- Support per-page scaling for multi-page PDFs
Changes:
1. ocr_service.py:
- Add OCR image dimensions capture using PIL
- Include ocr_dimensions in JSON output for both single images and PDFs
2. pdf_generator_service.py:
- Calculate scale factors from OCR dimensions vs target PDF dimensions
- Update all drawing methods (text, table, image) to accept and apply scale factors
- Apply scaling to bbox coordinates before coordinate transformation
3. test_pdf_scaling.py:
- Add test script to verify scaling works correctly
- Test with OCR at 500x700 scaled to PDF at 1000x1400 (2x scaling)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Major Features:
- Add PDF generation service with Chinese font support
- Parse HTML tables from PP-StructureV3 and rebuild with ReportLab
- Extract table text for translation purposes
- Auto-filter text regions inside tables to avoid overlaps
Backend Changes:
1. pdf_generator_service.py (NEW)
- HTMLTableParser: Parse HTML tables to extract structure
- PDFGeneratorService: Generate layout-preserving PDFs
- Coordinate transformation: OCR (top-left) → PDF (bottom-left)
- Font size heuristics: 75% of bbox height with width checking
- Table reconstruction: Parse HTML → ReportLab Table
- Image embedding: Extract bbox from filenames
2. ocr_service.py
- Add _extract_table_text() for translation support
- Add output_dir parameter to save images to result directory
- Extract bbox from image filenames (img_in_table_box_x1_y1_x2_y2.jpg)
3. tasks.py
- Update process_task_ocr to use save_results() with PDF generation
- Fix download_pdf endpoint to use database-stored PDF paths
- Support on-demand PDF generation from JSON
4. config.py
- Add chinese_font_path configuration
- Add pdf_enable_bbox_debug flag
Frontend Changes:
1. PDFViewer.tsx (NEW)
- React PDF viewer with zoom and pagination
- Memoized file config to prevent unnecessary reloads
2. TaskDetailPage.tsx & ResultsPage.tsx
- Integrate PDF preview and download
3. main.tsx
- Configure PDF.js worker via CDN
4. vite.config.ts
- Add host: '0.0.0.0' for network access
- Use VITE_API_URL environment variable for backend proxy
Dependencies:
- reportlab: PDF generation library
- Noto Sans SC font: Chinese character support
🤖 Generated with Claude Code
https://claude.com/claude-code
Co-Authored-By: Claude <noreply@anthropic.com>
Backend fixes:
- Fix markdown generation using correct 'markdown_content' key in tasks.py
- Update admin service to return flat data structure matching frontend types
- Add task_count and failed_tasks fields to user statistics
- Fix top users endpoint to return complete user data
Frontend fixes:
- Migrate ResultsPage from V1 batch API to V2 task API with polling
- Create TaskDetailPage component with markdown preview and download buttons
- Refactor ExportPage to support multi-task selection using V2 download endpoints
- Fix login infinite refresh loop with concurrency control flags
- Create missing Checkbox UI component
New features:
- Add /tasks/:taskId route for task detail view
- Implement multi-task batch export functionality
- Add real-time task status polling (2s interval)
OpenSpec:
- Archive completed proposal 2025-11-17-fix-v2-api-ui-issues
- Create result-export and task-management specifications
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Fixed WSL CUDA library path in ~/.bashrc
- Upgraded PaddlePaddle from 3.0.0 to 3.2.1
- Verified fused_rms_norm_ext API is now available
- Enabled chart recognition in ocr_service.py
- Updated CHART_RECOGNITION.md to reflect enabled status
Chart recognition now supports:
✅ Chart type identification
✅ Data extraction from charts
✅ Axis and legend parsing
✅ Converting charts to structured data
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Remove all V1 architecture components and promote V2 to primary:
- Delete all paddle_ocr_* table models (export, ocr, translation, user)
- Delete legacy routers (auth, export, ocr, translation)
- Delete legacy schemas and services
- Promote user_v2.py to user.py as primary user model
- Update all imports and dependencies to use V2 models only
- Update main.py version to 2.0.0
Database changes:
- Fix SQLAlchemy reserved word: rename audit_log.metadata to extra_data
- Add migration to drop all paddle_ocr_* tables
- Update alembic env to only import V2 models
Frontend fixes:
- Fix Select component exports in TaskHistoryPage.tsx
- Update to use simplified Select API with options prop
- Fix AxiosInstance TypeScript import syntax
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
PaddleOCR-VL chart recognition model requires `fused_rms_norm_ext` API
which is not available in PaddlePaddle 3.0.0 stable release.
Changes:
- Set use_chart_recognition=False in PP-StructureV3 initialization
- Remove unsupported show_log parameter from PaddleOCR 3.x API calls
- Document known limitation in openspec proposal
- Add limitation documentation to README
- Update tasks.md with documentation task for known issues
Impact:
- Layout analysis still detects/extracts charts as images ✓
- Tables, formulas, and text recognition work normally ✓
- Deep chart understanding (type detection, data extraction) disabled ✗
- Chart to structured data conversion disabled ✗
Workaround: Charts saved as image files for manual review
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
PaddlePaddle 3.0.0b2 has "Illegal instruction" error on current CPU.
Downgrade to stable 2.6.2 which works but uses different API.
Changes:
- Auto-detect PaddlePaddle version at runtime
- Use 'device' parameter for 3.x (device="gpu:0" or "cpu")
- Use 'use_gpu' + 'gpu_mem' parameters for 2.x
- Apply to both get_ocr_engine() and get_structure_engine()
- Log PaddlePaddle version in initialization messages
Current setup:
- paddlepaddle-gpu==2.6.2 (stable, CUDA compiled)
- paddleocr==3.3.1
- paddlex==3.3.9
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
PaddleOCR 3.x changed the API:
- Removed: use_gpu=True/False and gpu_mem=<value>
- Added: device="gpu:0" or device="cpu"
Changes:
- Updated get_ocr_engine() to use device parameter
- Updated get_structure_engine() to use device parameter
- GPU mode: device="gpu:{gpu_device_id}"
- CPU mode: device="cpu"
This fixes the "ValueError: Unknown argument: gpu_mem" runtime error.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>