egg/OCR - OCR

Commit Graph

Author	SHA1	Message	Date

egg

c006905b6f

refactor: centralize DIFY settings in config.py and clean up env files

- Update config.py to read both .env and .env.local (with .env.local priority)
- Move DIFY API settings from hardcoded values to environment configuration
- Remove unused PADDLEOCR_MODEL_DIR setting (models stored in ~/.paddleocr/)
- Remove deprecated argostranslate translation settings
- Add DIFY settings: base_url, api_key, timeout, max_retries, batch limits
- Update dify_client.py to use settings from config.py
- Update translation_service.py to use settings instead of constants
- Fix frontend env files to use correct variable name VITE_API_BASE_URL
- Update setup_dev_env.sh with correct PaddlePaddle version (3.2.0)
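
The layered lookup described above (.env first, then .env.local overriding it) can be sketched with the standard library alone. This is an illustration, not the repository's config.py: the environment variable names (DIFY_BASE_URL, DIFY_API_KEY, DIFY_TIMEOUT, DIFY_MAX_RETRIES), the default values, and the helper names are assumptions; only the field names base_url, api_key, timeout, and max_retries come from the commit message.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Mapping


def parse_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_layered_env(base_dir: Path) -> Dict[str, str]:
    """Read .env, then let .env.local override any key it also defines."""
    merged = parse_env_file(base_dir / ".env")
    merged.update(parse_env_file(base_dir / ".env.local"))
    return merged


@dataclass(frozen=True)
class DifySettings:
    base_url: str
    api_key: str
    timeout: float
    max_retries: int


def dify_settings_from(env: Mapping[str, str]) -> DifySettings:
    # Variable names and defaults are hypothetical placeholders.
    return DifySettings(
        base_url=env.get("DIFY_BASE_URL", ""),
        api_key=env.get("DIFY_API_KEY", ""),
        timeout=float(env.get("DIFY_TIMEOUT", "60")),
        max_retries=int(env.get("DIFY_MAX_RETRIES", "3")),
    )
```

Calling `dify_settings_from(load_layered_env(Path(".")))` would then give client code one typed object to read from instead of module-level constants.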

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-12-02 17:50:47 +08:00

egg

d7f7166a2d

feat: unify environment scripts with start.sh

- Add unified start.sh script with subcommands (all/backend/frontend)
- Add process management (--stop, --status)
- Remove separate start_backend.sh and start_frontend.sh
- Update setup_dev_env.sh with pre-flight checks and --cpu-only/--skip-db options
- Update .env.example to remove sensitive data and add DIFY translation config
- Add .pid/ to .gitignore for process management

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-12-02 12:48:52 +08:00

egg

fa1abcd8e6

feat: implement layout-preserving PDF generation with table reconstruction

Major Features:
- Add PDF generation service with Chinese font support
- Parse HTML tables from PP-StructureV3 and rebuild with ReportLab
- Extract table text for translation purposes
- Auto-filter text regions inside tables to avoid overlaps
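
As a rough illustration of the table handling listed above, the sketch below pulls cell text out of HTML table markup using only the standard library. The class and function names are hypothetical and are not the HTMLTableParser from this commit; row and column spans are ignored for brevity.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TableTextParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside an open cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_rows(html: str) -> List[List[str]]:
    """Return the table as a list of rows, each a list of cell strings."""
    parser = TableTextParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def table_text(html: str) -> str:
    """Flatten a table to plain text, one row per line, for translation."""
    return "\n".join(" ".join(c for c in row if c) for row in table_rows(html))
```

The list-of-rows result has the shape that ReportLab's `Table` flowable takes as its data argument, so the same parse could feed both translation and reconstruction.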

Backend Changes:
1. pdf_generator_service.py (NEW)
   - HTMLTableParser: Parse HTML tables to extract structure
   - PDFGeneratorService: Generate layout-preserving PDFs
   - Coordinate transformation: OCR (top-left) → PDF (bottom-left)
   - Font size heuristics: 75% of bbox height with width checking
   - Table reconstruction: Parse HTML → ReportLab Table
   - Image embedding: Extract bbox from filenames
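
The coordinate flip and the font-size heuristic above can be sketched as follows. This assumes OCR boxes are (x1, y1, x2, y2) with a top-left origin and that OCR and PDF share the same units; the function names and the crude width estimator are hypothetical stand-ins (a real implementation would more likely measure text with the PDF library's font metrics).

```python
from typing import Callable, Tuple

BBox = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def ocr_to_pdf_bbox(bbox: BBox, page_height: float) -> BBox:
    """Flip the y axis: OCR measures down from the top, PDF up from the bottom."""
    x1, y1, x2, y2 = bbox
    return (x1, page_height - y2, x2, page_height - y1)


def estimate_width(text: str, font_size: float) -> float:
    """Crude width guess: wide (e.g. CJK) glyphs are 1 em, others 0.5 em."""
    return sum(font_size if ord(ch) > 0x2E7F else font_size * 0.5 for ch in text)


def fit_font_size(
    text: str,
    bbox: BBox,
    height_ratio: float = 0.75,
    measure: Callable[[str, float], float] = estimate_width,
    min_size: float = 1.0,
) -> float:
    """Start at 75% of the box height, then shrink if the text is too wide."""
    x1, y1, x2, y2 = bbox
    box_width, box_height = x2 - x1, y2 - y1
    size = box_height * height_ratio
    width = measure(text, size)
    if width > box_width:
        # Text width scales linearly with font size, so scale down in one step.
        size *= box_width / width
    return max(size, min_size)
```

For example, a box spanning y = 20 to 40 on an 800-unit-tall page maps to y = 760 to 780 in PDF space, with the lower edge listed first.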

2. ocr_service.py
   - Add _extract_table_text() for translation support
   - Add output_dir parameter to save images to result directory
   - Extract bbox from image filenames (img_in_table_box_x1_y1_x2_y2.jpg)
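
A minimal sketch of recovering a bounding box from a filename in the pattern shown above. It assumes the four coordinates are non-negative integers at the end of the file stem; the function name is hypothetical.

```python
import re
from pathlib import PurePath
from typing import Optional, Tuple

# Four underscore-separated integers at the end of the stem: _x1_y1_x2_y2
_BBOX_SUFFIX = re.compile(r"_(\d+)_(\d+)_(\d+)_(\d+)$")


def bbox_from_filename(name: str) -> Optional[Tuple[int, int, int, int]]:
    """Return (x1, y1, x2, y2) parsed from the filename, or None if absent."""
    stem = PurePath(name).stem  # drop directories and the extension
    match = _BBOX_SUFFIX.search(stem)
    if match is None:
        return None
    x1, y1, x2, y2 = (int(group) for group in match.groups())
    return (x1, y1, x2, y2)
```

Returning None for non-matching names lets the caller skip images that carry no position information instead of raising.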

3. tasks.py
   - Update process_task_ocr to use save_results() with PDF generation
   - Fix download_pdf endpoint to use database-stored PDF paths
   - Support on-demand PDF generation from JSON

4. config.py
   - Add chinese_font_path configuration
   - Add pdf_enable_bbox_debug flag

Frontend Changes:
1. PDFViewer.tsx (NEW)
   - React PDF viewer with zoom and pagination
   - Memoized file config to prevent unnecessary reloads

2. TaskDetailPage.tsx & ResultsPage.tsx
   - Integrate PDF preview and download

3. main.tsx
   - Configure PDF.js worker via CDN

4. vite.config.ts
   - Add host: '0.0.0.0' for network access
   - Use VITE_API_URL environment variable for backend proxy

Dependencies:
- reportlab: PDF generation library
- Noto Sans SC font: Chinese character support

🤖 Generated with Claude Code
https://claude.com/claude-code

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-17 20:21:56 +08:00

3 Commits