egg
715805b3b8
feat: implement table cell boxes extraction with SLANeXt
Phase 1-3 implementation of extract-table-cell-boxes proposal:
- Add enable_table_cell_boxes_extraction config option
- Implement lazy-loaded SLANeXt model caching in PPStructureEnhanced
- Add _extract_cell_boxes_with_slanet() method for direct model invocation
- Supplement PPStructureV3 table processing with SLANeXt cell boxes
- Add _compute_table_grid_from_cell_boxes() for column width calculation
- Modify draw_table_region() to use cell_boxes for accurate layout
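A minimal sketch of deriving column widths from cell boxes, including the equal-distribution fallback listed under the key features. It assumes boxes are already `(x1, y1, x2, y2)` rectangles in page coordinates; `compute_column_widths`, `_cluster` and the 3-pixel tolerance are hypothetical and are not the actual `_compute_table_grid_from_cell_boxes()`.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) rectangle, assumed format


def _cluster(values: Sequence[float], tol: float) -> List[float]:
    """Group sorted coordinates whose gap to the previous value is <= tol; return group means."""
    groups: List[List[float]] = []
    for v in sorted(values):
        if groups and v - groups[-1][-1] <= tol:
            groups[-1].append(v)
        else:
            groups.append([v])
    return [sum(g) / len(g) for g in groups]


def compute_column_widths(
    cell_boxes: Sequence[Box],
    table_x1: float,
    table_x2: float,
    num_cols: int,
    tol: float = 3.0,
) -> List[float]:
    """Hypothetical helper: column widths from the x-edges of cell boxes.

    Falls back to equal widths when there are no boxes or the detected
    edges do not match the expected number of columns.
    """
    if cell_boxes:
        edges = [b[0] for b in cell_boxes] + [b[2] for b in cell_boxes]
        xs = _cluster(edges, tol)
        if len(xs) == num_cols + 1:
            return [xs[i + 1] - xs[i] for i in range(num_cols)]
    # Fallback: equal distribution across the table width
    return [(table_x2 - table_x1) / num_cols] * num_cols
```

Clustering edges rather than trusting each box individually tolerates the small pixel jitter that detected cell borders usually have.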
Key features:
- Auto-detect table type (wired/wireless) using PP-LCNet classifier
- Convert 8-point polygon bbox to 4-point rectangle
- Graceful fallback to equal distribution when cell_boxes unavailable
- Proper coordinate transformation with scaling support
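The polygon-to-rectangle conversion and the scaled coordinate mapping can be sketched as follows. This assumes each cell arrives as eight numbers `[x1, y1, x2, y2, x3, y3, x4, y4]` relative to the cropped table image, and that the crop was resized by a single factor before inference; `polygon_to_rect` and `to_page_coords` are hypothetical names.

```python
from typing import Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def polygon_to_rect(poly: Sequence[float]) -> Rect:
    """Collapse an 8-value polygon [x1, y1, ..., x4, y4] to its bounding rectangle."""
    if len(poly) != 8:
        raise ValueError(f"expected 8 values, got {len(poly)}")
    xs = poly[0::2]  # x coordinates at even indices
    ys = poly[1::2]  # y coordinates at odd indices
    return (min(xs), min(ys), max(xs), max(ys))


def to_page_coords(rect: Rect, origin: Tuple[float, float], scale: float = 1.0) -> Rect:
    """Map a rect from a (possibly resized) table crop back to page coordinates.

    Assumes the crop was resized by `scale` before inference and that its
    top-left corner sits at `origin` on the page.
    """
    ox, oy = origin
    x1, y1, x2, y2 = rect
    return (ox + x1 / scale, oy + y1 / scale, ox + x2 / scale, oy + y2 / scale)
```

Undoing the scale before adding the crop offset matters: the offset is measured on the page, while the cell box is measured on the resized crop.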
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 12:20:32 +08:00
egg
dda9621e17
feat: enhance layout preprocessing and unify image scaling proposal
Backend changes:
- Add image scaling configuration for PP-Structure processing
- Enhance layout preprocessing service with scaling support
- Update OCR service with improved memory management
- Add PP-Structure enhanced processing improvements
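A sketch of what a single image-scaling rule could look like. The `max_side`/`min_side` limits, their defaults and the function names are hypothetical; the idea shown is that one scale factor is computed per image and reused to map every detected box back to original-image coordinates.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]


def compute_scale(width: int, height: int, max_side: int = 2000, min_side: int = 1000) -> float:
    """Hypothetical rule: shrink if the longest side exceeds max_side,
    enlarge if it is below min_side, otherwise leave the image alone."""
    longest = max(width, height)
    if longest > max_side:
        return max_side / longest
    if longest < min_side:
        return min_side / longest
    return 1.0


def scaled_size(width: int, height: int, scale: float) -> Tuple[int, int]:
    """Pixel size of the image actually sent to the model."""
    return round(width * scale), round(height * scale)


def unscale_bbox(bbox: BBox, scale: float) -> BBox:
    """Map a box detected on the scaled image back to original-image coordinates."""
    x1, y1, x2, y2 = bbox
    return (x1 / scale, y1 / scale, x2 / scale, y2 / scale)
```

Keeping one factor per image and always dividing by it on the way out is one way to get the consistent coordinate handling the unify-image-scaling proposal is after.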
Frontend changes:
- Update preprocessing settings UI
- Fix processing page layout and state management
- Update API types for new parameters
Proposals:
- Archive add-layout-preprocessing proposal (completed)
- Add unify-image-scaling proposal for consistent coordinate handling
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 09:23:19 +08:00
egg
ea0dd7456c
feat: implement layout preprocessing backend
Backend implementation for add-layout-preprocessing proposal:
- Add LayoutPreprocessingService with CLAHE, sharpen, binarize
- Add auto-detection: analyze_image_quality() for contrast/edge metrics
- Integrate preprocessing into OCR pipeline (analyze_layout)
- Add Preview API: POST /api/v2/tasks/{id}/preview/preprocessing
- Add config options: layout_preprocessing_mode, thresholds
- Add schemas: PreprocessingConfig, PreprocessingPreviewResponse
Preprocessing only affects layout detection input.
Original images preserved for element extraction.
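A pure-Python sketch of the auto-detection idea behind `analyze_image_quality()`: contrast as the standard deviation of grayscale intensities, and an edge metric as the mean absolute difference between neighbouring pixels. The metric definitions, the thresholds and `choose_preprocessing()` are assumptions; a production version would more likely use OpenCV (for example `cv2.createCLAHE`) on NumPy arrays.

```python
import math
from typing import Dict, List

Gray = List[List[int]]  # grayscale image as equal-length rows of 0-255 values


def analyze_image_quality(img: Gray) -> Dict[str, float]:
    """Contrast: population std-dev of intensities.
    Edge strength: mean absolute difference between right and lower neighbours."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    contrast = math.sqrt(sum((p - mean) ** 2 for p in pixels) / len(pixels))
    diffs: List[int] = []
    for y, row in enumerate(img):
        for x, p in enumerate(row):
            if x + 1 < len(row):
                diffs.append(abs(row[x + 1] - p))
            if y + 1 < len(img):
                diffs.append(abs(img[y + 1][x] - p))
    edge_strength = sum(diffs) / len(diffs) if diffs else 0.0
    return {"contrast": contrast, "edge_strength": edge_strength}


def choose_preprocessing(
    metrics: Dict[str, float],
    contrast_threshold: float = 40.0,  # placeholder threshold
    edge_threshold: float = 10.0,  # placeholder threshold
) -> List[str]:
    """Hypothetical auto mode: CLAHE for flat images, sharpening for soft edges."""
    steps: List[str] = []
    if metrics["contrast"] < contrast_threshold:
        steps.append("clahe")
    if metrics["edge_strength"] < edge_threshold:
        steps.append("sharpen")
    return steps
```

Because the result only feeds layout detection, a wrong guess here degrades region detection but never alters the pixels used for element extraction.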
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 15:17:20 +08:00
egg
6235280c45
feat: upgrade PP-StructureV3 models to latest versions
- Layout: PP-DocLayout-S → PP-DocLayout_plus-L (83.2% mAP)
- Table: Single model → Dual SLANeXt (wired/wireless)
- Formula: PP-FormulaNet_plus-L for enhanced recognition
- Add preprocessing flags support (orientation, unwarping)
- Update frontend i18n descriptions
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 14:22:06 +08:00
egg
59206a6ab8
feat: simplify layout model selection and archive proposals
Changes:
- Replace PP-Structure 7-slider parameter UI with simple 3-option layout model selector
- Add layout model mapping: chinese (PP-DocLayout-S), default (PubLayNet), cdla
- Add LayoutModelSelector component and zh-TW translations
- Fix "default" model behavior with sentinel value for PubLayNet
- Add gap filling service for OCR track coverage improvement
- Add PP-Structure debug utilities
- Archive completed/incomplete proposals:
  - add-ocr-track-gap-filling (complete)
  - fix-ocr-track-table-rendering (incomplete)
  - simplify-ppstructure-model-selection (22/25 tasks)
- Add new layout model tests, archive old PP-Structure param tests
- Update OpenSpec ocr-processing spec with layout model requirements
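One way a sentinel for the "default" option could work, sketched under assumptions: the sentinel means "pass no layout model name", leaving the engine on its built-in default (taken here to be the PubLayNet model). The `layout_model_name` keyword and the cdla value are hypothetical placeholders.

```python
from typing import Dict, Union


class _UseEngineDefault:
    """Sentinel type: pass no layout model name and let the engine use its default."""

    def __repr__(self) -> str:
        return "USE_ENGINE_DEFAULT"


USE_ENGINE_DEFAULT = _UseEngineDefault()

# "PP-DocLayout-S" comes from the commit message; the cdla value is a placeholder.
LAYOUT_MODEL_MAP: Dict[str, Union[str, _UseEngineDefault]] = {
    "chinese": "PP-DocLayout-S",
    "default": USE_ENGINE_DEFAULT,  # engine default, assumed to be the PubLayNet model
    "cdla": "CDLA_LAYOUT_MODEL_PLACEHOLDER",
}


def build_layout_kwargs(choice: str) -> Dict[str, str]:
    """Translate a UI option into engine keyword arguments (keyword name is hypothetical)."""
    if choice not in LAYOUT_MODEL_MAP:
        raise ValueError(f"unknown layout model option: {choice!r}")
    model = LAYOUT_MODEL_MAP[choice]
    if isinstance(model, _UseEngineDefault):
        return {}  # omit the argument entirely
    return {"layout_model_name": model}
```

A dedicated object instead of `None` keeps "explicitly use the engine default" distinct from "no value supplied".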
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 13:27:00 +08:00
egg
1afdb822c3
feat: implement hybrid image extraction and memory management
Backend:
- Add hybrid image extraction for Direct track (inline image blocks)
- Add render_inline_image_regions() fallback when OCR doesn't find images
- Add check_document_for_missing_images() for detecting missing images
- Add memory management system (MemoryGuard, ModelManager, ServicePool)
- Update pdf_generator_service to handle HYBRID processing track
- Add ElementType.LOGO for logo extraction
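A minimal sketch of a `ServicePool`-style component, assuming its job is to share a fixed number of pre-built service instances (for example OCR engines) between requests so that GPU memory stays bounded. The class shape, `acquire()` and the timeout are hypothetical.

```python
import queue
from contextlib import contextmanager
from typing import Callable, Generic, Iterator, TypeVar

T = TypeVar("T")


class ServicePool(Generic[T]):
    """Hypothetical sketch: a fixed set of service instances shared across requests."""

    def __init__(self, factory: Callable[[], T], size: int) -> None:
        self._idle: "queue.Queue[T]" = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())  # build every instance up front

    @contextmanager
    def acquire(self, timeout: float = 30.0) -> Iterator[T]:
        """Borrow an instance; it is returned to the pool even if the caller raises."""
        try:
            service = self._idle.get(timeout=timeout)
        except queue.Empty:
            raise TimeoutError("no service instance available") from None
        try:
            yield service
        finally:
            self._idle.put(service)

    def available(self) -> int:
        return self._idle.qsize()
```

Blocking with a timeout, instead of constructing an extra instance on demand, is what keeps the number of loaded models (and their memory) fixed.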
Frontend:
- Fix PDF viewer re-rendering issues with memoization
- Add TaskNotFound component and useTaskValidation hook
- Disable StrictMode due to react-pdf incompatibility
- Fix task detail and results page loading states
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-26 10:56:22 +08:00
egg
a659e7ae00
fix: improve PP-StructureV3 structure preservation for complex diagrams
- Fix parsing_res_list field mapping (block_label, block_content, block_bbox)
- Add fine-grained PP-StructureV3 configuration parameters
- Lower detection thresholds (0.5→0.2) for more sensitive element detection
- Use 'small' merge mode instead of default to minimize bbox merging
- Add layout_nms, unclip_ratio, text_det thresholds for better control
- Result: Doubled element detection from 6 to 12 elements on complex diagrams
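A sketch of mapping `parsing_res_list` entries onto an internal element type using the three field names from this commit. It assumes each entry is dict-like with a four-value `block_bbox`; `Element` and `convert_parsing_res_list` are hypothetical, and real entries may expose these names as attributes instead of keys.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Tuple


@dataclass
class Element:  # hypothetical internal representation
    type: str
    content: str
    bbox: Tuple[float, float, float, float]


def convert_parsing_res_list(blocks: Iterable[Dict[str, Any]]) -> List[Element]:
    """Read block_label / block_content / block_bbox from each parsed block."""
    elements: List[Element] = []
    for block in blocks:
        bbox = block.get("block_bbox")
        if not bbox or len(bbox) != 4:
            continue  # skip blocks without a usable rectangle
        x1, y1, x2, y2 = (float(v) for v in bbox)
        elements.append(
            Element(
                type=str(block.get("block_label", "unknown")),
                content=str(block.get("block_content") or ""),
                bbox=(x1, y1, x2, y2),
            )
        )
    return elements
```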
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 08:53:37 +08:00
egg
8b9a364452
feat: add GPU optimization and fix TableData consistency
GPU Optimization (Section 3.1):
- Add comprehensive memory management for RTX 4060 8GB
- Enable all recognition features (chart, formula, table, seal, text)
- Implement model cache with auto-unload for idle models
- Add memory monitoring and warning system
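A minimal sketch of a model cache with idle auto-unload. The timeout value, method names and injectable clock are assumptions; dropping the cache's reference only makes a model collectable, and whether GPU memory is actually returned depends on the framework.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


class ModelCache:
    """Hypothetical sketch: load models lazily, unload ones idle longer than idle_timeout."""

    def __init__(self, idle_timeout: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.idle_timeout = idle_timeout
        self._clock = clock
        self._entries: Dict[str, Tuple[Any, float]] = {}  # name -> (model, last_used)

    def get(self, name: str, loader: Callable[[], Any]) -> Any:
        """Return a cached model (loading it on first use) and refresh its last-used time."""
        now = self._clock()
        if name in self._entries:
            model, _ = self._entries[name]
        else:
            model = loader()
        self._entries[name] = (model, now)
        return model

    def evict_idle(self) -> List[str]:
        """Drop models unused for longer than idle_timeout; return their names."""
        now = self._clock()
        expired = [n for n, (_, last) in self._entries.items() if now - last > self.idle_timeout]
        for n in expired:
            del self._entries[n]  # removes our reference; memory release is framework-dependent
        return expired

    def loaded(self) -> List[str]:
        return sorted(self._entries)
```

Calling `evict_idle()` from a periodic background task would give the "auto-unload" behaviour; the injectable clock keeps the logic testable.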
Bug Fix (Section 3.3):
- Fix TableData field inconsistency: 'columns' -> 'cols'
- Remove invalid 'html' and 'extracted_text' parameters
- Add proper TableCell conversion in _convert_table_data
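A sketch of the corrected conversion with hypothetical `TableData`/`TableCell` shapes: the table is built with `cols` (not `columns`), and extra keys such as `html` or `extracted_text` are simply not forwarded to the constructor. Accepting a legacy `columns` key from the raw input is a choice made for this sketch only.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class TableCell:  # hypothetical fields
    row: int
    col: int
    content: str = ""
    row_span: int = 1
    col_span: int = 1


@dataclass
class TableData:  # hypothetical; note the field is 'cols', not 'columns'
    rows: int
    cols: int
    cells: List[TableCell] = field(default_factory=list)


def convert_table_data(raw: Dict[str, Any]) -> TableData:
    """Build TableData from a raw dict; unknown keys (e.g. 'html') are not forwarded."""
    cells = [
        TableCell(
            row=int(c["row"]),
            col=int(c["col"]),
            content=str(c.get("content", "")),
            row_span=int(c.get("row_span", 1)),
            col_span=int(c.get("col_span", 1)),
        )
        for c in raw.get("cells", [])
    ]
    # Sketch-only choice: accept a legacy 'columns' key, always construct with 'cols'.
    cols = raw.get("cols", raw.get("columns", 0))
    return TableData(rows=int(raw.get("rows", 0)), cols=int(cols), cells=cells)
```

Passing `columns=` or `html=` straight to a dataclass constructor raises `TypeError`, which is the class of bug the commit describes.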
Documentation:
- Add Future Improvements section for batch processing enhancement
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 09:17:27 +08:00
egg
fa1abcd8e6
feat: implement layout-preserving PDF generation with table reconstruction
Major Features:
- Add PDF generation service with Chinese font support
- Parse HTML tables from PP-StructureV3 and rebuild with ReportLab
- Extract table text for translation purposes
- Auto-filter text regions inside tables to avoid overlaps
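A stdlib-only sketch of the HTML-parsing step, assuming PP-StructureV3 emits ordinary `<tr>`/`<td>` markup with explicit closing tags. `SimpleTableParser` and `parse_table` are hypothetical names, not the `HTMLTableParser` added in this commit.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

Cell = Tuple[str, int, int]  # (text, colspan, rowspan)


class SimpleTableParser(HTMLParser):
    """Hypothetical sketch: collect <tr>/<td>/<th> text plus span attributes."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[Cell]] = []
        self._row: Optional[List[Cell]] = None
        self._cell: Optional[List[str]] = None
        self._spans = (1, 1)

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            a = dict(attrs)
            self._cell = []
            self._spans = (int(a.get("colspan") or 1), int(a.get("rowspan") or 1))

    def handle_data(self, data: str) -> None:
        if self._cell is not None:
            self._cell.append(data)  # also captures text inside nested tags like <b>

    def handle_endtag(self, tag: str) -> None:
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            text = " ".join("".join(self._cell).split())  # collapse whitespace
            self._row.append((text, *self._spans))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html: str) -> List[List[Cell]]:
    parser = SimpleTableParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

The rows can then be handed to ReportLab's `Table` as a list of row lists; turning `colspan`/`rowspan` into `('SPAN', (c0, r0), (c1, r1))` `TableStyle` commands is left out here.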
Backend Changes:
1. pdf_generator_service.py (NEW)
   - HTMLTableParser: Parse HTML tables to extract structure
   - PDFGeneratorService: Generate layout-preserving PDFs
   - Coordinate transformation: OCR (top-left) → PDF (bottom-left)
   - Font size heuristics: 75% of bbox height with width checking
   - Table reconstruction: Parse HTML → ReportLab Table
   - Image embedding: Extract bbox from filenames
2. ocr_service.py
   - Add _extract_table_text() for translation support
   - Add output_dir parameter to save images to result directory
   - Extract bbox from image filenames (img_in_table_box_x1_y1_x2_y2.jpg)
3. tasks.py
   - Update process_task_ocr to use save_results() with PDF generation
   - Fix download_pdf endpoint to use database-stored PDF paths
   - Support on-demand PDF generation from JSON
4. config.py
   - Add chinese_font_path configuration
   - Add pdf_enable_bbox_debug flag
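The coordinate flip, the 75% font-size heuristic and the filename bbox parsing can be sketched as below. The character-width ratio, the 0.9 shrink step, the minimum size and the integer-only filename pattern are assumptions; a ReportLab-based implementation could measure text with `pdfmetrics.stringWidth()` instead of the crude width estimate used here.

```python
import re
from typing import Optional, Tuple

BBox = Tuple[float, float, float, float]


def ocr_to_pdf_bbox(bbox: BBox, page_height: float) -> BBox:
    """Flip y: OCR origin is top-left (y down), PDF origin is bottom-left (y up).

    Returns (x1, y_bottom, x2, y_top) in PDF space.
    """
    x1, y1, x2, y2 = bbox
    return (x1, page_height - y2, x2, page_height - y1)


def fit_font_size(
    text: str, bbox_width: float, bbox_height: float,
    char_width_ratio: float = 0.5, min_size: float = 4.0,
) -> float:
    """Start at 75% of the bbox height, then shrink until the estimated width fits.

    Width estimate = chars * size * char_width_ratio (assumed average glyph width).
    """
    size = bbox_height * 0.75
    while size > min_size and len(text) * size * char_width_ratio > bbox_width:
        size *= 0.9
    return max(size, min_size)


_BOX_RE = re.compile(r"_box_(\d+)_(\d+)_(\d+)_(\d+)\.\w+$")


def bbox_from_filename(name: str) -> Optional[BBox]:
    """Parse names like img_in_table_box_x1_y1_x2_y2.jpg (integer coords assumed)."""
    m = _BOX_RE.search(name)
    if not m:
        return None
    x1, y1, x2, y2 = (float(v) for v in m.groups())
    return (x1, y1, x2, y2)
```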
Frontend Changes:
1. PDFViewer.tsx (NEW)
   - React PDF viewer with zoom and pagination
   - Memoized file config to prevent unnecessary reloads
2. TaskDetailPage.tsx & ResultsPage.tsx
   - Integrate PDF preview and download
3. main.tsx
   - Configure PDF.js worker via CDN
4. vite.config.ts
   - Add host: '0.0.0.0' for network access
   - Use VITE_API_URL environment variable for backend proxy
Dependencies:
- reportlab: PDF generation library
- Noto Sans SC font: Chinese character support
🤖 Generated with Claude Code
https://claude.com/claude-code
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 20:21:56 +08:00
egg
62609de57c
fix: add result_dir configuration for task result storage
Changes:
- Add result_dir field to Settings class (default: ./storage/results)
- Add result_dir to ensure_directories() method
Fixes:
- AttributeError: 'Settings' object has no attribute 'result_dir'
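A sketch of the fix using a plain dataclass. The real `Settings` class is probably a pydantic settings model, and `upload_dir` is a hypothetical second directory included only to show the loop in `ensure_directories()`.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Settings:
    # result_dir and its default come from this commit; upload_dir is hypothetical.
    upload_dir: str = "./storage/uploads"
    result_dir: str = "./storage/results"

    def ensure_directories(self) -> None:
        """Create every configured storage directory if it does not exist yet."""
        for directory in (self.upload_dir, self.result_dir):
            Path(directory).mkdir(parents=True, exist_ok=True)
```

`exist_ok=True` makes the call safe to repeat at every startup.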
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 19:52:26 +08:00
egg
ad2b832fb6
feat: complete external auth V2 migration with advanced features
This commit implements comprehensive external Azure AD authentication
with complete task management, file download, and admin monitoring systems.
## Core Features Implemented (80% Complete)
### 1. Token Auto-Refresh Mechanism ✅
- Backend: POST /api/v2/auth/refresh endpoint
- Frontend: Auto-refresh 5 minutes before expiration
- Auto-retry on 401 errors with seamless token refresh
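The refresh timing and the 401 retry can be sketched as plain functions. The frontend code is TypeScript; this Python version only illustrates the logic, and the function names and the `(status, body)` request shape are assumptions.

```python
from datetime import datetime, timedelta
from typing import Any, Callable, Tuple

REFRESH_MARGIN = timedelta(minutes=5)  # "5 minutes before expiration"


def should_refresh(expires_at: datetime, now: datetime, margin: timedelta = REFRESH_MARGIN) -> bool:
    return now >= expires_at - margin


def seconds_until_refresh(expires_at: datetime, now: datetime, margin: timedelta = REFRESH_MARGIN) -> float:
    """Delay for a refresh timer; 0 means refresh immediately."""
    return max(0.0, (expires_at - margin - now).total_seconds())


def call_with_refresh(
    request: Callable[[], Tuple[int, Any]], refresh: Callable[[], None]
) -> Tuple[int, Any]:
    """Send a request; on 401, refresh the token once and retry."""
    status, body = request()
    if status == 401:
        refresh()
        status, body = request()
    return status, body
```

Retrying only once avoids an endless loop when the refresh token itself has expired; the second 401 is then surfaced to the caller.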
### 2. File Download System ✅
- Three format support: JSON / Markdown / PDF
- Endpoints: GET /api/v2/tasks/{id}/download/{format}
- File access control with ownership validation
- Frontend download buttons in TaskHistoryPage
### 3. Complete Task Management ✅
Backend Endpoints:
- POST /api/v2/tasks/{id}/start - Start task
- POST /api/v2/tasks/{id}/cancel - Cancel task
- POST /api/v2/tasks/{id}/retry - Retry failed task
- GET /api/v2/tasks - List with filters (status, filename, date range)
- GET /api/v2/tasks/stats - User statistics
Frontend Features:
- Status-based action buttons (Start/Cancel/Retry)
- Advanced search and filtering (status, filename, date range)
- Pagination and sorting
- Task statistics dashboard (5 stat cards)
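A sketch of status-driven actions: one table decides both which buttons are shown and which transitions the backend accepts. The status names and the state each action leads to (for example, retry returning a task to pending) are assumptions.

```python
from enum import Enum
from typing import Dict, FrozenSet


class TaskStatus(str, Enum):  # status names are assumptions
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


ALLOWED_ACTIONS: Dict[TaskStatus, FrozenSet[str]] = {
    TaskStatus.PENDING: frozenset({"start", "cancel"}),
    TaskStatus.PROCESSING: frozenset({"cancel"}),
    TaskStatus.FAILED: frozenset({"retry"}),
    TaskStatus.COMPLETED: frozenset(),
    TaskStatus.CANCELLED: frozenset(),
}

NEXT_STATUS: Dict[str, TaskStatus] = {
    "start": TaskStatus.PROCESSING,
    "cancel": TaskStatus.CANCELLED,
    "retry": TaskStatus.PENDING,
}


def apply_action(status: TaskStatus, action: str) -> TaskStatus:
    """Validate an action against the current status and return the new status."""
    if action not in ALLOWED_ACTIONS[status]:
        raise ValueError(f"cannot {action} a task that is {status.value}")
    return NEXT_STATUS[action]
```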
### 4. Admin Monitoring System ✅ (Backend)
Admin APIs:
- GET /api/v2/admin/stats - System statistics
- GET /api/v2/admin/users - User list with stats
- GET /api/v2/admin/users/top - User leaderboard
- GET /api/v2/admin/audit-logs - Audit log query system
- GET /api/v2/admin/audit-logs/user/{id}/summary
Admin Features:
- Email-based admin check (ymirliu@panjit.com.tw)
- Comprehensive system metrics (users, tasks, sessions, activity)
- Audit logging service for security tracking
### 5. User Isolation & Security ✅
- Row-level security on all task queries
- File access control with ownership validation
- Strict user_id filtering on all operations
- Session validation and expiry checking
- Admin privilege verification
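A sketch of ownership-scoped lookups over an in-memory list; with SQLAlchemy the same idea is adding `.filter(Task.user_id == user_id)` to every task query. The `Task` shape is hypothetical, and raising the same error for missing and foreign tasks is a design choice of this sketch rather than something the commit states.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Task:  # hypothetical minimal shape
    id: int
    user_id: int
    filename: str


class TaskNotFound(Exception):
    pass


def list_user_tasks(tasks: Iterable[Task], user_id: int) -> List[Task]:
    """Every listing is scoped to the requesting user."""
    return [t for t in tasks if t.user_id == user_id]


def get_user_task(tasks: Iterable[Task], task_id: int, user_id: int) -> Task:
    """Match on both id and owner; another user's task looks the same as a missing one."""
    for t in tasks:
        if t.id == task_id and t.user_id == user_id:
            return t
    raise TaskNotFound(task_id)
```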
## New Files Created
Backend:
- backend/app/models/user_v2.py - User model for external auth
- backend/app/models/task.py - Task model with user isolation
- backend/app/models/session.py - Session management
- backend/app/models/audit_log.py - Audit log model
- backend/app/services/external_auth_service.py - External API client
- backend/app/services/task_service.py - Task CRUD with isolation
- backend/app/services/file_access_service.py - File access control
- backend/app/services/admin_service.py - Admin operations
- backend/app/services/audit_service.py - Audit logging
- backend/app/routers/auth_v2.py - V2 auth endpoints
- backend/app/routers/tasks.py - Task management endpoints
- backend/app/routers/admin.py - Admin endpoints
- backend/alembic/versions/5e75a59fb763_*.py - DB migration
Frontend:
- frontend/src/services/apiV2.ts - Complete V2 API client
- frontend/src/types/apiV2.ts - V2 type definitions
- frontend/src/pages/TaskHistoryPage.tsx - Task history UI
Modified Files:
- backend/app/core/deps.py - Added get_current_admin_user_v2
- backend/app/main.py - Registered admin router
- frontend/src/pages/LoginPage.tsx - V2 login integration
- frontend/src/components/Layout.tsx - User display and logout
- frontend/src/App.tsx - Added /tasks route
## Documentation
- openspec/changes/.../PROGRESS_UPDATE.md - Detailed progress report
## Pending Items (20%)
1. Database migration execution for audit_logs table
2. Frontend admin dashboard page
3. Frontend audit log viewer
## Testing Status
- Manual testing: ✅ Authentication flow verified
- Unit tests: ⏳ Pending
- Integration tests: ⏳ Pending
## Security Enhancements
- ✅ User isolation (row-level security)
- ✅ File access control
- ✅ Token expiry validation
- ✅ Admin privilege verification
- ✅ Audit logging infrastructure
- ⏳ Token encryption (noted, low priority)
- ⏳ Rate limiting (noted, low priority)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-14 17:19:43 +08:00
egg
7536f43513
feat: implement GPU acceleration support for OCR processing
Implement GPU acceleration support: automatically detect CUDA GPUs and use them to accelerate OCR processing.
Main changes:
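1. Environment setup enhancements (setup_dev_env.sh)
   - Add GPU and CUDA version detection
   - Automatically install the matching PaddlePaddle GPU or CPU build
   - Install the GPU build for CUDA 11.2+, otherwise the CPU build
   - Verify GPU availability after installation and display device information
2. Configuration updates
   - .env.local: add GPU configuration options
     * FORCE_CPU_MODE: force CPU mode
     * GPU_MEMORY_FRACTION: fraction of GPU memory to use
     * GPU_DEVICE_ID: GPU device ID
   - backend/app/core/config.py: add GPU configuration fields
3. OCR service GPU integration (backend/app/services/ocr_service.py)
   - Add _detect_and_configure_gpu() to detect the GPU automatically
   - Add get_gpu_status() to report GPU status and memory usage
   - Update get_ocr_engine() to accept GPU parameters and fall back on errors
   - Update get_structure_engine() to accept GPU parameters and fall back on errors
   - Automatic GPU/CPU switching: fall back to CPU when the GPU fails
4. Health check and monitoring (backend/app/main.py)
   - Add GPU status information to the /health endpoint
   - Report GPU availability, device name, memory usage, and related information
5. Documentation updates (README.md)
   - Features: describe GPU acceleration
   - Prerequisites: add GPU hardware requirements (optional)
   - Quick Start: update the automated setup instructions to cover GPU detection
   - Configuration: add GPU configuration options and their descriptions
   - Notes: add GPU support caveats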
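The GPU/CPU install decision in `setup_dev_env.sh` can be expressed as a small rule. The script itself is shell; this Python sketch only mirrors the "CUDA 11.2+ gets the GPU build" rule, and reading the version from nvidia-smi's `CUDA Version: 12.4` header text is an assumption about where the version comes from.

```python
import re
from typing import Optional, Tuple

MIN_CUDA_FOR_GPU_BUILD = (11, 2)  # "CUDA 11.2+ installs the GPU build"

# Prefer an explicit "CUDA Version: X.Y"; otherwise accept a bare "X.Y" at the start.
_PATTERNS = (
    re.compile(r"CUDA Version:\s*(\d+)\.(\d+)"),
    re.compile(r"^\s*(\d+)\.(\d+)"),
)


def parse_cuda_version(text: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from text such as '12.4' or an nvidia-smi header line."""
    for pattern in _PATTERNS:
        m = pattern.search(text)
        if m:
            return int(m.group(1)), int(m.group(2))
    return None


def choose_paddle_build(cuda_text: Optional[str]) -> str:
    """Return 'gpu' for CUDA >= 11.2, otherwise 'cpu' (including when no CUDA is found)."""
    if not cuda_text:
        return "cpu"
    version = parse_cuda_version(cuda_text)
    return "gpu" if version is not None and version >= MIN_CUDA_FOR_GPU_BUILD else "cpu"
```

Comparing `(major, minor)` tuples avoids the classic string-comparison bug where "11.10" would sort before "11.2".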
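Technical features:
- Automatic detection of NVIDIA GPUs and the CUDA version
- Supports CUDA 11.2-12.x
- Graceful fallback to CPU when GPU initialization fails
- GPU memory allocation control to prevent OOM
- Real-time GPU status monitoring and reporting
- Fully backward compatible with CPU-only environments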
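A sketch of the graceful CPU fallback, assuming an engine factory that accepts a `device` keyword; that keyword and `create_engine_with_fallback` are hypothetical and are not the real `get_ocr_engine()` signature. `FORCE_CPU_MODE` would map to `use_gpu=False`.

```python
import logging
from typing import Any, Callable, Tuple

logger = logging.getLogger(__name__)


def create_engine_with_fallback(
    factory: Callable[..., Any], use_gpu: bool, **kwargs: Any
) -> Tuple[Any, str]:
    """Try to build the engine on GPU; on any initialization error, build it on CPU."""
    if use_gpu:
        try:
            return factory(device="gpu", **kwargs), "gpu"
        except Exception as exc:  # broad on purpose: driver, OOM and init errors vary
            logger.warning("GPU init failed (%s); falling back to CPU", exc)
    return factory(device="cpu", **kwargs), "cpu"
```

Returning the device actually used lets a health endpoint report the real mode instead of the requested one.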
Expected performance:
- GPU systems: 3-10x faster OCR processing
- CPU systems: no impact; existing performance is maintained
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-14 07:42:13 +08:00
beabigegg
da700721fa
first
2025-11-12 22:53:17 +08:00