Commit Graph

115 Commits

Author SHA1 Message Date
egg
28e419f5fa proposal: migrate to external API authentication
Create OpenSpec proposal for migrating from local database authentication
to external API authentication using Microsoft Azure AD.

Changes proposed:
- Replace local username/password auth with external API
- Integrate with https://pj-auth-api.vercel.app/api/auth/login
- Use Azure AD tokens instead of local JWT
- Display user 'name' from API response in UI
- Maintain backward compatibility with feature flag

Benefits:
- Single Sign-On (SSO) capability
- Leverage enterprise identity management
- Reduce local user management overhead
- Consistent authentication across applications

Database changes:
- Add external_user_id for Azure AD user mapping
- Add display_name for UI display
- Keep existing schema for rollback capability

Implementation includes:
- Detailed migration plan with phased rollout
- Comprehensive task list for implementation
- Test script for API validation
- Risk assessment and mitigation strategies

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-14 15:14:48 +08:00
egg
b048f2d640 fix: disable chart recognition due to PaddlePaddle 3.0.0 API limitation
PaddleOCR-VL chart recognition model requires `fused_rms_norm_ext` API
which is not available in PaddlePaddle 3.0.0 stable release.

Changes:
- Set use_chart_recognition=False in PP-StructureV3 initialization
- Remove unsupported show_log parameter from PaddleOCR 3.x API calls
- Document known limitation in openspec proposal
- Add limitation documentation to README
- Update tasks.md with documentation task for known issues

Impact:
- Layout analysis still detects/extracts charts as images ✓
- Tables, formulas, and text recognition work normally ✓
- Deep chart understanding (type detection, data extraction) disabled ✗
- Chart to structured data conversion disabled ✗

Workaround: Charts saved as image files for manual review

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-14 13:16:17 +08:00
egg
80c091b89a fix: add PaddlePaddle 2.x/3.x API compatibility layer
PaddlePaddle 3.0.0b2 has "Illegal instruction" error on current CPU.
Downgrade to stable 2.6.2 which works but uses different API.

Changes:
- Auto-detect PaddlePaddle version at runtime
- Use 'device' parameter for 3.x (device="gpu:0" or "cpu")
- Use 'use_gpu' + 'gpu_mem' parameters for 2.x
- Apply to both get_ocr_engine() and get_structure_engine()
- Log PaddlePaddle version in initialization messages

Current setup:
- paddlepaddle-gpu==2.6.2 (stable, CUDA compiled)
- paddleocr==3.3.1
- paddlex==3.3.9

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-14 10:56:29 +08:00
egg
36944117f4 fix: update setup script to install PaddlePaddle GPU version from official source
Changes to setup_dev_env.sh:
- Add support for CUDA 13.x (install CUDA 12.x compatible version)
- Use official PaddlePaddle source for GPU versions
- Install paddlepaddle-gpu==3.0.0b2 from official index
- CUDA 13.x: use cu123 package (backward compatible)
- CUDA 12.x: use cu123 package
- CUDA 11.7+: use cu118 package
- CUDA 11.2-11.6: use cu117 package

Changes to requirements.txt:
- Comment out paddlepaddle dependency
- Let setup script handle GPU/CPU version installation

This fixes the issue where pip installed CPU-only paddlepaddle 3.2.1
instead of GPU version, causing GPU acceleration to be unavailable.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-14 09:35:12 +08:00
egg
d80d60f14b fix: update PaddleOCR 3.x API - replace deprecated gpu_mem parameter with device parameter
PaddleOCR 3.x changed the API:
- Removed: use_gpu=True/False and gpu_mem=<value>
- Added: device="gpu:0" or device="cpu"

Changes:
- Updated get_ocr_engine() to use device parameter
- Updated get_structure_engine() to use device parameter
- GPU mode: device="gpu:{gpu_device_id}"
- CPU mode: device="cpu"

This fixes the "ValueError: Unknown argument: gpu_mem" runtime error.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-14 09:22:56 +08:00
egg
7536f43513 feat: implement GPU acceleration support for OCR processing
實作 GPU 加速支援,自動偵測並啟用 CUDA GPU 加速 OCR 處理

主要變更:

1. 環境設置增強 (setup_dev_env.sh)
   - 新增 GPU 和 CUDA 版本偵測功能
   - 自動安裝對應的 PaddlePaddle GPU/CPU 版本
   - CUDA 11.2+ 安裝 GPU 版本,否則安裝 CPU 版本
   - 安裝後驗證 GPU 可用性並顯示設備資訊

2. 配置更新
   - .env.local: 加入 GPU 配置選項
     * FORCE_CPU_MODE: 強制 CPU 模式選項
     * GPU_MEMORY_FRACTION: GPU 記憶體使用比例
     * GPU_DEVICE_ID: GPU 裝置 ID
   - backend/app/core/config.py: 加入 GPU 配置欄位

3. OCR 服務 GPU 整合 (backend/app/services/ocr_service.py)
   - 新增 _detect_and_configure_gpu() 方法自動偵測 GPU
   - 新增 get_gpu_status() 方法回報 GPU 狀態和記憶體使用
   - 修改 get_ocr_engine() 支援 GPU 參數和錯誤降級
   - 修改 get_structure_engine() 支援 GPU 參數和錯誤降級
   - 自動 GPU/CPU 切換,GPU 失敗時自動降級到 CPU

4. 健康檢查與監控 (backend/app/main.py)
   - /health endpoint 加入 GPU 狀態資訊
   - 回報 GPU 可用性、裝置名稱、記憶體使用等資訊

5. 文檔更新 (README.md)
   - Features: 加入 GPU 加速功能說明
   - Prerequisites: 加入 GPU 硬體要求(可選)
   - Quick Start: 更新自動化設置說明包含 GPU 偵測
   - Configuration: 加入 GPU 配置選項和說明
   - Notes: 加入 GPU 支援注意事項

技術特性:
- 自動偵測 NVIDIA GPU 和 CUDA 版本
- 支援 CUDA 11.2-12.x
- GPU 初始化失敗時優雅降級到 CPU
- GPU 記憶體分配控制防止 OOM
- 即時 GPU 狀態監控和報告
- 完全向後相容 CPU-only 環境

預期效能:
- GPU 系統: 3-10x OCR 處理速度提升
- CPU 系統: 無影響,維持現有效能

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-14 07:42:13 +08:00
egg
6452797abe feat: add GPU acceleration support OpenSpec proposal
新增 GPU 加速支援的 OpenSpec 變更提案

主要內容:
- 在環境建置腳本中加入 GPU 偵測功能
- 自動安裝對應 CUDA 版本的 PaddlePaddle GPU 套件
- 在 OCR 處理程式中加入 GPU 可用性偵測
- 自動啟用 GPU 加速(可用時)或使用 CPU(不可用時)
- 支援強制 CPU 模式選項
- 加入 GPU 狀態報告到健康檢查 API

變更範圍:
- 新增 capability: environment-setup (環境設置)
- 修改 capability: ocr-processing (加入 GPU 支援)

實作任務包含:
1. 環境設置腳本增強 (GPU 偵測、CUDA 安裝)
2. 配置更新 (GPU 相關環境變數)
3. OCR 服務 GPU 整合 (自動偵測、記憶體管理)
4. 健康檢查與監控 (GPU 狀態報告)
5. 文檔更新
6. 測試與效能評估
7. 錯誤處理與邊界情況

預期效果:
- GPU 系統: 3-10x OCR 處理速度提升
- CPU 系統: 無影響,向後相容
- 自動硬體偵測與優化配置

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-14 07:34:06 +08:00
egg
d7e64737b7 feat: migrate to WSL Ubuntu native development environment
從 Docker/macOS+Conda 部署遷移到 WSL2 Ubuntu 原生開發環境

主要變更:
- 移除所有 Docker 相關配置檔案 (Dockerfile, docker-compose.yml, .dockerignore 等)
- 移除 macOS/Conda 設置腳本 (SETUP.md, setup_conda.sh)
- 新增 WSL Ubuntu 自動化環境設置腳本 (setup_dev_env.sh)
- 新增後端/前端快速啟動腳本 (start_backend.sh, start_frontend.sh)
- 統一開發端口配置 (backend: 8000, frontend: 5173)
- 改進資料庫連接穩定性(連接池、超時設置、重試機制)
- 更新專案文檔以反映當前 WSL 開發環境

Technical improvements:
- Database connection pooling with health checks and auto-reconnection
- Retry logic for long-running OCR tasks to prevent DB timeouts
- Extended JWT token expiration to 24 hours
- Support for Office documents (pptx, docx) via LibreOffice headless
- Comprehensive system dependency installation in single script

Environment:
- OS: WSL2 Ubuntu 24.04
- Python: 3.12 (venv)
- Node.js: 24.x LTS (nvm)
- Backend Port: 8000
- Frontend Port: 5173

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 21:00:42 +08:00
beabigegg
0f81d5e70b feat: Docker化部署 - 單容器架構轉換
將 Tool_OCR 從 macOS conda 環境轉換為 Docker 單容器部署方案。
前後端整合於同一容器,通過 Nginx 反向代理,僅對外暴露單一端口。

## 新增功能
- Docker 單容器架構(Frontend + Backend + Nginx)
- 多階段構建優化鏡像大小
- Supervisor 進程管理
- 健康檢查機制
- 完整部署文檔

## 技術細節
- 對外端口:12015(原 12010 已被佔用)
- 內部架構:Nginx(12015) → FastAPI(8000)
- 前端靜態文件由 Nginx 直接服務
- API 請求通過 Nginx 反向代理

## 系統依賴完善
- libmagic1:文件類型檢測
- LibreOffice:Office 文檔轉換
- paddlex[ocr]:PP-StructureV3 版面分析
- 中日韓字體支援

## 配置調整
- 環境變數路徑:macOS 路徑 → 容器絕對路徑
- 前端 API URL:修正為統一端口 12015
- Pip 安裝:延長超時至 600 秒,重試 5 次
- CRLF 轉換:自動處理 Windows 換行符

## 清理
- 移除臨時文檔(API_FIX_SUMMARY.md 等 7 個文檔)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 13:12:59 +08:00
beabigegg
57cf91271c feat: modernize frontend UI with Tailwind v4 and professional design system
BREAKING CHANGE: Migrated to Tailwind CSS v4 configuration system

Key Changes:
- Migrated from Tailwind v3 to v4 configuration system
  - Removed tailwind.config.js (incompatible with v4)
  - Updated index.css with @theme directive and oklch color space
  - Defined all custom animations directly in CSS using @keyframes

- Redesigned LoginPage with modern, enterprise-grade UI:
  - Full-screen gradient background (blue → purple → pink)
  - Floating animated orbs with blur effects
  - Glass morphism white card with backdrop-blur
  - Gradient buttons with shadow effects
  - 7 custom animations: fade-in, slide-in-right, slide-in-left, scale-in, shimmer, pulse, float

- Added shadcn/ui components:
  - alert.tsx, dialog.tsx, input.tsx, label.tsx, select.tsx, tabs.tsx

- Updated dependencies:
  - Added class-variance-authority ^0.7.0
  - Added react-markdown ^9.0.1

- Updated frontend documentation:
  - Comprehensive README.md with feature list, tech stack, project structure
  - Quick start guide and deployment instructions

Technical Details:
- Tailwind v4 uses @import "tailwindcss" instead of @tailwind directives
- All theme customization now in @theme block with CSS variables
- Color system migrated to oklch for better perceptual uniformity
- Animation definitions moved from config to CSS @layer utilities

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 08:55:01 +08:00
beabigegg
9cf36d8e21 fix: resolve 7 frontend-backend API inconsistencies and add comprehensive documentation
- Fixed LoginResponse: added expires_in field
- Renamed format to file_format across FileInfo interface
- Updated ProcessRequest: replaced confidence_threshold with detect_layout
- Modified ProcessResponse: added message and total_files, removed task_id
- Removed non-existent getTaskStatus API call
- Fixed getOCRResult parameter from taskId to fileId
- Commented out translation config APIs pending backend implementation

Documentation:
- Added API_REFERENCE.md: Complete API endpoint inventory
- Added API_FIX_SUMMARY.md: Detailed before/after comparison of all fixes
- Added FRONTEND_API.md: Frontend integration documentation
- Added FRONTEND_QUICK_START.md: Quick start guide
- Added FRONTEND_UPGRADE_SUMMARY.md: Upgrade summary

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 08:54:37 +08:00
beabigegg
fed112656f update FRONTEND documentation 2025-11-12 23:55:21 +08:00
beabigegg
21bc2f92f1 feat: modernize frontend architecture with professional UI/UX design
Complete redesign of frontend interface with focus on usability, visual hierarchy, and professional appearance:

**Design System:**
- Implemented clean blue color theme (#3B82F6) with professional palette
- Created consistent spacing, shadows, and typography system
- Added reusable utility classes (page-header, section, status-badge-*)
- Removed excessive gradients and decorative effects

**Layout Architecture:**
- Redesigned main layout with 256px sidebar navigation
- Sidebar includes logo, navigation with descriptions, and user profile
- Main content area with search bar and scrollable content
- Replaced horizontal navigation with vertical sidebar pattern

**Page Redesigns:**
1. LoginPage: Split-screen design with branding (left) and clean form (right)
   - Feature highlights with icons and statistics
   - Mobile responsive design
   - Professional gradient background with subtle pattern

2. UploadPage: Added 3-step visual progress indicator
   - Better file organization with summary and status badges
   - Clear action bar with confirmation message
   - Improved file list presentation

3. ProcessingPage: Enhanced progress visualization
   - Large progress bar with percentage display
   - 4-column stats grid (Completed, Processing, Failed, Total)
   - Clean file status list with processing times

4. ResultsPage: Improved 5-column layout (2 for list, 3 for preview)
   - Added stats cards for accuracy, processing time, and text blocks
   - Better preview panel with detailed metrics
   - Export and translate action buttons

5. ExportPage: Better organization with 2-column layout
   - Visual format selection with icons (TXT, JSON, Excel, Markdown, PDF)
   - Improved form controls and option organization
   - Sticky preview sidebar showing current configuration

**Component Updates:**
- Updated Button component with proper variants
- Enhanced Card component with hover effects
- Maintained FileUpload component functionality
- Added lucide-react for modern iconography

**Technical Improvements:**
- Fixed Tailwind CSS v4 compatibility issues with @apply
- Removed decorative animations in favor of functional ones
- Improved accessibility with proper labels and ARIA attributes
- Better color contrast and readability

This redesign transforms the interface from a basic layout to a professional, enterprise-ready application with clear visual hierarchy and excellent usability.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-12 23:54:44 +08:00
beabigegg
69302144f5 2nd 2025-11-12 22:54:56 +08:00
beabigegg
da700721fa first 2025-11-12 22:53:17 +08:00