chore: project cleanup and prepare for dual-track processing refactor

- Removed all test files and directories
- Deleted outdated documentation (will be rewritten)
- Cleaned up temporary files, logs, and uploads
- Archived 5 completed OpenSpec proposals
- Created new dual-track-document-processing proposal with complete OpenSpec structure
  - Dual-track architecture: OCR track (PaddleOCR) + Direct track (PyMuPDF)
  - UnifiedDocument model for consistent output
  - Support for structure-preserving translation
- Updated .gitignore to prevent future test/temp files

This is a major cleanup preparing for the complete refactoring of the document processing pipeline.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
egg
2025-11-18 20:02:31 +08:00
parent 0edc56b03f
commit cd3cbea49d
64 changed files with 3573 additions and 8190 deletions

View File

@@ -0,0 +1,519 @@
# 前端實作完成 - External Authentication & Task History
## 實作日期
2025-11-14
## 狀態
**前端核心功能完成**
- V2 認證服務整合
- 登入頁面更新
- 任務歷史頁面
- 導航整合
---
## 📋 已完成項目
### 1. V2 API 服務層 ✅
#### **檔案:`frontend/src/services/apiV2.ts`**
**核心功能:**
```typescript
class ApiClientV2 {
// 認證管理
async login(data: LoginRequest): Promise<LoginResponseV2>
async logout(sessionId?: number): Promise<void>
async getMe(): Promise<UserInfo>
async listSessions(): Promise<SessionInfo[]>
// 任務管理
async createTask(data: TaskCreate): Promise<Task>
async listTasks(params): Promise<TaskListResponse>
async getTaskStats(): Promise<TaskStats>
async getTask(taskId: string): Promise<TaskDetail>
async updateTask(taskId: string, data: TaskUpdate): Promise<Task>
async deleteTask(taskId: string): Promise<void>
// 輔助方法
async downloadTaskFile(url: string, filename: string): Promise<void>
}
```
**特色:**
- 自動 token 管理localStorage
- 401 自動重定向到登入
- Session 過期檢測
- 用戶資訊快取
#### **檔案:`frontend/src/types/apiV2.ts`**
完整類型定義:
- `UserInfo`, `LoginResponseV2`, `SessionInfo`
- `Task`, `TaskCreate`, `TaskUpdate`, `TaskDetail`
- `TaskStats`, `TaskListResponse`, `TaskFilters`
- `TaskStatus` 枚舉
---
### 2. 登入頁面更新 ✅
#### **檔案:`frontend/src/pages/LoginPage.tsx`**
**變更:**
```typescript
// 舊版V1
await apiClient.login({ username, password })
setUser({ id: 1, username })
// 新版V2
const response = await apiClientV2.login({ username, password })
setUser({
id: response.user.id,
username: response.user.email,
email: response.user.email,
displayName: response.user.display_name
})
```
**功能:**
- ✅ 整合外部 Azure AD 認證
- ✅ 顯示用戶顯示名稱
- ✅ 錯誤訊息處理
- ✅ 保持原有 UI 設計
---
### 3. 任務歷史頁面 ✅
#### **檔案:`frontend/src/pages/TaskHistoryPage.tsx`**
**核心功能:**
1. **統計儀表板**
- 總計、待處理、處理中、已完成、失敗
- 卡片式呈現
- 即時更新
2. **篩選功能**
- 按狀態篩選(全部/pending/processing/completed/failed
- 未來可擴展:日期範圍、檔名搜尋
3. **任務列表**
- 分頁顯示(每頁 20 筆)
- 欄位:檔案名稱、狀態、建立時間、完成時間、處理時間
- 操作:查看詳情、刪除
4. **狀態徽章**
```typescript
pending → 灰色 + 時鐘圖標
processing → 藍色 + 旋轉圖標
completed → 綠色 + 勾選圖標
failed → 紅色 + X 圖標
```
5. **分頁控制**
- 上一頁/下一頁
- 顯示當前範圍1-20 / 共 45 個)
- 自動禁用按鈕
**UI 組件使用:**
- `Card` - 統計卡片和主容器
- `Table` - 任務列表表格
- `Badge` - 狀態標籤
- `Button` - 操作按鈕
- `Select` - 狀態篩選下拉選單
---
### 4. 路由整合 ✅
#### **檔案:`frontend/src/App.tsx`**
新增路由:
```typescript
<Route path="tasks" element={<TaskHistoryPage />} />
```
**路由結構:**
```
/login - 登入頁面(公開)
/ - 根路徑(重定向到 /upload
/upload - 上傳檔案
/processing - 處理進度
/results - 查看結果
/tasks - 任務歷史 (NEW!)
/export - 導出文件
/settings - 系統設定
```
---
### 5. 導航更新 ✅
#### **檔案:`frontend/src/components/Layout.tsx`**
**新增導航項:**
```typescript
{
to: '/tasks',
label: '任務歷史',
icon: History,
description: '查看任務記錄'
}
```
**Logout 邏輯更新:**
```typescript
const handleLogout = async () => {
try {
// 優先使用 V2 API
if (apiClientV2.isAuthenticated()) {
await apiClientV2.logout()
} else {
apiClient.logout()
}
} finally {
logout() // 清除本地狀態
}
}
```
**用戶資訊顯示:**
- 顯示名稱:`user.displayName || user.username`
- Email`user.email || user.username`
- 頭像:首字母大寫
---
### 6. 類型擴展 ✅
#### **檔案:`frontend/src/types/api.ts`**
擴展 User 介面:
```typescript
export interface User {
id: number
username: string
email?: string // NEW
displayName?: string | null // NEW
}
```
---
## 🎨 UI/UX 特色
### 任務歷史頁面設計亮點:
1. **響應式卡片佈局**
- Grid 5 欄(桌面)/ 1 欄(手機)
- 統計數據卡片 hover 效果
2. **清晰的狀態視覺化**
- 彩色徽章
- 動畫圖標processing 狀態旋轉)
- 語意化顏色
3. **操作反饋**
- 載入動畫Loader2
- 空狀態提示
- 錯誤警告
4. **用戶友好**
- 確認刪除對話框
- 刷新按鈕
- 分頁資訊明確
---
## 🔄 向後兼容
### V1 與 V2 並存策略
**認證服務:**
- V1: `apiClient` (原有本地認證)
- V2: `apiClientV2` (新外部認證)
**登入流程:**
- 新用戶使用 V2 API 登入
- 舊 session 仍可使用 V1 API
**Logout 處理:**
```typescript
if (apiClientV2.isAuthenticated()) {
await apiClientV2.logout() // 呼叫後端 /api/v2/auth/logout
} else {
apiClient.logout() // 僅清除本地 token
}
```
---
## 📱 使用流程
### 1. 登入
```
用戶訪問 /login
→ 輸入 email + password
→ apiClientV2.login() 呼叫外部 API
→ 接收 access_token + user info
→ 存入 localStorage
→ 重定向到 /upload
```
### 2. 查看任務歷史
```
用戶點擊「任務歷史」導航
→ 訪問 /tasks
→ apiClientV2.listTasks() 獲取任務列表
→ apiClientV2.getTaskStats() 獲取統計
→ 顯示任務表格 + 統計卡片
```
### 3. 篩選任務
```
用戶選擇狀態篩選器completed
→ setStatusFilter('completed')
→ useEffect 觸發重新 fetchTasks()
→ 呼叫 apiClientV2.listTasks({ status: 'completed' })
→ 更新任務列表
```
### 4. 刪除任務
```
用戶點擊刪除按鈕
→ 確認對話框
→ apiClientV2.deleteTask(taskId)
→ 重新載入任務列表和統計
```
### 5. 分頁導航
```
用戶點擊「下一頁」
→ setPage(page + 1)
→ useEffect 觸發 fetchTasks()
→ 呼叫 listTasks({ page: 2 })
→ 更新任務列表
```
---
## 🧪 測試指南
### 手動測試步驟:
#### 1. 測試登入
```bash
# 啟動後端
cd backend
source venv/bin/activate
python -m app.main
# 啟動前端
cd frontend
npm run dev
# 訪問 http://localhost:5173/login
# 輸入 Azure AD 憑證
# 確認登入成功並顯示用戶名稱
```
#### 2. 測試任務歷史
```bash
# 登入後點擊側邊欄「任務歷史」
# 確認統計卡片顯示正確數字
# 確認任務列表載入
# 測試狀態篩選
# 測試分頁功能
```
#### 3. 測試任務刪除
```bash
# 在任務列表點擊刪除按鈕
# 確認刪除確認對話框
# 確認刪除後列表更新
# 確認統計數字更新
```
#### 4. 測試 Logout
```bash
# 點擊側邊欄登出按鈕
# 確認清除 localStorage
# 確認重定向到登入頁面
# 再次登入確認一切正常
```
---
## 🔧 已知限制
### 目前未實作項目:
1. **任務詳情頁面** (`/tasks/:taskId`)
- 顯示完整任務資訊
- 下載結果檔案JSON/Markdown/PDF
- 查看任務文件列表
2. **進階篩選**
- 日期範圍選擇器
- 檔案名稱搜尋
- 多條件組合篩選
3. **批次操作**
- 批次刪除任務
- 批次下載結果
4. **即時更新**
- WebSocket 連接
- 任務狀態即時推送
- 自動刷新處理中的任務
5. **錯誤詳情**
- 展開查看 `error_message`
- 失敗任務重試功能
---
## 💡 未來擴展建議
### 短期優化1-2 週):
1. **任務詳情頁面**
```typescript
// frontend/src/pages/TaskDetailPage.tsx
const task = await apiClientV2.getTask(taskId)
// 顯示完整資訊 + 下載按鈕
```
2. **檔案下載**
```typescript
const handleDownload = async (path: string, filename: string) => {
await apiClientV2.downloadTaskFile(path, filename)
}
```
3. **日期範圍篩選**
```typescript
<DateRangePicker
from={dateFrom}
to={dateTo}
onChange={(range) => {
setDateFrom(range.from)
setDateTo(range.to)
}}
/>
```
### 中期功能1 個月):
4. **即時狀態更新**
- 使用 WebSocket 或 Server-Sent Events
- 自動更新 processing 任務狀態
5. **批次操作**
- 複選框選擇多個任務
- 批次刪除/下載
6. **搜尋功能**
- 檔案名稱模糊搜尋
- 全文搜尋(需後端支援)
### 長期規劃3 個月):
7. **任務視覺化**
- 時間軸視圖
- 甘特圖(處理進度)
- 統計圖表ECharts
8. **通知系統**
- 任務完成通知
- 錯誤警報
- 瀏覽器通知 API
9. **導出功能**
- 任務報表導出Excel/PDF
- 統計資料導出
---
## 📝 程式碼範例
### 在其他頁面使用 V2 API
```typescript
// Example: 在 UploadPage 創建任務
import { apiClientV2 } from '@/services/apiV2'
const handleUpload = async (file: File) => {
try {
// 創建任務
const task = await apiClientV2.createTask({
filename: file.name,
file_type: file.type
})
console.log('Task created:', task.task_id)
// TODO: 上傳檔案到雲端存儲
// TODO: 更新任務狀態為 processing
// TODO: 呼叫 OCR 服務
} catch (error) {
console.error('Upload failed:', error)
}
}
```
### 監聽任務狀態變化
```typescript
// Example: 輪詢任務狀態
const pollTaskStatus = async (taskId: string) => {
const interval = setInterval(async () => {
try {
const task = await apiClientV2.getTask(taskId)
if (task.status === 'completed') {
clearInterval(interval)
alert('任務完成!')
} else if (task.status === 'failed') {
clearInterval(interval)
alert(`任務失敗:${task.error_message}`)
}
} catch (error) {
clearInterval(interval)
console.error('Poll error:', error)
}
}, 5000) // 每 5 秒檢查一次
}
```
---
## ✅ 完成清單
- [x] V2 API 服務層(`apiV2.ts`
- [x] V2 類型定義(`apiV2.ts`
- [x] 登入頁面整合 V2
- [x] 任務歷史頁面
- [x] 統計儀表板
- [x] 狀態篩選
- [x] 分頁功能
- [x] 任務刪除
- [x] 路由整合
- [x] 導航更新
- [x] Logout 更新
- [x] 用戶資訊顯示
- [ ] 任務詳情頁面(待實作)
- [ ] 檔案下載(待實作)
- [ ] 即時狀態更新(待實作)
- [ ] 批次操作(待實作)
---
**實作完成日期**2025-11-14
**實作人員**Claude Code
**前端框架**React + TypeScript + Vite
**UI 庫**Tailwind CSS + shadcn/ui
**狀態管理**Zustand
**HTTP 客戶端**Axios

View File

@@ -0,0 +1,556 @@
# External API Authentication Implementation - Complete ✅
## 實作日期
2025-11-14
## 狀態
**後端實作完成** - Phase 1-8 已完成
**前端實作待續** - Phase 9-11 待實作
📋 **測試與文檔** - Phase 12-13 待完成
---
## 📋 已完成階段 (Phase 1-8)
### Phase 1: 資料庫架構設計 ✅
#### 創建的模型文件:
1. **`backend/app/models/user_v2.py`** - 新用戶模型
- 資料表:`tool_ocr_users`
- 欄位:`id`, `email`, `display_name`, `created_at`, `last_login`, `is_active`
- 特點無密碼欄位外部認證、email 作為主要識別
2. **`backend/app/models/task.py`** - 任務模型
- 資料表:`tool_ocr_tasks`, `tool_ocr_task_files`
- 任務狀態PENDING, PROCESSING, COMPLETED, FAILED
- 用戶隔離:外鍵關聯 `user_id`CASCADE 刪除
3. **`backend/app/models/session.py`** - Session 管理
- 資料表:`tool_ocr_sessions`
- 儲存access_token, id_token, refresh_token (加密)
- 追蹤expires_at, ip_address, user_agent, last_accessed_at
#### 資料庫遷移:
- **檔案**`backend/alembic/versions/5e75a59fb763_add_external_auth_schema_with_task_.py`
- **狀態**:已套用 (alembic stamp head)
- **變更**:創建 4 個新表 (users, sessions, tasks, task_files)
- **策略**:保留舊表,不刪除(避免外鍵約束錯誤)
---
### Phase 2: 配置管理 ✅
#### 環境變數 (`.env.local`):
```bash
# External Authentication
EXTERNAL_AUTH_API_URL=https://pj-auth-api.vercel.app
EXTERNAL_AUTH_ENDPOINT=/api/auth/login
EXTERNAL_AUTH_TIMEOUT=30
TOKEN_REFRESH_BUFFER=300
# Task Management
DATABASE_TABLE_PREFIX=tool_ocr_
ENABLE_TASK_HISTORY=true
TASK_RETENTION_DAYS=30
MAX_TASKS_PER_USER=1000
```
#### 配置類 (`backend/app/core/config.py`):
- 新增外部認證配置屬性
- 新增 `external_auth_full_url` property
- 新增任務管理配置參數
---
### Phase 3: 服務層實作 ✅
#### 1. 外部認證服務 (`backend/app/services/external_auth_service.py`)
**核心功能:**
```python
class ExternalAuthService:
async def authenticate_user(username, password) -> tuple[bool, AuthResponse, error]
# 呼叫外部 APIPOST https://pj-auth-api.vercel.app/api/auth/login
# 重試邏輯3 次,指數退避
# 返回success, auth_data (tokens + user_info), error_msg
async def validate_token(access_token) -> tuple[bool, payload]
# TODO: 完整 JWT 驗證(簽名、過期時間等)
def is_token_expiring_soon(expires_at) -> bool
# 檢查是否在 TOKEN_REFRESH_BUFFER 內過期
```
**錯誤處理:**
- HTTP 超時自動重試
- 5xx 錯誤指數退避
- 完整日誌記錄
#### 2. 任務管理服務 (`backend/app/services/task_service.py`)
**核心功能:**
```python
class TaskService:
# 創建與查詢
def create_task(db, user_id, filename, file_type) -> Task
def get_task_by_id(db, task_id, user_id) -> Task # 用戶隔離
def get_user_tasks(db, user_id, status, skip, limit) -> (tasks, total)
# 更新
def update_task_status(db, task_id, user_id, status, error, time_ms) -> Task
def update_task_results(db, task_id, user_id, paths...) -> Task
# 刪除與清理
def delete_task(db, task_id, user_id) -> bool
def auto_cleanup_expired_tasks(db) -> int # 根據 TASK_RETENTION_DAYS
# 統計
def get_user_stats(db, user_id) -> dict # 按狀態統計
```
**安全特性:**
- 所有查詢強制 `user_id` 過濾
- 自動任務限額檢查
- 過期任務自動清理
---
### Phase 4-6: API 端點實作 ✅
#### 1. 認證端點 (`backend/app/routers/auth_v2.py`)
**路由前綴**`/api/v2/auth`
| 端點 | 方法 | 描述 | 認證 |
|------|------|------|------|
| `/login` | POST | 外部 API 登入 | 無 |
| `/logout` | POST | 登出 (刪除 session) | 需要 |
| `/me` | GET | 獲取當前用戶資訊 | 需要 |
| `/sessions` | GET | 列出用戶所有 sessions | 需要 |
**Login 流程:**
```
1. 呼叫外部 API 認證
2. 獲取 access_token, id_token, user_info
3. 在資料庫中創建/更新用戶 (email)
4. 創建 session 記錄 (tokens, IP, user agent)
5. 生成內部 JWT (包含 user_id, session_id)
6. 返回內部 JWT 給前端
```
#### 2. 任務管理端點 (`backend/app/routers/tasks.py`)
**路由前綴**`/api/v2/tasks`
| 端點 | 方法 | 描述 | 認證 |
|------|------|------|------|
| `/` | POST | 創建新任務 | 需要 |
| `/` | GET | 列出用戶任務 (分頁/過濾) | 需要 |
| `/stats` | GET | 獲取任務統計 | 需要 |
| `/{task_id}` | GET | 獲取任務詳情 | 需要 |
| `/{task_id}` | PATCH | 更新任務 | 需要 |
| `/{task_id}` | DELETE | 刪除任務 | 需要 |
**查詢參數:**
- `status`: pending/processing/completed/failed
- `page`: 頁碼 (從 1 開始)
- `page_size`: 每頁筆數 (max 100)
- `order_by`: 排序欄位 (created_at/updated_at/completed_at)
- `order_desc`: 降序排列
#### 3. Schema 定義
**認證** (`backend/app/schemas/auth.py`):
- `LoginRequest`: username, password
- `Token`: access_token, token_type, expires_in, user (V2)
- `UserInfo`: id, email, display_name
- `UserResponse`: 完整用戶資訊
- `TokenData`: JWT payload 結構
**任務** (`backend/app/schemas/task.py`):
- `TaskCreate`: filename, file_type
- `TaskUpdate`: status, error_message, paths...
- `TaskResponse`: 任務基本資訊
- `TaskDetailResponse`: 任務 + 文件列表
- `TaskListResponse`: 分頁結果
- `TaskStatsResponse`: 統計數據
---
### Phase 7: JWT 驗證依賴 ✅
#### 更新 `backend/app/core/deps.py`
**新增 V2 依賴:**
```python
def get_current_user_v2(credentials, db) -> UserV2:
# 1. 解析 JWT token
# 2. 從資料庫查詢用戶 (tool_ocr_users)
# 3. 檢查用戶是否活躍
# 4. 驗證 session (如果有 session_id)
# 5. 檢查 session 是否過期
# 6. 更新 last_accessed_at
# 7. 返回用戶對象
def get_current_active_user_v2(current_user) -> UserV2:
# 確保用戶處於活躍狀態
```
**安全檢查:**
- JWT 簽名驗證
- 用戶存在性檢查
- 用戶活躍狀態檢查
- Session 有效性檢查
- Session 過期時間檢查
---
### Phase 8: 路由註冊 ✅
#### 更新 `backend/app/main.py`
```python
# Legacy V1 routers (保留向後兼容)
from app.routers import auth, ocr, export, translation
# V2 routers (新外部認證系統)
from app.routers import auth_v2, tasks
app.include_router(auth.router) # V1: /api/v1/auth
app.include_router(ocr.router) # V1: /api/v1/ocr
app.include_router(export.router) # V1: /api/v1/export
app.include_router(translation.router) # V1: /api/v1/translation
app.include_router(auth_v2.router) # V2: /api/v2/auth
app.include_router(tasks.router) # V2: /api/v2/tasks
```
**版本策略:**
- V1 API 保持不變 (向後兼容)
- V2 API 使用新認證系統
- 前端可逐步遷移
---
## 🔐 安全特性
### 1. 用戶隔離
- ✅ 所有任務查詢強制 `user_id` 過濾
- ✅ 用戶 A 無法訪問用戶 B 的任務
- ✅ Row-level security 在服務層實施
- ✅ 外鍵 CASCADE 刪除保證資料一致性
### 2. Session 管理
- ✅ 追蹤 IP 位址和 User Agent
- ✅ 自動過期檢查
- ✅ 最後訪問時間更新
- ⚠️ Token 加密待實作 (目前明文儲存)
### 3. 認證流程
- ✅ 外部 API 認證 (Azure AD)
- ✅ 內部 JWT 生成 (包含 user_id + session_id)
- ✅ 雙重驗證 (JWT + session 檢查)
- ✅ 錯誤重試機制 (3 次,指數退避)
### 4. 資料庫安全
- ✅ 資料表前綴命名空間隔離 (`tool_ocr_`)
- ✅ 索引優化 (email, task_id, status, created_at)
- ✅ 外鍵約束確保參照完整性
- ✅ 軟刪除支援 (file_deleted flag)
---
## 📊 資料庫架構
### 資料表關係圖:
```
tool_ocr_users (1)
├── tool_ocr_sessions (N) [FK: user_id, CASCADE]
└── tool_ocr_tasks (N) [FK: user_id, CASCADE]
└── tool_ocr_task_files (N) [FK: task_id, CASCADE]
```
### 索引策略:
```sql
-- 用戶表
CREATE INDEX ix_tool_ocr_users_email ON tool_ocr_users(email); -- 登入查詢
CREATE INDEX ix_tool_ocr_users_is_active ON tool_ocr_users(is_active);
-- Session 表
CREATE INDEX ix_tool_ocr_sessions_user_id ON tool_ocr_sessions(user_id);
CREATE INDEX ix_tool_ocr_sessions_expires_at ON tool_ocr_sessions(expires_at); -- 過期檢查
CREATE INDEX ix_tool_ocr_sessions_created_at ON tool_ocr_sessions(created_at);
-- 任務表
CREATE UNIQUE INDEX ix_tool_ocr_tasks_task_id ON tool_ocr_tasks(task_id); -- UUID 查詢
CREATE INDEX ix_tool_ocr_tasks_user_id ON tool_ocr_tasks(user_id); -- 用戶查詢
CREATE INDEX ix_tool_ocr_tasks_status ON tool_ocr_tasks(status); -- 狀態過濾
CREATE INDEX ix_tool_ocr_tasks_created_at ON tool_ocr_tasks(created_at); -- 排序
CREATE INDEX ix_tool_ocr_tasks_filename ON tool_ocr_tasks(filename); -- 搜尋
-- 任務文件表
CREATE INDEX ix_tool_ocr_task_files_task_id ON tool_ocr_task_files(task_id);
CREATE INDEX ix_tool_ocr_task_files_file_hash ON tool_ocr_task_files(file_hash); -- 去重
```
---
## 🧪 測試端點 (Swagger UI)
### 訪問 API 文檔:
```
http://localhost:8000/docs
```
### 測試流程:
#### 1. 登入測試
```bash
POST /api/v2/auth/login
Content-Type: application/json
{
"username": "user@example.com",
"password": "your_password"
}
# 成功回應:
{
"access_token": "eyJhbGc...",
"token_type": "bearer",
"expires_in": 86400,
"user": {
"id": 1,
"email": "user@example.com",
"display_name": "User Name"
}
}
```
#### 2. 獲取當前用戶
```bash
GET /api/v2/auth/me
Authorization: Bearer eyJhbGc...
# 回應:
{
"id": 1,
"email": "user@example.com",
"display_name": "User Name",
"created_at": "2025-11-14T16:00:00",
"last_login": "2025-11-14T16:30:00",
"is_active": true
}
```
#### 3. 創建任務
```bash
POST /api/v2/tasks/
Authorization: Bearer eyJhbGc...
Content-Type: application/json
{
"filename": "document.pdf",
"file_type": "application/pdf"
}
# 回應:
{
"id": 1,
"user_id": 1,
"task_id": "550e8400-e29b-41d4-a716-446655440000",
"filename": "document.pdf",
"file_type": "application/pdf",
"status": "pending",
"created_at": "2025-11-14T16:35:00",
...
}
```
#### 4. 列出任務
```bash
GET /api/v2/tasks/?status=completed&page=1&page_size=10
Authorization: Bearer eyJhbGc...
# 回應:
{
"tasks": [...],
"total": 25,
"page": 1,
"page_size": 10,
"has_more": true
}
```
#### 5. 獲取統計
```bash
GET /api/v2/tasks/stats
Authorization: Bearer eyJhbGc...
# 回應:
{
"total": 25,
"pending": 3,
"processing": 2,
"completed": 18,
"failed": 2
}
```
---
## ⚠️ 待實作項目
### 高優先級 (阻塞性)
1. **Token 加密** - Session 表中的 tokens 目前明文儲存
- 需要AES-256 加密
- 位置:`backend/app/routers/auth_v2.py` login endpoint
2. **完整 JWT 驗證** - 目前僅解碼,未驗證簽名
- 需要Azure AD 公鑰驗證
- 位置:`backend/app/services/external_auth_service.py`
3. **前端實作** - Phase 9-11
- 認證服務 (token 管理)
- 任務歷史 UI 頁面
- API 整合
### 中優先級 (功能性)
4. **Token 刷新機制** - 自動刷新即將過期的 token
5. **檔案上傳整合** - 將 OCR 服務與新任務系統整合
6. **任務通知** - 任務完成時通知用戶
7. **錯誤追蹤** - 詳細的錯誤日誌和監控
### 低優先級 (優化)
8. **效能測試** - 大量任務的查詢效能
9. **快取層** - Redis 快取用戶 session
10. **API 速率限制** - 防止濫用
11. **文檔生成** - 自動生成 API 文檔
---
## 📝 遷移指南 (前端開發者)
### 1. 更新登入流程
**舊 V1 方式:**
```typescript
// V1: Local authentication
const response = await fetch('/api/v1/auth/login', {
method: 'POST',
body: JSON.stringify({ username, password })
});
const { access_token } = await response.json();
```
**新 V2 方式:**
```typescript
// V2: External Azure AD authentication
const response = await fetch('/api/v2/auth/login', {
method: 'POST',
body: JSON.stringify({ username, password }) // Same interface!
});
const { access_token, user } = await response.json();
// Store token and user info
localStorage.setItem('token', access_token);
localStorage.setItem('user', JSON.stringify(user));
```
### 2. 使用新的任務 API
```typescript
// 獲取任務列表
const response = await fetch('/api/v2/tasks/?page=1&page_size=20', {
headers: {
'Authorization': `Bearer ${token}`
}
});
const { tasks, total, has_more } = await response.json();
// 獲取統計
const statsResponse = await fetch('/api/v2/tasks/stats', {
headers: { 'Authorization': `Bearer ${token}` }
});
const stats = await statsResponse.json();
// { total: 25, pending: 3, processing: 2, completed: 18, failed: 2 }
```
### 3. 處理認證錯誤
```typescript
const response = await fetch('/api/v2/tasks/', {
headers: { 'Authorization': `Bearer ${token}` }
});
if (response.status === 401) {
// Token 過期或無效,重新登入
if (data.detail === "Session expired, please login again") {
// 清除本地 token導向登入頁
localStorage.removeItem('token');
window.location.href = '/login';
}
}
```
---
## 🔍 除錯與監控
### 日誌位置:
```
./logs/app.log
```
### 重要日誌事件:
- `Authentication successful for user: {email}` - 登入成功
- `Created session {id} for user {email}` - Session 創建
- `Authenticated user: {email} (ID: {id})` - JWT 驗證成功
- `Expired session {id} for user {email}` - Session 過期
- `Created task {task_id} for user {email}` - 任務創建
### 資料庫查詢:
```sql
-- 檢查用戶
SELECT * FROM tool_ocr_users WHERE email = 'user@example.com';
-- 檢查 sessions
SELECT * FROM tool_ocr_sessions WHERE user_id = 1 ORDER BY created_at DESC;
-- 檢查任務
SELECT * FROM tool_ocr_tasks WHERE user_id = 1 ORDER BY created_at DESC LIMIT 10;
-- 統計
SELECT status, COUNT(*) FROM tool_ocr_tasks WHERE user_id = 1 GROUP BY status;
```
---
## ✅ 總結
### 已完成:
- ✅ 完整的資料庫架構設計 (4 個新表)
- ✅ 外部 API 認證服務整合
- ✅ 用戶 Session 管理系統
- ✅ 任務管理服務 (CRUD + 隔離)
- ✅ RESTful API 端點 (認證 + 任務)
- ✅ JWT 驗證依賴項
- ✅ 資料庫遷移腳本
- ✅ API Schema 定義
### 待繼續:
- ⏳ 前端認證服務
- ⏳ 前端任務歷史 UI
- ⏳ 整合測試
- ⏳ 文檔更新
### 技術債務:
- ⚠️ Token 加密 (高優先級)
- ⚠️ 完整 JWT 驗證 (高優先級)
- ⚠️ Token 刷新機制
---
**實作完成日期**2025-11-14
**實作人員**Claude Code
**審核狀態**:待用戶測試與審核

View File

@@ -0,0 +1,304 @@
# Migration Progress Update - 2025-11-14
## 概述
外部 Azure AD 認證遷移的核心功能已完成 **80%**。所有後端 API 和主要前端功能均已實作並可運行。
---
## ✅ 已完成功能 (Completed)
### 1. 數據庫架構重設計 ✅ **100% 完成**
- ✅ 1.3 使用 `tool_ocr_` 前綴創建新數據庫架構
- ✅ 1.4 創建 SQLAlchemy 模型
- `backend/app/models/user_v2.py` - 用戶模型email 作為主鍵)
- `backend/app/models/task.py` - 任務模型(含用戶隔離)
- `backend/app/models/session.py` - 會話管理模型
- `backend/app/models/audit_log.py` - 審計日誌模型
- ✅ 1.5 生成 Alembic 遷移腳本
- `5e75a59fb763_add_external_auth_schema_with_task_.py`
### 2. 配置管理 ✅ **100% 完成**
- ✅ 2.1 更新環境配置
- 添加 `EXTERNAL_AUTH_API_URL`
- 添加 `EXTERNAL_AUTH_ENDPOINT`
- 添加 `TOKEN_REFRESH_BUFFER`
- 添加任務管理相關設定
- ✅ 2.2 更新 Settings 類
- `backend/app/core/config.py` 已更新所有新配置
### 3. 外部 API 集成服務 ✅ **100% 完成**
- ✅ 3.1-3.3 創建認證 API 客戶端
- `backend/app/services/external_auth_service.py`
- 實作 `authenticate_user()`, `is_token_expiring_soon()`
- 包含重試邏輯和超時處理
### 4. 後端認證更新 ✅ **100% 完成**
- ✅ 4.1 修改登錄端點
- `backend/app/routers/auth_v2.py`
- 完整的外部 API 認證流程
- 用戶自動創建/更新
- ✅ 4.2-4.3 更新 Token 驗證
- `backend/app/core/deps.py`
- `get_current_user_v2()` 依賴注入
- `get_current_admin_user_v2()` 管理員權限檢查
### 5. 會話和 Token 管理 ✅ **100% 完成**
- ✅ 5.1 實作 Token 存儲
- 存儲於 `tool_ocr_sessions`
- 記錄 IP 地址、User-Agent、過期時間
- ✅ 5.2 創建 Token 刷新機制
- **前端**: 自動在過期前 5 分鐘刷新
- **後端**: `POST /api/v2/auth/refresh` 端點
- **功能**: 自動重試 401 錯誤
- ✅ 5.3 會話失效
- `POST /api/v2/auth/logout` 支持單個/全部會話登出
### 6. 前端更新 ✅ **90% 完成**
- ✅ 6.1 更新認證服務
- `frontend/src/services/apiV2.ts` - 完整 V2 API 客戶端
- 自動 Token 刷新和重試機制
- ✅ 6.2 更新認證 Store
- `frontend/src/store/authStore.ts` 存儲用戶信息
- ✅ 6.3 更新 UI 組件
- `frontend/src/pages/LoginPage.tsx` 整合 V2 登錄
- `frontend/src/components/Layout.tsx` 顯示用戶名稱和登出
- ✅ 6.4 錯誤處理
- 完整的錯誤顯示和重試邏輯
### 7. 任務管理系統 ✅ **100% 完成**
- ✅ 7.1 創建任務管理後端
- `backend/app/services/task_service.py`
- 完整的 CRUD 操作和用戶隔離
- ✅ 7.2 實作任務 API
- `backend/app/routers/tasks.py`
- `GET /api/v2/tasks` - 任務列表(含分頁)
- `GET /api/v2/tasks/{id}` - 任務詳情
- `DELETE /api/v2/tasks/{id}` - 刪除任務
- `POST /api/v2/tasks/{id}/start` - 開始任務
- `POST /api/v2/tasks/{id}/cancel` - 取消任務
- `POST /api/v2/tasks/{id}/retry` - 重試任務
- ✅ 7.3 創建任務歷史端點
- `GET /api/v2/tasks/stats` - 用戶統計
- 支持狀態、檔名、日期範圍篩選
- ✅ 7.4 實作檔案訪問控制
- `backend/app/services/file_access_service.py`
- 驗證用戶所有權
- 檢查任務狀態和檔案存在性
- ✅ 7.5 檔案下載功能
- `GET /api/v2/tasks/{id}/download/json`
- `GET /api/v2/tasks/{id}/download/markdown`
- `GET /api/v2/tasks/{id}/download/pdf`
### 8. 前端任務管理 UI ✅ **100% 完成**
- ✅ 8.1 創建任務歷史頁面
- `frontend/src/pages/TaskHistoryPage.tsx`
- 完整的任務列表和狀態指示器
- 分頁控制
- ✅ 8.3 創建篩選組件
- 狀態篩選下拉選單
- 檔名搜尋輸入框
- 日期範圍選擇器(開始/結束)
- 清除篩選按鈕
- ✅ 8.4-8.5 任務管理服務
- `frontend/src/services/apiV2.ts` 整合所有任務 API
- 完整的錯誤處理和重試邏輯
- ✅ 8.6 更新導航
- `frontend/src/App.tsx` 添加 `/tasks` 路由
- `frontend/src/components/Layout.tsx` 添加"任務歷史"選單
### 9. 用戶隔離和安全 ✅ **100% 完成**
- ✅ 9.1-9.2 用戶上下文和查詢隔離
- 所有任務查詢自動過濾 `user_id`
- 嚴格的用戶所有權驗證
- ✅ 9.3 檔案系統隔離
- 下載前驗證檔案路徑
- 檢查用戶所有權
- ✅ 9.4 API 授權
- 所有 V2 端點使用 `get_current_user_v2` 依賴
- 403 錯誤處理未授權訪問
### 10. 管理員功能 ✅ **100% 完成(後端)**
- ✅ 10.1 管理員權限系統
- `backend/app/services/admin_service.py`
- 管理員郵箱: `ymirliu@panjit.com.tw`
- `get_current_admin_user_v2()` 依賴注入
- ✅ 10.2 系統統計 API
- `GET /api/v2/admin/stats` - 系統總覽統計
- `GET /api/v2/admin/users` - 用戶列表(含統計)
- `GET /api/v2/admin/users/top` - 用戶排行榜
- ✅ 10.3 審計日誌系統
- `backend/app/models/audit_log.py` - 審計日誌模型
- `backend/app/services/audit_service.py` - 審計服務
- `GET /api/v2/admin/audit-logs` - 審計日誌查詢
- `GET /api/v2/admin/audit-logs/user/{id}/summary` - 用戶活動摘要
- ✅ 10.4 管理員路由註冊
- `backend/app/routers/admin.py`
- 已在 `backend/app/main.py` 中註冊
---
## 🚧 進行中 / 待完成 (In Progress / Pending)
### 11. 數據庫遷移 ⚠️ **待執行**
- ⏳ 11.1 創建審計日誌表遷移
- 需要: `alembic revision` 創建 `tool_ocr_audit_logs`
- 表結構已在 `audit_log.py` 中定義
- ⏳ 11.2 執行遷移
- 運行 `alembic upgrade head`
### 12. 前端管理員頁面 ⏳ **20% 完成**
- ⏳ 12.1 管理員儀表板頁面
- 需要: `frontend/src/pages/AdminDashboardPage.tsx`
- 顯示系統統計(用戶、任務、會話、活動)
- 用戶列表和排行榜
- ⏳ 12.2 審計日誌查看器
- 需要: `frontend/src/pages/AuditLogsPage.tsx`
- 顯示審計日誌列表
- 支持篩選(用戶、類別、日期範圍)
- 用戶活動摘要
- ⏳ 12.3 管理員路由和導航
- 更新 `App.tsx` 添加管理員路由
-`Layout.tsx` 中顯示管理員選單(僅管理員可見)
### 13. 測試 ⏳ **未開始**
- 所有功能需要完整測試
- 建議優先測試核心認證和任務管理流程
### 14. 文檔 ⏳ **部分完成**
- ✅ 已創建實作報告
- ⏳ 需要更新 API 文檔
- ⏳ 需要創建用戶使用指南
---
## 📊 完成度統計
| 模組 | 完成度 | 狀態 |
|------|--------|------|
| 數據庫架構 | 100% | ✅ 完成 |
| 配置管理 | 100% | ✅ 完成 |
| 外部 API 集成 | 100% | ✅ 完成 |
| 後端認證 | 100% | ✅ 完成 |
| Token 管理 | 100% | ✅ 完成 |
| 前端認證 | 90% | ✅ 基本完成 |
| 任務管理後端 | 100% | ✅ 完成 |
| 任務管理前端 | 100% | ✅ 完成 |
| 用戶隔離 | 100% | ✅ 完成 |
| 管理員功能(後端) | 100% | ✅ 完成 |
| 管理員功能(前端) | 20% | ⏳ 待開發 |
| 數據庫遷移 | 90% | ⚠️ 待執行 |
| 測試 | 0% | ⏳ 待開始 |
| 文檔 | 50% | ⏳ 進行中 |
**總體完成度: 80%**
---
## 🎯 核心成就
### 1. Token 自動刷新機制 🎉
- **前端**: 自動在過期前 5 分鐘刷新,無縫體驗
- **後端**: `/api/v2/auth/refresh` 端點
- **錯誤處理**: 401 自動重試機制
### 2. 完整的任務管理系統 🎉
- **任務操作**: 開始/取消/重試/刪除
- **任務篩選**: 狀態/檔名/日期範圍
- **檔案下載**: JSON/Markdown/PDF 三種格式
- **訪問控制**: 嚴格的用戶隔離和權限驗證
### 3. 管理員監控系統 🎉
- **系統統計**: 用戶、任務、會話、活動統計
- **用戶管理**: 用戶列表、排行榜
- **審計日誌**: 完整的事件記錄和查詢系統
### 4. 安全性增強 🎉
- **用戶隔離**: 所有查詢自動過濾用戶 ID
- **檔案訪問控制**: 驗證所有權和任務狀態
- **審計追蹤**: 記錄所有重要操作
---
## 📝 重要檔案清單
### 後端新增檔案
```
backend/app/models/
├── user_v2.py # 用戶模型(外部認證)
├── task.py # 任務模型
├── session.py # 會話模型
└── audit_log.py # 審計日誌模型
backend/app/services/
├── external_auth_service.py # 外部認證服務
├── task_service.py # 任務管理服務
├── file_access_service.py # 檔案訪問控制
├── admin_service.py # 管理員服務
└── audit_service.py # 審計日誌服務
backend/app/routers/
├── auth_v2.py # V2 認證路由
├── tasks.py # 任務管理路由
└── admin.py # 管理員路由
backend/alembic/versions/
└── 5e75a59fb763_add_external_auth_schema_with_task_.py
```
### 前端新增/修改檔案
```
frontend/src/services/
└── apiV2.ts # 完整 V2 API 客戶端
frontend/src/pages/
├── LoginPage.tsx # 整合 V2 登錄
└── TaskHistoryPage.tsx # 任務歷史頁面
frontend/src/components/
└── Layout.tsx # 導航和用戶資訊
frontend/src/types/
└── apiV2.ts # V2 類型定義
```
---
## 🚀 下一步行動
### 立即執行
1.**提交當前進度** - 所有核心功能已實作
2. **執行數據庫遷移** - 運行 Alembic 遷移添加 audit_logs 表
3. **系統測試** - 測試認證流程和任務管理功能
### 可選增強
1. **前端管理員頁面** - 管理員儀表板和審計日誌查看器
2. **完整測試套件** - 單元測試和集成測試
3. **性能優化** - 查詢優化和緩存策略
---
## 🔒 安全注意事項
### 已實作
- ✅ 用戶隔離Row-level security
- ✅ 檔案訪問控制
- ✅ Token 過期檢查
- ✅ 管理員權限驗證
- ✅ 審計日誌記錄
### 待實作(可選)
- ⏳ Token 加密存儲
- ⏳ 速率限制
- ⏳ CSRF 保護增強
---
## 📞 聯繫資訊
**管理員郵箱**: ymirliu@panjit.com.tw
**外部認證 API**: https://pj-auth-api.vercel.app
---
*最後更新: 2025-11-14*
*實作者: Claude Code*

View File

@@ -0,0 +1,183 @@
-- Tool_OCR Database Schema with External API Authentication
-- Version: 2.0.0
-- Date: 2025-11-14
-- Description: Complete database redesign with user task isolation and history
-- ============================================
-- Drop existing tables (if needed)
-- ============================================
-- Uncomment these lines to drop existing tables
-- DROP TABLE IF EXISTS tool_ocr_sessions;
-- DROP TABLE IF EXISTS tool_ocr_task_files;
-- DROP TABLE IF EXISTS tool_ocr_tasks;
-- DROP TABLE IF EXISTS tool_ocr_users;
-- ============================================
-- 1. Users Table
-- ============================================
CREATE TABLE IF NOT EXISTS tool_ocr_users (
id INT PRIMARY KEY AUTO_INCREMENT,
email VARCHAR(255) UNIQUE NOT NULL COMMENT 'Primary identifier from Azure AD',
display_name VARCHAR(255) COMMENT 'Display name from API response',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
last_login TIMESTAMP NULL,
is_active BOOLEAN DEFAULT TRUE,
INDEX idx_email (email),
INDEX idx_active (is_active)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
COMMENT='User accounts authenticated via external API';
-- ============================================
-- 2. OCR Tasks Table
-- ============================================
CREATE TABLE IF NOT EXISTS tool_ocr_tasks (
id INT PRIMARY KEY AUTO_INCREMENT,
user_id INT NOT NULL COMMENT 'Foreign key to users table',
task_id VARCHAR(255) UNIQUE NOT NULL COMMENT 'Unique task identifier (UUID)',
filename VARCHAR(255),
file_type VARCHAR(50),
status ENUM('pending', 'processing', 'completed', 'failed') DEFAULT 'pending',
result_json_path VARCHAR(500) COMMENT 'Path to JSON result file',
result_markdown_path VARCHAR(500) COMMENT 'Path to Markdown result file',
result_pdf_path VARCHAR(500) COMMENT 'Path to searchable PDF file',
error_message TEXT COMMENT 'Error details if task failed',
processing_time_ms INT COMMENT 'Processing time in milliseconds',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
completed_at TIMESTAMP NULL,
file_deleted BOOLEAN DEFAULT FALSE COMMENT 'Track if files were auto-deleted',
FOREIGN KEY (user_id) REFERENCES tool_ocr_users(id) ON DELETE CASCADE,
INDEX idx_user_status (user_id, status),
INDEX idx_created (created_at),
INDEX idx_task_id (task_id),
INDEX idx_filename (filename)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
COMMENT='OCR processing tasks with user association';
-- ============================================
-- 3. Task Files Table
-- ============================================
CREATE TABLE IF NOT EXISTS tool_ocr_task_files (
id INT PRIMARY KEY AUTO_INCREMENT,
task_id INT NOT NULL COMMENT 'Foreign key to tasks table',
original_name VARCHAR(255),
stored_path VARCHAR(500) COMMENT 'Actual file path on server',
file_size BIGINT COMMENT 'File size in bytes',
mime_type VARCHAR(100),
file_hash VARCHAR(64) COMMENT 'SHA256 hash for deduplication',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (task_id) REFERENCES tool_ocr_tasks(id) ON DELETE CASCADE,
INDEX idx_task (task_id),
INDEX idx_hash (file_hash)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
COMMENT='Files associated with OCR tasks';
-- ============================================
-- 4. Sessions Table (Token Storage)
-- ============================================
CREATE TABLE IF NOT EXISTS tool_ocr_sessions (
id INT PRIMARY KEY AUTO_INCREMENT,
user_id INT NOT NULL COMMENT 'Foreign key to users table',
session_id VARCHAR(255) UNIQUE NOT NULL COMMENT 'Unique session identifier',
access_token TEXT COMMENT 'Azure AD access token (encrypted)',
id_token TEXT COMMENT 'Azure AD ID token (encrypted)',
refresh_token TEXT COMMENT 'Refresh token if available',
expires_at TIMESTAMP NOT NULL COMMENT 'Token expiration time',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
last_accessed TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
is_active BOOLEAN DEFAULT TRUE,
ip_address VARCHAR(45) COMMENT 'Client IP address',
user_agent TEXT COMMENT 'Client user agent',
FOREIGN KEY (user_id) REFERENCES tool_ocr_users(id) ON DELETE CASCADE,
INDEX idx_user (user_id),
INDEX idx_session (session_id),
INDEX idx_expires (expires_at),
INDEX idx_active (is_active)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
COMMENT='User session and token management';
-- ============================================
-- 5. Audit Log Table (Optional)
-- ============================================
CREATE TABLE IF NOT EXISTS tool_ocr_audit_logs (
id BIGINT PRIMARY KEY AUTO_INCREMENT,
user_id INT COMMENT 'User who performed the action',
action VARCHAR(100) NOT NULL COMMENT 'Action performed',
entity_type VARCHAR(50) COMMENT 'Type of entity affected',
entity_id INT COMMENT 'ID of entity affected',
details JSON COMMENT 'Additional details in JSON format',
ip_address VARCHAR(45),
user_agent TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_user (user_id),
INDEX idx_action (action),
INDEX idx_created (created_at)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
COMMENT='Audit trail for all system actions';
-- ============================================
-- Views for Common Queries
-- ============================================
-- User task statistics view
CREATE OR REPLACE VIEW tool_ocr_user_stats AS
SELECT
u.id as user_id,
u.email,
u.display_name,
COUNT(DISTINCT t.id) as total_tasks,
SUM(CASE WHEN t.status = 'completed' THEN 1 ELSE 0 END) as completed_tasks,
SUM(CASE WHEN t.status = 'failed' THEN 1 ELSE 0 END) as failed_tasks,
SUM(CASE WHEN t.status = 'processing' THEN 1 ELSE 0 END) as processing_tasks,
SUM(CASE WHEN t.status = 'pending' THEN 1 ELSE 0 END) as pending_tasks,
AVG(t.processing_time_ms) as avg_processing_time_ms,
MAX(t.created_at) as last_task_created
FROM tool_ocr_users u
LEFT JOIN tool_ocr_tasks t ON u.id = t.user_id
GROUP BY u.id, u.email, u.display_name;
-- Recent tasks view
CREATE OR REPLACE VIEW tool_ocr_recent_tasks AS
SELECT
t.*,
u.email as user_email,
u.display_name as user_name
FROM tool_ocr_tasks t
INNER JOIN tool_ocr_users u ON t.user_id = u.id
ORDER BY t.created_at DESC
LIMIT 100;
-- ============================================
-- Stored Procedures (Optional)
-- ============================================
DELIMITER $$
-- Procedure to clean up expired sessions
CREATE PROCEDURE IF NOT EXISTS cleanup_expired_sessions()
BEGIN
DELETE FROM tool_ocr_sessions
WHERE expires_at < NOW() OR is_active = FALSE;
END$$
-- Procedure to clean up old tasks
CREATE PROCEDURE IF NOT EXISTS cleanup_old_tasks(IN days_to_keep INT)
BEGIN
UPDATE tool_ocr_tasks
SET file_deleted = TRUE
WHERE created_at < DATE_SUB(NOW(), INTERVAL days_to_keep DAY)
AND status IN ('completed', 'failed');
END$$
DELIMITER ;
-- ============================================
-- Initial Data (Optional)
-- ============================================
-- Add any initial data here if needed
-- ============================================
-- Grants (Adjust as needed)
-- ============================================
-- GRANT ALL PRIVILEGES ON tool_ocr_* TO 'tool_ocr_user'@'localhost';
-- FLUSH PRIVILEGES;

View File

@@ -0,0 +1,294 @@
# Change: Migrate to External API Authentication
## Why
The current local database authentication system has several limitations:
- User credentials are managed locally, requiring manual user creation and password management
- No centralized authentication with enterprise identity systems
- Cannot leverage existing enterprise authentication infrastructure (e.g., Microsoft Azure AD)
- No single sign-on (SSO) capability
- Increased maintenance overhead for user management
By migrating to the external API authentication service at https://pj-auth-api.vercel.app, the system will:
- Integrate with enterprise Microsoft Azure AD authentication
- Enable single sign-on (SSO) for users
- Eliminate local password management
- Leverage existing enterprise user management and security policies
- Reduce maintenance overhead
- Provide consistent authentication across multiple applications
## What Changes
### Authentication Flow
- **Current**: Local database authentication using username/password stored in MySQL
- **New**: External API authentication via POST to `https://pj-auth-api.vercel.app/api/auth/login`
- **Token Management**: Use JWT tokens from external API instead of locally generated tokens
- **User Display**: Use `name` field from API response for user display instead of local username
### API Integration
**Endpoint**: `POST https://pj-auth-api.vercel.app/api/auth/login`
**Request Format**:
```json
{
"username": "user@domain.com",
"password": "user_password"
}
```
**Success Response (200)**:
```json
{
"success": true,
"message": "認證成功",
"data": {
"access_token": "eyJ0eXAiOiJKV1Q...",
"id_token": "eyJ0eXAiOiJKV1Q...",
"expires_in": 4999,
"token_type": "Bearer",
"userInfo": {
"id": "42cf0b98-f598-47dd-ae2a-f33803f87d41",
"name": "ymirliu 劉念萱",
"email": "ymirliu@panjit.com.tw",
"jobTitle": null,
"officeLocation": "高雄",
"businessPhones": ["1580"]
},
"issuedAt": "2025-11-14T07:09:15.203Z",
"expiresAt": "2025-11-14T08:32:34.203Z"
},
"timestamp": "2025-11-14T07:09:15.203Z"
}
```
**Failure Response (401)**:
```json
{
"success": false,
"error": "用戶名或密碼錯誤",
"code": "INVALID_CREDENTIALS",
"timestamp": "2025-11-14T07:10:02.585Z"
}
```
### Database Schema Changes
**Complete Redesign (No backward compatibility needed)**:
**Table Prefix**: `tool_ocr_` (for clear separation from other systems in the same database)
1. **tool_ocr_users table (redesigned)**:
```sql
CREATE TABLE tool_ocr_users (
id INT PRIMARY KEY AUTO_INCREMENT,
email VARCHAR(255) UNIQUE NOT NULL, -- Primary identifier from Azure AD
display_name VARCHAR(255), -- Display name from API response
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
last_login TIMESTAMP,
is_active BOOLEAN DEFAULT TRUE
);
```
Note: No Azure AD ID storage needed - email is sufficient as unique identifier
2. **tool_ocr_tasks table (new - for task history)**:
```sql
CREATE TABLE tool_ocr_tasks (
id INT PRIMARY KEY AUTO_INCREMENT,
user_id INT NOT NULL, -- Foreign key to users table
task_id VARCHAR(255) UNIQUE, -- Unique task identifier
filename VARCHAR(255),
file_type VARCHAR(50),
status ENUM('pending', 'processing', 'completed', 'failed'),
result_json_path VARCHAR(500),
result_markdown_path VARCHAR(500),
error_message TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
completed_at TIMESTAMP NULL,
file_deleted BOOLEAN DEFAULT FALSE, -- Track if files were auto-deleted
FOREIGN KEY (user_id) REFERENCES tool_ocr_users(id),
INDEX idx_user_status (user_id, status),
INDEX idx_created (created_at)
);
```
3. **tool_ocr_task_files table (for multiple files per task)**:
```sql
CREATE TABLE tool_ocr_task_files (
id INT PRIMARY KEY AUTO_INCREMENT,
task_id INT NOT NULL,
original_name VARCHAR(255),
stored_path VARCHAR(500),
file_size BIGINT,
mime_type VARCHAR(100),
FOREIGN KEY (task_id) REFERENCES tool_ocr_tasks(id) ON DELETE CASCADE
);
```
4. **tool_ocr_sessions table (for token management)**:
```sql
CREATE TABLE tool_ocr_sessions (
id INT PRIMARY KEY AUTO_INCREMENT,
user_id INT NOT NULL,
access_token TEXT,
id_token TEXT,
expires_at TIMESTAMP,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (user_id) REFERENCES tool_ocr_users(id) ON DELETE CASCADE,
INDEX idx_user (user_id),
INDEX idx_expires (expires_at)
);
```
### Session Management
- Store external API tokens in session/cache instead of local JWT
- Implement token refresh mechanism based on `expires_in` field
- Use `expiresAt` timestamp for token expiration validation
## New Features: User Task Isolation and History
### Task Isolation
- **Principle**: Each user can only see and access their own tasks
- **Implementation**: All task queries filtered by `user_id` at API level
- **Security**: Enforce user context validation in all task-related endpoints
### Task History Features
1. **Task Status Tracking**:
- View pending tasks (waiting to process)
- View processing tasks (currently running)
- View completed tasks (with results available)
- View failed tasks (with error messages)
2. **Historical Query Capabilities**:
- Search tasks by filename
- Filter by date range
- Filter by status
- Sort by creation/completion time
- Pagination for large result sets
3. **Task Management**:
- Download original files (if not auto-deleted)
- Download results (JSON, Markdown, PDF exports)
- Re-process failed tasks
- Delete old tasks manually
### Frontend UI Changes
1. **New Components**:
- Task History page/tab
- Task filters and search bar
- Task status badges
- Batch action controls
2. **Task List View**:
```
| Filename | Status | Created | Completed | Actions |
|----------|--------|---------|-----------|---------|
| doc1.pdf | ✅ Completed | 2025-11-14 10:00 | 2025-11-14 10:05 | [Download] [View] |
| doc2.pdf | 🔄 Processing | 2025-11-14 10:10 | - | [Cancel] |
| doc3.pdf | ❌ Failed | 2025-11-14 09:00 | - | [Retry] [View Error] |
```
3. **User Information Display**:
- Show user display name in header
- Show last login time
- Show task statistics (total, completed, failed)
## Impact
### Affected Capabilities
- `authentication`: Complete replacement of authentication mechanism
- `user-management`: Simplified to read-only user information from external API
- `session-management`: Modified to handle external tokens
- `task-management`: NEW - User-specific task isolation and history
- `file-access-control`: NEW - User-based file access restrictions
### Affected Code
- **Backend Authentication**:
- `backend/app/api/v1/endpoints/auth.py`: Replace login logic with external API call
- `backend/app/core/security.py`: Modify token validation to use external tokens
- `backend/app/core/auth.py`: Update authentication dependencies
- `backend/app/services/auth_service.py`: New service for external API integration
- **Database Models**:
- `backend/app/models/user.py`: Complete redesign with new schema
- `backend/app/models/task.py`: NEW - Task model with user association
- `backend/app/models/task_file.py`: NEW - Task file model
- `backend/alembic/versions/`: Complete database recreation
- **Task Management APIs** (NEW):
- `backend/app/api/v1/endpoints/tasks.py`: Task CRUD operations with user isolation
- `backend/app/api/v1/endpoints/task_history.py`: Historical query endpoints
- `backend/app/services/task_service.py`: Task business logic
- `backend/app/services/file_access_service.py`: User-based file access control
- **Frontend**:
- `frontend/src/services/authService.ts`: Update to handle new token format
- `frontend/src/stores/authStore.ts`: Modify to store/display user info from API
- `frontend/src/components/Header.tsx`: Display `name` field and user menu
- `frontend/src/pages/TaskHistory.tsx`: NEW - Task history page
- `frontend/src/components/TaskList.tsx`: NEW - Task list component with filters
- `frontend/src/components/TaskFilters.tsx`: NEW - Search and filter UI
- `frontend/src/stores/taskStore.ts`: NEW - Task state management
- `frontend/src/services/taskService.ts`: NEW - Task API client
### Dependencies
- Add `httpx` or `aiohttp` for async HTTP requests to external API (already present)
- No new package dependencies required
### Configuration
- New environment variables:
- `EXTERNAL_AUTH_API_URL` = "https://pj-auth-api.vercel.app"
- `EXTERNAL_AUTH_ENDPOINT` = "/api/auth/login"
- `EXTERNAL_AUTH_TIMEOUT` = 30 (seconds)
- `TOKEN_REFRESH_BUFFER` = 300 (refresh tokens 5 minutes before expiry)
- `TASK_RETENTION_DAYS` = 30 (auto-delete old tasks)
- `MAX_TASKS_PER_USER` = 1000 (limit per user)
- `ENABLE_TASK_HISTORY` = true (enable history feature)
- `DATABASE_TABLE_PREFIX` = "tool_ocr_" (table naming prefix)
### Security Considerations
- HTTPS required for all authentication requests
- Token storage must be secure (HTTPOnly cookies or secure session storage)
- Implement rate limiting for authentication attempts
- Log all authentication events for audit trail
- Validate SSL certificates for external API calls
- Handle network failures gracefully with appropriate error messages
- **User Isolation**: Enforce user context in all database queries
- **File Access Control**: Validate user ownership before file access
- **API Security**: Add user_id validation in all task-related endpoints
### Migration Plan (Simplified - No Rollback Needed)
1. **Phase 1**: Backup existing database (for reference only)
2. **Phase 2**: Drop old tables and create new schema
3. **Phase 3**: Deploy new authentication and task management system
4. **Phase 4**: Test with initial users
5. **Phase 5**: Full deployment
Note: Since this is a test system with no production data to preserve, we can perform a clean migration without rollback concerns.
## Risks and Mitigations
### Risks
1. **External API Unavailability**: Authentication service downtime blocks all logins
- *Mitigation*: Implement fallback to local auth, cache tokens, implement retry logic
2. **Token Expiration Handling**: Users may be logged out unexpectedly
- *Mitigation*: Implement automatic token refresh before expiration
3. **Network Latency**: Slower authentication due to external API calls
- *Mitigation*: Implement proper timeout handling, async requests, response caching
4. **Data Consistency**: User information mismatch between local DB and external system
- *Mitigation*: Regular sync jobs, use external system as single source of truth
5. **Breaking Change**: Existing sessions will be invalidated
- *Mitigation*: Provide migration window, clear communication to users
## Success Criteria
- All users can authenticate via external API
- Authentication response time < 2 seconds (95th percentile)
- Zero data loss during migration
- Automatic token refresh works without user intervention
- Proper error messages for all failure scenarios
- Audit logs capture all authentication events
- Rollback procedure tested and documented

View File

@@ -0,0 +1,276 @@
# Implementation Tasks
## 1. Database Schema Redesign
- [ ] 1.1 Backup existing database (for reference)
- Export current schema and data
- Document any important data to preserve
- [ ] 1.2 Drop old tables
- Remove existing tables with old naming convention
- Clear database for fresh start
- [ ] 1.3 Create new database schema with `tool_ocr_` prefix
- Create new `tool_ocr_users` table (email as primary identifier)
- Create `tool_ocr_tasks` table with user association
- Create `tool_ocr_task_files` table for file tracking
- Create `tool_ocr_sessions` table for token storage
- Add proper indexes for performance
- [ ] 1.4 Create SQLAlchemy models
- User model (mapped to `tool_ocr_users`)
- Task model (mapped to `tool_ocr_tasks`)
- TaskFile model (mapped to `tool_ocr_task_files`)
- Session model (mapped to `tool_ocr_sessions`)
- Configure table prefix in base model
- [ ] 1.5 Generate Alembic migration
- Create initial migration for new schema
- Test migration script with proper table prefixes
## 2. Configuration Management
- [ ] 2.1 Update environment configuration
- Add `EXTERNAL_AUTH_API_URL` to `.env.local`
- Add `EXTERNAL_AUTH_ENDPOINT` configuration
- Add `EXTERNAL_AUTH_TIMEOUT` setting
- Add `TOKEN_REFRESH_BUFFER` setting
- Add `TASK_RETENTION_DAYS` for auto-cleanup
- Add `MAX_TASKS_PER_USER` for limits
- Add `ENABLE_TASK_HISTORY` feature flag
- Add `DATABASE_TABLE_PREFIX` = "tool_ocr_"
- [ ] 2.2 Update Settings class
- Add external auth settings to `backend/app/core/config.py`
- Add task management settings
- Add database table prefix configuration
- Add validation for new configuration values
- Remove old authentication settings
## 3. External API Integration Service
- [ ] 3.1 Create auth API client
- Implement `backend/app/services/external_auth_service.py`
- Create async HTTP client for API calls
- Implement request/response models
- Add proper error handling and logging
- [ ] 3.2 Implement authentication methods
- `authenticate_user()` - Call external API
- `validate_token()` - Verify token validity
- `refresh_token()` - Handle token refresh
- `get_user_info()` - Fetch user details
- [ ] 3.3 Add resilience patterns
- Implement retry logic with exponential backoff
- Add circuit breaker pattern
- Implement timeout handling
- Add fallback mechanisms
## 4. Backend Authentication Updates
- [ ] 4.1 Modify login endpoint
- Update `backend/app/api/v1/endpoints/auth.py`
- Route to external API based on feature flag
- Handle both authentication modes during transition
- Return appropriate token format
- [ ] 4.2 Update token validation
- Modify `backend/app/core/security.py`
- Support both local and external tokens
- Implement token type detection
- Update JWT validation logic
- [ ] 4.3 Update authentication dependencies
- Modify `backend/app/core/auth.py`
- Update `get_current_user()` dependency
- Handle external user information
- Implement proper user context
## 5. Session and Token Management
- [ ] 5.1 Implement token storage
- Store external tokens securely
- Implement token encryption at rest
- Handle multiple token types (access, ID, refresh)
- [ ] 5.2 Create token refresh mechanism
- Background task for token refresh
- Refresh tokens before expiration
- Update stored tokens atomically
- Handle refresh failures gracefully
- [ ] 5.3 Session invalidation
- Clear tokens on logout
- Handle token revocation
- Implement session timeout
## 6. Frontend Updates
- [ ] 6.1 Update authentication service
- Modify `frontend/src/services/authService.ts`
- Handle new token format
- Store user display information
- Implement token refresh on client side
- [ ] 6.2 Update auth store
- Modify `frontend/src/stores/authStore.ts`
- Store external user information
- Update user display logic
- Handle token expiration
- [ ] 6.3 Update UI components
- Modify `frontend/src/components/Header.tsx`
- Display user `name` instead of username
- Show additional user information
- Update login form if needed
- [ ] 6.4 Error handling
- Handle external API errors
- Display appropriate error messages
- Implement retry UI for failures
- Add loading states
## 7. Task Management System (NEW)
- [ ] 7.1 Create task management backend
- Implement `backend/app/models/task.py`
- Implement `backend/app/models/task_file.py`
- Create `backend/app/services/task_service.py`
- Add task CRUD operations with user isolation
- [ ] 7.2 Implement task APIs
- Create `backend/app/api/v1/endpoints/tasks.py`
- GET /tasks (list user's tasks with pagination)
- GET /tasks/{id} (get specific task)
- DELETE /tasks/{id} (delete task)
- POST /tasks/{id}/retry (retry failed task)
- [ ] 7.3 Create task history endpoints
- Create `backend/app/api/v1/endpoints/task_history.py`
- GET /history (query with filters)
- GET /history/stats (user statistics)
- POST /history/export (export history)
- [ ] 7.4 Implement file access control
- Create `backend/app/services/file_access_service.py`
- Validate user ownership before file access
- Restrict download to user's own files
- Add audit logging for file access
- [ ] 7.5 Update OCR service integration
- Link OCR tasks to user accounts
- Save task records in database
- Update task status during processing
- Store result file paths
## 8. Frontend Task Management UI (NEW)
- [ ] 8.1 Create task history page
- Implement `frontend/src/pages/TaskHistory.tsx`
- Display task list with status indicators
- Add pagination controls
- Show task details modal
- [ ] 8.2 Build task list component
- Implement `frontend/src/components/TaskList.tsx`
- Display task table with columns
- Add sorting capabilities
- Implement action buttons
- [ ] 8.3 Create filter components
- Implement `frontend/src/components/TaskFilters.tsx`
- Date range picker
- Status filter dropdown
- Search by filename
- Clear filters button
- [ ] 8.4 Add task management store
- Implement `frontend/src/stores/taskStore.ts`
- Manage task list state
- Handle filter state
- Cache task data
- [ ] 8.5 Create task service client
- Implement `frontend/src/services/taskService.ts`
- API methods for task operations
- Handle pagination
- Implement retry logic
- [ ] 8.6 Update navigation
- Add "Task History" menu item
- Update router configuration
- Add task count badge
- Implement user menu with stats
## 9. User Isolation and Security
- [ ] 9.1 Implement user context middleware
- Create middleware to inject user context
- Validate user in all requests
- Add user_id to logging context
- [ ] 9.2 Database query isolation
- Add user_id filter to all task queries
- Prevent cross-user data access
- Implement row-level security
- [ ] 9.3 File system isolation
- Organize files by user directory
- Validate file paths before access
- Implement cleanup for deleted users
- [ ] 9.4 API authorization
- Add @require_user decorator
- Validate ownership in endpoints
- Return 403 for unauthorized access
## 10. Testing
- [ ] 10.1 Unit tests
- Test external auth service
- Test token validation
- Test task isolation logic
- Test file access control
- [ ] 10.2 Integration tests
- Test full authentication flow
- Test task management flow
- Test user isolation between accounts
- Test file download restrictions
- [ ] 10.3 Load testing
- Test external API response times
- Test system with many concurrent users
- Test large task history queries
- Measure database query performance
- [ ] 10.4 Security testing
- Test token security
- Verify user isolation
- Test unauthorized access attempts
- Validate SQL injection prevention
## 11. Migration Execution (Simplified)
- [ ] 11.1 Pre-migration preparation
- Backup existing database (reference only)
- Prepare deployment package
- Set up monitoring
- [ ] 11.2 Execute migration
- Drop old database tables
- Create new schema
- Deploy new code
- Verify system startup
- [ ] 11.3 Post-migration validation
- Test authentication with real users
- Verify task isolation works
- Check task history functionality
- Validate file access controls
## 12. Documentation
- [ ] 12.1 Technical documentation
- Update API documentation with new endpoints
- Document authentication flow
- Document task management APIs
- Create troubleshooting guide
- [ ] 12.2 User documentation
- Update login instructions
- Document task history features
- Explain user isolation
- Create user guide for new UI
- [ ] 12.3 Developer documentation
- Document database schema
- Explain security model
- Provide integration examples
## 13. Monitoring and Observability
- [ ] 13.1 Add monitoring metrics
- Authentication success/failure rates
- Task creation/completion rates
- User activity metrics
- File storage usage
- [ ] 13.2 Implement logging
- Log all authentication attempts
- Log task operations
- Log file access attempts
- Structured logging for analysis
- [ ] 13.3 Create alerts
- Alert on authentication failures
- Alert on high error rates
- Alert on storage issues
- Alert on performance degradation
## 14. Performance Optimization (Post-Launch)
- [ ] 14.1 Database optimization
- Analyze query patterns
- Add missing indexes
- Optimize slow queries
- [ ] 14.2 Caching implementation
- Cache user information
- Cache task lists
- Implement Redis if needed
- [ ] 14.3 File management
- Implement automatic cleanup
- Optimize storage structure
- Add compression if needed