chore: project cleanup and prepare for dual-track processing refactor
- Removed all test files and directories
- Deleted outdated documentation (will be rewritten)
- Cleaned up temporary files, logs, and uploads
- Archived 5 completed OpenSpec proposals
- Created new dual-track-document-processing proposal with complete OpenSpec structure
  - Dual-track architecture: OCR track (PaddleOCR) + Direct track (PyMuPDF)
  - UnifiedDocument model for consistent output
  - Support for structure-preserving translation
- Updated .gitignore to prevent future test/temp files

This is a major cleanup preparing for the complete refactoring of the document processing pipeline.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@@ -1,7 +1,15 @@
 {
   "permissions": {
     "allow": [
-      "Bash(git commit:*)"
+      "Bash(git commit:*)",
+      "Bash(xargs ls:*)",
+      "Bash(jq:*)",
+      "Bash(python:*)",
+      "Bash(python3:*)",
+      "Bash(source venv/bin/activate)",
+      "Bash(find:*)",
+      "Bash(ls:*)",
+      "Bash(openspec list:*)"
     ],
     "deny": [],
     "ask": []
9 .gitignore (vendored)
@@ -89,3 +89,12 @@ build/
 Thumbs.db
 ehthumbs.db
 Desktop.ini
+
+# Test and temporary files
+backend/uploads/*
+storage/uploads/*
+storage/results/*
+*.log
+__pycache__/
+*.bak
+test_*.py
743 API_REFERENCE.md
@@ -1,743 +0,0 @@
# Tool_OCR API Reference & Issues Report

## Document Information
- **Created**: 2025-01-13
- **Version**: v0.1.0
- **Purpose**: Complete record of all API endpoints and of every frontend/backend inconsistency

---

## Table of Contents
1. [API Endpoint Inventory](#api-endpoint-inventory)
2. [Frontend/Backend Inconsistencies](#frontendbackend-inconsistencies)
3. [Remediation Recommendations](#remediation-recommendations)

---

## API Endpoint Inventory
### 1. Authentication API

#### POST `/api/v1/auth/login`
- **Function**: User login
- **Request body**:
```typescript
{
  username: string,
  password: string
}
```
- **Response**:
```typescript
{
  access_token: string,
  token_type: string,  // "bearer"
  expires_in: number   // Token lifetime in seconds
}
```
- **Backend implementation**: ✅ [backend/app/routers/auth.py:24](backend/app/routers/auth.py#L24)
- **Frontend usage**: ✅ [frontend/src/services/api.ts:106](frontend/src/services/api.ts#L106)
- **Status**: ⚠️ **Issue** - the frontend type is missing the `expires_in` field

---
### 2. File Upload API

#### POST `/api/v1/upload`
- **Function**: Upload files for OCR processing
- **Request body**: `multipart/form-data`
  - `files`: File[] - list of files (PNG, JPG, JPEG, PDF)
  - `batch_name`: string (optional) - batch name
- **Response**:
```typescript
{
  batch_id: number,
  files: [
    {
      id: number,
      batch_id: number,
      filename: string,
      original_filename: string,
      file_size: number,
      file_format: string,  // ⚠️ backend uses file_format
      status: string,
      error: string | null,
      created_at: string,
      processing_time: number | null
    }
  ]
}
```
- **Backend implementation**: ✅ [backend/app/routers/ocr.py:39](backend/app/routers/ocr.py#L39)
- **Frontend usage**: ✅ [frontend/src/services/api.ts:128](frontend/src/services/api.ts#L128)
- **Status**: ⚠️ **Issue** - the frontend type uses `format`, the backend uses `file_format`

---
### 3. OCR Processing API

#### POST `/api/v1/ocr/process`
- **Function**: Trigger batch OCR processing
- **Request body**:
```typescript
{
  batch_id: number,
  lang: string,           // "ch", "en", "japan", "korean"
  detect_layout: boolean  // ⚠️ backend expects detect_layout; frontend sends confidence_threshold
}
```
- **Response**:
```typescript
{
  message: string,      // ⚠️ present in the backend response
  batch_id: number,
  total_files: number,  // ⚠️ present in the backend response
  status: string        // "processing"
  // task_id: string    // ❌ expected by the frontend, but the backend does not return it
}
```
- **Backend implementation**: ✅ [backend/app/routers/ocr.py:95](backend/app/routers/ocr.py#L95)
- **Frontend usage**: ✅ [frontend/src/services/api.ts:148](frontend/src/services/api.ts#L148)
- **Status**: ⚠️ **Issue** - request/response models do not match

---

#### GET `/api/v1/batch/{batch_id}/status`
- **Function**: Get batch processing status
- **Path parameters**:
  - `batch_id`: number - batch ID
- **Response**:
```typescript
{
  batch: {
    id: number,
    user_id: number,
    batch_name: string | null,
    status: string,
    total_files: number,
    completed_files: number,
    failed_files: number,
    progress_percentage: number,
    created_at: string,
    started_at: string | null,
    completed_at: string | null
  },
  files: [
    {
      id: number,
      batch_id: number,
      filename: string,
      original_filename: string,
      file_size: number,
      file_format: string,
      status: string,
      error: string | null,
      created_at: string,
      processing_time: number | null
    }
  ]
}
```
- **Backend implementation**: ✅ [backend/app/routers/ocr.py:148](backend/app/routers/ocr.py#L148)
- **Frontend usage**: ✅ [frontend/src/services/api.ts:172](frontend/src/services/api.ts#L172)
- **Status**: ✅ **OK**

---

#### GET `/api/v1/ocr/result/{file_id}`
- **Function**: Get an OCR result
- **Path parameters**:
  - `file_id`: number - file ID
- **Response**:
```typescript
{
  file_id: number,
  filename: string,
  status: string,
  markdown_content: string | null,
  json_data: {
    total_text_regions: number,
    average_confidence: number,
    detected_language: string,
    layout_data: object | null,
    images_metadata: array | null
  } | null,
  confidence: number | null,
  processing_time: number | null
}
```
- **Backend implementation**: ✅ [backend/app/routers/ocr.py:182](backend/app/routers/ocr.py#L182)
- **Frontend usage**: ✅ [frontend/src/services/api.ts:164](frontend/src/services/api.ts#L164)
  - ⚠️ **Note**: the frontend names the parameter `taskId`, but it is actually a `file_id`
- **Status**: ⚠️ **Issue** - misleading frontend parameter name

---

#### ❌ GET `/api/v1/ocr/status/{task_id}`
- **Function**: Get task status (expected by the frontend but does not exist)
- **Status**: ❌ **Missing** - the frontend calls this endpoint, but the backend does not implement it
- **Frontend usage**: [frontend/src/services/api.ts:156](frontend/src/services/api.ts#L156)
- **Problem**: the frontend receives a 404 error

---
### 4. Export API

#### POST `/api/v1/export`
- **Function**: Export OCR results
- **Request body**:
```typescript
{
  batch_id: number,
  format: "txt" | "json" | "excel" | "markdown" | "pdf" | "zip",
  rule_id: number | null,
  css_template: string,  // "default", "academic", "business"
  include_formats: string[] | null,
  options: {
    confidence_threshold: number | null,
    include_metadata: boolean,
    filename_pattern: string | null,
    css_template: string | null
  } | null
}
```
- **Response**: File download (Blob)
- **Backend implementation**: ✅ [backend/app/routers/export.py:38](backend/app/routers/export.py#L38)
- **Frontend usage**: ✅ [frontend/src/services/api.ts:182](frontend/src/services/api.ts#L182)
- **Status**: ✅ **OK**

---

#### GET `/api/v1/export/pdf/{file_id}`
- **Function**: Generate a PDF for a single file
- **Path parameters**:
  - `file_id`: number - file ID
- **Query parameters**:
  - `css_template`: string - CSS template name
- **Response**: PDF file (Blob)
- **Backend implementation**: ✅ [backend/app/routers/export.py:144](backend/app/routers/export.py#L144)
- **Frontend usage**: ✅ [frontend/src/services/api.ts:192](frontend/src/services/api.ts#L192)
- **Status**: ✅ **OK**

---

#### GET `/api/v1/export/rules`
- **Function**: List export rules
- **Response**:
```typescript
[
  {
    id: number,
    user_id: number,
    rule_name: string,
    description: string | null,
    config_json: object,
    css_template: string | null,
    created_at: string,
    updated_at: string
  }
]
```
- **Backend implementation**: ✅ [backend/app/routers/export.py:206](backend/app/routers/export.py#L206)
- **Frontend usage**: ✅ [frontend/src/services/api.ts:204](frontend/src/services/api.ts#L204)
- **Status**: ✅ **OK**

---

#### POST `/api/v1/export/rules`
- **Function**: Create an export rule
- **Request body**:
```typescript
{
  rule_name: string,
  description: string | null,
  config_json: object,
  css_template: string | null
}
```
- **Response**: a single object, same shape as GET `/api/v1/export/rules`
- **Backend implementation**: ✅ [backend/app/routers/export.py:220](backend/app/routers/export.py#L220)
- **Frontend usage**: ✅ [frontend/src/services/api.ts:212](frontend/src/services/api.ts#L212)
- **Status**: ✅ **OK**

---

#### PUT `/api/v1/export/rules/{rule_id}`
- **Function**: Update an export rule
- **Path parameters**:
  - `rule_id`: number - rule ID
- **Request body**: same as POST `/api/v1/export/rules` (all fields optional)
- **Response**: a single object, same shape as GET `/api/v1/export/rules`
- **Backend implementation**: ✅ [backend/app/routers/export.py:254](backend/app/routers/export.py#L254)
- **Frontend usage**: ✅ [frontend/src/services/api.ts:220](frontend/src/services/api.ts#L220)
- **Status**: ✅ **OK**

---

#### DELETE `/api/v1/export/rules/{rule_id}`
- **Function**: Delete an export rule
- **Path parameters**:
  - `rule_id`: number - rule ID
- **Response**:
```typescript
{
  message: "Export rule deleted successfully"
}
```
- **Backend implementation**: ✅ [backend/app/routers/export.py:295](backend/app/routers/export.py#L295)
- **Frontend usage**: ✅ [frontend/src/services/api.ts:228](frontend/src/services/api.ts#L228)
- **Status**: ✅ **OK**

---

#### GET `/api/v1/export/css-templates`
- **Function**: List CSS templates
- **Response**:
```typescript
[
  {
    name: string,
    description: string,
    filename: string  // ⚠️ defined in the schema, but missing from the actual response
  }
]
```
- **Backend implementation**: ✅ [backend/app/routers/export.py:326](backend/app/routers/export.py#L326)
  - Actual response: `[{ name, description }]`
  - Schema definition: `[{ name, description, filename }]`
- **Frontend usage**: ✅ [frontend/src/services/api.ts:235](frontend/src/services/api.ts#L235)
- **Status**: ⚠️ **Issue** - the `filename` field is missing

---
### 5. Translation API (RESERVED)

#### GET `/api/v1/translate/status`
- **Function**: Get translation feature status
- **Response**:
```typescript
{
  status: "RESERVED",
  message: string,
  planned_phase: string,
  features: string[]
}
```
- **Backend implementation**: ✅ [backend/app/routers/translation.py:28](backend/app/routers/translation.py#L28)
- **Frontend usage**: ❌ Not used
- **Status**: ✅ **OK** (reserved feature)

---

#### GET `/api/v1/translate/languages`
- **Function**: List supported languages
- **Response**:
```typescript
[
  {
    code: string,
    name: string,
    native_name: string
  }
]
```
- **Backend implementation**: ✅ [backend/app/routers/translation.py:43](backend/app/routers/translation.py#L43)
- **Frontend usage**: ❌ Not used
- **Status**: ✅ **OK** (reserved feature)

---

#### POST `/api/v1/translate/document`
- **Function**: Translate a document (not implemented)
- **Request body**:
```typescript
{
  file_id: number,
  source_lang: string,
  target_lang: string,
  engine_type: "argos" | "ernie" | "google" | "deepl",
  preserve_structure: boolean,
  engine_config: object | null
}
```
- **Response**: HTTP 501 Not Implemented
- **Backend implementation**: ✅ [backend/app/routers/translation.py:56](backend/app/routers/translation.py#L56) (stub)
- **Frontend usage**: ✅ [frontend/src/services/api.ts:247](frontend/src/services/api.ts#L247)
- **Status**: ⚠️ **Reserved feature** - the frontend receives a 501 error

---

#### ❌ GET `/api/v1/translate/configs`
- **Function**: Get translation configs (expected by the frontend but does not exist)
- **Status**: ❌ **Missing** - the frontend calls this endpoint, but the backend does not implement it
- **Frontend usage**: [frontend/src/services/api.ts:258](frontend/src/services/api.ts#L258)
- **Problem**: the frontend receives a 404 error

---

#### ❌ POST `/api/v1/translate/configs`
- **Function**: Create a translation config (expected by the frontend but does not exist)
- **Status**: ❌ **Missing** - the frontend calls this endpoint, but the backend does not implement it
- **Frontend usage**: [frontend/src/services/api.ts:269](frontend/src/services/api.ts#L269)
- **Problem**: the frontend receives a 404 error

---
### 6. Other Endpoints

#### GET `/health`
- **Function**: Health check
- **Response**:
```typescript
{
  status: "healthy",
  service: "Tool_OCR",
  version: "0.1.0"
}
```
- **Backend implementation**: ✅ [backend/app/main.py:84](backend/app/main.py#L84)
- **Frontend usage**: ❌ Not used
- **Status**: ✅ **OK**

---

#### GET `/`
- **Function**: API information
- **Response**:
```typescript
{
  message: "Tool_OCR API",
  version: "0.1.0",
  docs_url: "/docs",
  health_check: "/health"
}
```
- **Backend implementation**: ✅ [backend/app/main.py:95](backend/app/main.py#L95)
- **Frontend usage**: ❌ Not used
- **Status**: ✅ **OK**

---
## Frontend/Backend Inconsistencies

### Issue 1: Login Response Shapes Differ

**Severity**: 🟡 Medium

**Description**:
- The backend response includes an `expires_in` field (token lifetime)
- The frontend `LoginResponse` type is missing this field

**Impact**:
- The frontend cannot implement automatic token renewal
- Users cannot be warned before their token expires

**Locations**:
- Backend: [backend/app/routers/auth.py:66-70](backend/app/routers/auth.py#L66-L70)
- Frontend: [frontend/src/types/api.ts:12-15](frontend/src/types/api.ts#L12-L15)

---

### Issue 2: OCR Task Status API Does Not Exist

**Severity**: 🔴 High

**Description**:
- The frontend calls `/api/v1/ocr/status/{taskId}` to fetch task progress
- The backend only provides `/api/v1/batch/{batch_id}/status` and `/api/v1/ocr/result/{file_id}`
- There is no corresponding task-status tracking endpoint

**Impact**:
- Frontend `getTaskStatus()` calls receive 404 errors
- Real-time progress polling cannot be implemented
- Users cannot see processing progress

**Locations**:
- Frontend call: [frontend/src/services/api.ts:156-159](frontend/src/services/api.ts#L156-L159)
- Backend route: does not exist

---

### Issue 3: OCR Processing Request/Response Models Do Not Match

**Severity**: 🔴 High

**Description**:
1. **Request field mismatch**:
   - The frontend sends `confidence_threshold` (confidence threshold)
   - The backend accepts `detect_layout` (layout-detection toggle)

2. **Response field mismatch**:
   - The frontend expects `task_id` (for task tracking)
   - The backend returns `message` and `total_files` (but no `task_id`)

**Impact**:
- The frontend cannot pass parameters to the backend correctly
- The frontend cannot obtain a `task_id` for follow-up status queries
- Type checks fail
- Validation errors are likely

**Locations**:
- Frontend request: [frontend/src/types/api.ts:37-41](frontend/src/types/api.ts#L37-L41)
- Frontend response: [frontend/src/types/api.ts:43-47](frontend/src/types/api.ts#L43-L47)
- Backend request: [backend/app/schemas/ocr.py:120-133](backend/app/schemas/ocr.py#L120-L133)
- Backend response: [backend/app/schemas/ocr.py:136-151](backend/app/schemas/ocr.py#L136-L151)

---

### Issue 4: Inconsistent Upload File Field Naming

**Severity**: 🟡 Medium

**Description**:
- The backend returns the file format as `file_format`
- The frontend type definition uses `format`

**Impact**:
- The frontend cannot use the backend's `file_format` field directly
- Extra field mapping or conversion is required
- The UI may render `undefined` when displaying the file format

**Locations**:
- Frontend: [frontend/src/types/api.ts:32](frontend/src/types/api.ts#L32)
- Backend: [backend/app/schemas/ocr.py:19](backend/app/schemas/ocr.py#L19)

---

### Issue 5: CSS Template List Is Missing `filename`

**Severity**: 🟡 Medium

**Description**:
- The frontend `CSSTemplate` type expects a `filename` field
- The backend schema `CSSTemplateResponse` also defines `filename`
- But the backend actually returns only `name` and `description`

**Impact**:
- The frontend cannot use `filename` as the key/value of an `<option>`
- `filename` renders as `undefined`
- The frontend needs extra handling, or must fall back to `name`

**Locations**:
- Frontend type: [frontend/src/types/api.ts:132-136](frontend/src/types/api.ts#L132-L136)
- Backend schema: [backend/app/schemas/export.py:91-104](backend/app/schemas/export.py#L91-L104)
- Backend implementation: [backend/app/routers/export.py:333-338](backend/app/routers/export.py#L333-L338)
- PDF service: [backend/app/services/pdf_generator.py:485-496](backend/app/services/pdf_generator.py#L485-L496)

**Root cause**:
`PDFGenerator.get_available_templates()` returns only a `{name: description}` dict, with no filename.

---

### Issue 6: Translation Config Endpoints Not Implemented

**Severity**: 🟢 Low (reserved feature)

**Description**:
- The frontend calls `/api/v1/translate/configs` (GET/POST)
- The backend translation router only implements `/status`, `/languages`, and `/document`
- There are no config-related endpoints

**Impact**:
- Frontend calls receive 404 errors
- Translation configs cannot be managed
- Impact is limited, since the entire translation feature is reserved for Phase 5

**Locations**:
- Frontend GET: [frontend/src/services/api.ts:258-262](frontend/src/services/api.ts#L258-L262)
- Frontend POST: [frontend/src/services/api.ts:269-275](frontend/src/services/api.ts#L269-L275)
- Backend route: does not exist

---
## Remediation Recommendations

### Recommendation 1: Unify the Login Response Model

**Priority**: P2 (medium)

**Option A - add `expires_in` on the frontend** (recommended):
```typescript
// frontend/src/types/api.ts
export interface LoginResponse {
  access_token: string
  token_type: string
  expires_in: number  // add this field
}
```

**Option B - remove `expires_in` on the backend**:
- The field can be dropped if token-expiry management is not needed
- Not recommended, since exposing the lifetime is common JWT best practice

---

### Recommendation 2: Unify the OCR Task-Tracking Strategy

**Priority**: P1 (high)

**Option A - standardize on batch status** (recommended):
1. Remove the frontend `getTaskStatus()` method
2. Poll batch status uniformly via `getBatchStatus()`
3. Change `ProcessResponse` to drop `task_id`

**Option B - add a task-status endpoint on the backend**:
1. Add a `GET /api/v1/ocr/status/{task_id}` endpoint
2. Have `ProcessResponse` actually return a `task_id`
3. Implement task-level status tracking

**Recommendation**: adopt Option A, since the current architecture already manages status at the batch level (a sketch follows).
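A minimal sketch of Option A, assuming the existing `apiClient.getBatchStatus()` client method and the React Query polling pattern already used elsewhere in the frontend; the hook name and the `@/services/api` export are illustrative:

```typescript
import { useQuery } from '@tanstack/react-query'
import { apiClient } from '@/services/api' // assumed export

// Poll batch-level status instead of a per-task endpoint; polling stops
// automatically once the batch reaches a terminal state.
function useBatchProgress(batchId: number) {
  return useQuery({
    queryKey: ['batchStatus', batchId],
    queryFn: () => apiClient.getBatchStatus(batchId),
    refetchInterval: (query) => {
      const status = query.state.data?.batch.status
      return status === 'completed' || status === 'failed' ? false : 2000
    },
  })
}
```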
---

### Recommendation 3: Correct the OCR Processing Request/Response

**Priority**: P1 (high)

**Option A - align the frontend with the backend** (recommended):
```typescript
// frontend/src/types/api.ts
export interface ProcessRequest {
  batch_id: number
  lang?: string
  detect_layout?: boolean  // renamed to detect_layout
}

export interface ProcessResponse {
  message: string      // added
  batch_id: number
  total_files: number  // added
  status: string
  // task_id removed
}
```

**Option B - align the backend with the frontend**:
- Support a `confidence_threshold` parameter
- Include `task_id` in the response
- Requires larger changes; not recommended

---

### Recommendation 4: Align the Upload File Field Naming

**Priority**: P2 (medium)

**Option A - switch the frontend to `file_format`** (recommended):
```typescript
// frontend/src/types/api.ts
export interface FileInfo {
  id: number
  filename: string
  file_size: number
  file_format: string  // renamed to file_format
  status: 'pending' | 'processing' | 'completed' | 'failed'
}
```

**Option B - use a Pydantic alias on the backend**:
```python
# backend/app/schemas/ocr.py
file_format: str = Field(..., alias='format')
```

---

### Recommendation 5: Add the CSS Template `filename`

**Priority**: P2 (medium)

**Option A - change the PDF generator's return structure** (recommended):
```python
# backend/app/services/pdf_generator.py
def get_available_templates(self) -> Dict[str, Dict[str, str]]:
    """Get list of available CSS templates with filename"""
    return {
        "default": {
            "description": "General-purpose layout template, suitable for most documents",
            "filename": "default.css"
        },
        "academic": {
            "description": "Academic paper template, suitable for research reports",
            "filename": "academic.css"
        },
        "business": {
            "description": "Business report template, suitable for corporate documents",
            "filename": "business.css"
        },
    }
```

**Option B - let the frontend use `name` as the filename**:
- The template name is effectively the identifier
- No separate `filename` is needed

---

### Recommendation 6: Handle the Translation Config Stub

**Priority**: P3 (low)

**Option A - remove the calls from the frontend** (recommended):
1. Remove or comment out `getTranslationConfigs()` and `createTranslationConfig()`
2. Show a "coming soon" message in the UI

**Option B - add stub endpoints on the backend**:
```python
# backend/app/routers/translation.py
@router.get("/configs")
async def get_translation_configs():
    raise HTTPException(status_code=501, detail="Feature reserved for Phase 5")

@router.post("/configs")
async def create_translation_config():
    raise HTTPException(status_code=501, detail="Feature reserved for Phase 5")
```

---

## Implementation Priority Summary

### P1 - Fix immediately (affects core functionality)
1. ✅ **Recommendation 2**: unify the OCR task-tracking strategy
2. ✅ **Recommendation 3**: correct the OCR processing request/response models

### P2 - Fix soon (affects user experience)
3. ✅ **Recommendation 1**: unify the login response model
4. ✅ **Recommendation 4**: align the upload file field naming
5. ✅ **Recommendation 5**: add the CSS template `filename`

### P3 - Can be deferred (reserved features)
6. ⏸️ **Recommendation 6**: handle the translation config stub (revisit in Phase 5)

---

## Document Maintenance

**Change log**:
- 2025-01-13: initial version; complete inventory of all API endpoints and issues

**Maintenance responsibilities**:
- Update this document on every API change
- Add new API endpoints to the corresponding section
- Update issue statuses after fixes

---

## Appendix: Quick Checklists

### When adding an API endpoint
- [ ] Is the backend schema definition complete?
- [ ] Do the frontend TypeScript types match?
- [ ] Is field naming consistent (camelCase vs snake_case)?
- [ ] Does the response structure match what the frontend expects?
- [ ] Is error handling complete?
- [ ] Is the API documentation updated?
- [ ] Are there corresponding tests?

### When modifying an API
- [ ] Are frontend and backend changed in sync?
- [ ] Are there breaking changes?
- [ ] Is related documentation updated?
- [ ] Is existing functionality affected?
- [ ] Is a version migration needed?
@@ -1,275 +0,0 @@
# Chart Recognition Feature Status

## 🎉 Current Status: Enabled!

Chart recognition is now **enabled**! PaddlePaddle 3.2.1 provides the required `fused_rms_norm_ext` API.

### ✅ Issue Resolved

- **Resolved on**: 2025-11-16
- **PaddlePaddle version**: 3.2.1 (upgraded from 3.0.0)
- **API status**: `fused_rms_norm_ext` is now available ✅
- **Feature status**: PP-StructureV3 chart recognition is enabled ✅
- **Code update**: [ocr_service.py:217](backend/app/services/ocr_service.py#L217) - `use_chart_recognition=True`

### 📜 Historical Limitation (Resolved)

- **Original problem**: PaddlePaddle 3.0.0 lacked the `fused_rms_norm_ext` API
- **Recorded**: March 2025 (based on PaddlePaddle 3.0.0)
- **Fixed in**: PaddlePaddle 3.2.0+ (released September 2025)
- **Verified with**: PaddlePaddle 3.2.1, confirmed to support it

---

## 🎯 Full Feature Set Now Available

| Category | Feature | Status | Notes |
|---------|------|------|------|
| **Basic OCR** | Text recognition | ✅ Working | Core OCR functionality |
| **Layout analysis** | Chart detection | ✅ Working | Locates charts |
| **Layout analysis** | Chart extraction | ✅ Working | Saved as image files |
| **Table recognition** | Table recognition | ✅ Working | Supports nested formulas/images |
| **Formula recognition** | LaTeX extraction | ✅ Working | Mathematical formula recognition |
| **Chart recognition** | Chart type identification | ✅ **Enabled** | Bar, line, and other chart types |
| **Chart recognition** | Data extraction | ✅ **Enabled** | Extracts numeric data from charts |
| **Chart recognition** | Axis/legend parsing | ✅ **Enabled** | Axis labels and legends |
| **Chart recognition** | Chart to structured data | ✅ **Enabled** | Converts to JSON/table format |

---

## 🔧 System Configuration Updates

### 1. CUDA Library Path

To enable GPU acceleration, the WSL CUDA library path was added to the system configuration:

```bash
# ~/.bashrc
export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH
```

### 2. PaddlePaddle Version

```bash
# Current version
PaddlePaddle 3.2.1

# GPU support
✅ CUDA 12.6
✅ cuDNN 9.5
✅ GPU Compute Capability: 8.9
```

### 3. Service Configuration

```python
# backend/app/services/ocr_service.py:217
use_chart_recognition=True  # ✅ enabled
```

---

## 📊 Version History and API Support

| Version | Release Date | `fused_rms_norm_ext` | Chart Recognition |
|------|---------|-------------------------|---------|
| 3.0.0 | 2025-03-26 | ❌ Not supported | ❌ Disabled |
| 3.1.0 | 2025-06-29 | ❓ Not verified | ❓ Unknown |
| 3.1.1 | 2025-08-20 | ❓ Not verified | ❓ Unknown |
| 3.2.0 | 2025-09-08 | ✅ Likely supported | ✅ Can be enabled |
| 3.2.1 | 2025-10-30 | ✅ **Confirmed** | ✅ **Enabled** |
| 3.2.2 | 2025-11-14 | ✅ Should be supported | ✅ Should work |

**Verified on**: 2025-11-16
**Verified version**: PaddlePaddle 3.2.1
**Verification script**: `backend/verify_chart_recognition.py`
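The verification script itself is not reproduced in this document; a minimal sketch of the availability probe it performs (an assumption, not the actual `verify_chart_recognition.py`):

```python
# Probe for the chart-recognition prerequisite APIs; mirrors the checks
# shown in the Troubleshooting section below. Illustrative sketch only.
import paddle
import paddle.incubate.nn.functional as F

print(f"PaddlePaddle version: {paddle.__version__}")
for api in ("fused_rms_norm", "fused_rms_norm_ext"):
    status = "Available" if hasattr(F, api) else "Missing"
    print(f"- {api}: {status}")

if hasattr(F, "fused_rms_norm_ext"):
    print("Chart recognition CAN be enabled!")
```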
---

## ⚠️ Performance Considerations

Impact of enabling chart recognition:

### Processing time
- **Simple charts**: +2-3 seconds per chart
- **Complex charts**: +5-10 seconds per chart
- **Multi-chart pages**: processing time grows accordingly

### Memory usage
- **GPU memory**: roughly +500MB-1GB
- **System memory**: roughly +200-500MB

### Accuracy
- **Simple charts** (bar, line): >85%
- **Complex charts** (multi-axis, combined): >70%
- **Special charts** (radar, scatter): >60%

**Recommendation**: for documents containing many charts, use GPU acceleration for best performance.

---

## 🧪 Testing Chart Recognition

### Quick test

Use the verification script to confirm the feature is available:

```bash
cd /home/egg/project/Tool_OCR
source venv/bin/activate
python backend/verify_chart_recognition.py
```

Expected output:
```
✅ PaddlePaddle version: 3.2.1
📊 API Availability:
  - fused_rms_norm: ✅ Available
  - fused_rms_norm_ext: ✅ Available
🎉 Chart recognition CAN be enabled!
```

### End-to-end test

1. **Start the backend service**:
```bash
cd backend
source venv/bin/activate
python -m app.main
```

2. **Upload a document containing charts**:
   - PDF, Word, PowerPoint, etc.
   - Make sure the document contains charts (bar charts, line charts, etc.)

3. **Check the output**:
   - Verify the parsed result contains chart data
   - Verify the chart type was identified correctly
   - Check whether the extracted data is accurate

---

## 🔍 Technical Details

### The fused_rms_norm_ext API

**RMSNorm (Root Mean Square Layer Normalization)**:
- A layer-normalization technique used in deep learning
- More computationally efficient than LayerNorm
- A core component of the PaddleOCR-VL chart-recognition model

**API signature**:
```python
paddle.incubate.nn.functional.fused_rms_norm_ext(
    x,
    norm_weight,
    norm_bias=None,
    epsilon=1e-5,
    begin_norm_axis=1,
    bias=None,
    residual=None,
    quant_scale=-1,
    quant_round_type=0,
    quant_max_bound=0,
    quant_min_bound=0
)
```

**Difference from the base version**:
- `fused_rms_norm`: base implementation
- `fused_rms_norm_ext`: extended version with additional optimizations and parameters

### Code locations

- **Main enable flag**: [backend/app/services/ocr_service.py:217](backend/app/services/ocr_service.py#L217)
- **CPU fallback**: [backend/app/services/ocr_service.py:235](backend/app/services/ocr_service.py#L235)
- **PP-StructureV3 initialization**: [backend/app/services/ocr_service.py:211-219](backend/app/services/ocr_service.py#L211-L219)

---

## 📚 Related Documentation Updates

The following documents need updating to reflect that chart recognition is enabled:

### Updated
- ✅ `CHART_RECOGNITION.md` - this document
- ✅ `backend/app/services/ocr_service.py` - code implementation

### Pending
- [ ] `README.md` - remove the chart-recognition entry from "Known Limitations"
- [ ] `openspec/changes/add-gpu-acceleration-support/tasks.md` - mark task 5.4 as done
- [ ] `openspec/changes/add-gpu-acceleration-support/proposal.md` - update the "Known Issues" section
- [ ] `openspec/project.md` - document the chart recognition feature

---

## 🆘 Troubleshooting

### Problem: still reported as unavailable after the upgrade

**Diagnosis**:
```bash
python -c "import paddle; print(paddle.__version__)"
python -c "import paddle.incubate.nn.functional as F; print(hasattr(F, 'fused_rms_norm_ext'))"
```

**Fix**:
1. Make sure the virtual environment is activated
2. Fully reinstall PaddlePaddle:
```bash
pip uninstall paddlepaddle -y
pip install 'paddlepaddle>=3.2.0'
```

### Problem: GPU initialization fails

**Error message**: `libcuda.so.1: cannot open shared object file`

**Fix**:
```bash
# Confirm LD_LIBRARY_PATH includes the WSL CUDA path
echo $LD_LIBRARY_PATH | grep wsl

# If not, add it to ~/.bashrc:
echo 'export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
```

### Problem: inaccurate chart-recognition results

**Possible causes**:
- Low chart image quality
- Unusual or complex chart types
- Occluded or overlapping text

**Suggestions**:
- Increase the input document's resolution
- Use clean chart styles
- Proofread manually where necessary

---

## 🎉 Summary

**Chart recognition is now fully available!**

| Item | Status |
|------|------|
| API availability | ✅ `fused_rms_norm_ext` ships with PaddlePaddle 3.2.1 |
| Feature status | ✅ Chart recognition enabled |
| GPU support | ✅ CUDA 12.6 + cuDNN 9.5 running normally |
| Test verification | ✅ Verification script confirms availability |
| Documentation | ✅ This document is updated |

**Next steps**:
1. Test with real documents
2. Validate chart-recognition accuracy
3. Update the related README and OpenSpec documents
4. Consider performance tuning

---

**Last updated**: 2025-11-16
**Updated by**: Development Team
**PaddlePaddle version**: 3.2.1
**Feature status**: ✅ Chart recognition enabled
893 FRONTEND_API.md
@@ -1,893 +0,0 @@
# Tool_OCR Frontend API Documentation

> **Version**: 0.1.0
> **Last Updated**: 2025-01-13
> **Purpose**: Complete documentation of frontend architecture, component structure, API integration, and dependencies

---

## Table of Contents

1. [Project Overview](#project-overview)
2. [Technology Stack](#technology-stack)
3. [Component Architecture](#component-architecture)
4. [Page → API Dependency Matrix](#page--api-dependency-matrix)
5. [Component Tree Structure](#component-tree-structure)
6. [State Management Strategy](#state-management-strategy)
7. [Route Configuration](#route-configuration)
8. [API Integration Patterns](#api-integration-patterns)
9. [UI/UX Design System](#uiux-design-system)
10. [Error Handling Patterns](#error-handling-patterns)
11. [Deployment Configuration](#deployment-configuration)

---
## Project Overview

The Tool_OCR frontend is a modern React + Vite OCR document-processing system that provides an enterprise-grade user interface and experience.

### Key Features

- **Batch file upload**: drag-and-drop upload with multi-file batch processing
- **Real-time progress tracking**: polling-based display of OCR processing progress
- **Result preview**: dual-format preview in Markdown and JSON
- **Flexible export**: TXT, JSON, Excel, Markdown, PDF, and ZIP formats
- **Rule management**: customizable export rules and CSS templates
- **Responsive design**: adapted for desktop and tablet devices

---
## Technology Stack

### Core Dependencies

```json
{
  "@tanstack/react-query": "^5.90.7",  // Server state management
  "react": "^19.2.0",                  // UI framework
  "react-dom": "^19.2.0",
  "react-router-dom": "^7.9.5",        // Routing
  "vite": "^7.2.2",                    // Build tool
  "typescript": "~5.9.3"               // Type safety
}
```

### UI & Styling

```json
{
  "tailwindcss": "^4.1.17",              // CSS framework
  "class-variance-authority": "^0.7.0",  // Component variants
  "clsx": "^2.1.1",                      // Class name utility
  "tailwind-merge": "^3.4.0",            // Tailwind class merge
  "lucide-react": "^0.553.0"             // Icon library
}
```

### State & Data

```json
{
  "zustand": "^5.0.8",          // Client state
  "axios": "^1.13.2",           // HTTP client
  "react-dropzone": "^14.3.8",  // File upload
  "react-markdown": "^9.0.1"    // Markdown rendering
}
```

### Internationalization

```json
{
  "i18next": "^25.6.2",
  "react-i18next": "^16.3.0"
}
```

---

## Component Architecture

### Atomic Design Structure

```
frontend/src/
├── components/
│   ├── ui/                    # Atomic components (shadcn/ui)
│   │   ├── button.tsx
│   │   ├── card.tsx
│   │   ├── input.tsx
│   │   ├── label.tsx
│   │   ├── select.tsx
│   │   ├── badge.tsx
│   │   ├── progress.tsx
│   │   ├── alert.tsx
│   │   ├── dialog.tsx
│   │   ├── tabs.tsx
│   │   ├── table.tsx
│   │   └── toast.tsx
│   ├── FileUpload.tsx         # Drag-and-drop upload component
│   ├── ResultsTable.tsx       # OCR results display table
│   ├── MarkdownPreview.tsx    # Markdown content renderer
│   └── Layout.tsx             # Main app layout with sidebar
├── pages/
│   ├── LoginPage.tsx          # Authentication
│   ├── UploadPage.tsx         # File upload and selection
│   ├── ProcessingPage.tsx     # OCR processing status
│   ├── ResultsPage.tsx        # Results viewing and preview
│   ├── ExportPage.tsx         # Export configuration and download
│   └── SettingsPage.tsx       # User settings and rules management
├── store/
│   ├── authStore.ts           # Authentication state (Zustand)
│   └── uploadStore.ts         # Upload batch state (Zustand)
├── services/
│   └── api.ts                 # API client (Axios)
├── types/
│   └── api.ts                 # TypeScript type definitions
├── lib/
│   └── utils.ts               # Utility functions
├── i18n/
│   └── index.ts               # i18n configuration
└── styles/
    └── index.css              # Global styles and CSS variables
```

---
## Page → API Dependency Matrix

| Page/Component | API Endpoints Used | HTTP Method | Purpose | Polling |
|----------------|-------------------|-------------|---------|---------|
| **LoginPage** | `/api/v1/auth/login` | POST | User authentication | No |
| **UploadPage** | `/api/v1/upload` | POST | Upload files for OCR | No |
| **ProcessingPage** | `/api/v1/ocr/process` | POST | Start OCR processing | No |
| | `/api/v1/batch/{batch_id}/status` | GET | Poll batch status | Yes (2s) |
| **ResultsPage** | `/api/v1/batch/{batch_id}/status` | GET | Load completed files | No |
| | `/api/v1/ocr/result/{file_id}` | GET | Get OCR result details | No |
| | `/api/v1/export/pdf/{file_id}` | GET | Download PDF export | No |
| **ExportPage** | `/api/v1/export` | POST | Export batch results | No |
| | `/api/v1/export/rules` | GET | List export rules | No |
| | `/api/v1/export/rules` | POST | Create new rule | No |
| | `/api/v1/export/rules/{rule_id}` | PUT | Update existing rule | No |
| | `/api/v1/export/rules/{rule_id}` | DELETE | Delete rule | No |
| | `/api/v1/export/css-templates` | GET | List CSS templates | No |
| **SettingsPage** | `/api/v1/export/rules` | GET | Manage export rules | No |

---

## Component Tree Structure

```
App
├── Router (React Router)
│   ├── PublicRoute
│   │   └── LoginPage
│   │       ├── Form (username + password)
│   │       ├── Button (submit)
│   │       └── Alert (error display)
│   └── ProtectedRoute (requires authentication)
│       └── Layout
│           ├── Sidebar
│           │   ├── Logo
│           │   ├── Navigation Links
│           │   │   ├── UploadPage link
│           │   │   ├── ProcessingPage link
│           │   │   ├── ResultsPage link
│           │   │   ├── ExportPage link
│           │   │   └── SettingsPage link
│           │   └── User Section + Logout
│           ├── TopBar
│           │   ├── SearchInput
│           │   └── NotificationBell
│           └── MainContent (Outlet)
│               ├── UploadPage
│               │   ├── FileUpload (react-dropzone)
│               │   ├── FileList (selected files)
│               │   └── UploadButton
│               ├── ProcessingPage
│               │   ├── ProgressBar
│               │   ├── StatsCards (completed/processing/failed)
│               │   ├── FileStatusList
│               │   └── ActionButtons
│               ├── ResultsPage
│               │   ├── FileList (left sidebar)
│               │   │   ├── SearchInput
│               │   │   └── FileItems
│               │   └── PreviewPanel (right)
│               │       ├── StatsCards
│               │       ├── Tabs (Markdown/JSON)
│               │       ├── MarkdownPreview
│               │       └── JSONViewer
│               ├── ExportPage
│               │   ├── FormatSelector
│               │   ├── RuleSelector
│               │   ├── CSSTemplateSelector
│               │   ├── OptionsForm
│               │   └── ExportButton
│               └── SettingsPage
│                   ├── UserInfo
│                   ├── ExportRulesManager
│                   │   ├── RuleList
│                   │   ├── CreateRuleDialog
│                   │   ├── EditRuleDialog
│                   │   └── DeleteConfirmDialog
│                   └── SystemSettings
```

---

## State Management Strategy

### Client State (Zustand)

**authStore.ts** - Authentication State
```typescript
interface AuthState {
  user: User | null
  isAuthenticated: boolean
  setUser: (user: User | null) => void
  logout: () => void
}
```

**uploadStore.ts** - Upload Batch State
```typescript
interface UploadState {
  batchId: number | null
  files: FileInfo[]
  uploadProgress: number
  setBatchId: (id: number) => void
  setFiles: (files: FileInfo[]) => void
  setUploadProgress: (progress: number) => void
  reset: () => void
}
```

### Server State (React Query)

- **Caching**: Automatic caching with stale-while-revalidate strategy
- **Polling**: Automatic refetch for batch status every 2 seconds during processing
- **Error Handling**: Built-in error retry and error state management
- **Optimistic Updates**: For export rules CRUD operations (see the sketch after this list)
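A minimal sketch of the optimistic-update pattern for the export-rules list, assuming an `apiClient.updateExportRule()` method and the `['exportRules']` query key documented below; the exact method name and rule shape are illustrative:

```typescript
import { useMutation, useQueryClient } from '@tanstack/react-query'
import { apiClient } from '@/services/api' // assumed export

// Apply a rule edit to the cached list immediately, roll back on error,
// and refetch afterwards to reconcile with the server.
function useUpdateExportRule() {
  const queryClient = useQueryClient()
  return useMutation({
    mutationFn: (rule: { id: number; rule_name: string }) =>
      apiClient.updateExportRule(rule.id, rule),
    onMutate: async (rule) => {
      await queryClient.cancelQueries({ queryKey: ['exportRules'] })
      const previous = queryClient.getQueryData<any[]>(['exportRules'])
      queryClient.setQueryData<any[]>(['exportRules'], (rules = []) =>
        rules.map((r) => (r.id === rule.id ? { ...r, ...rule } : r))
      )
      return { previous } // rollback context
    },
    onError: (_err, _rule, context) => {
      queryClient.setQueryData(['exportRules'], context?.previous)
    },
    onSettled: () => {
      queryClient.invalidateQueries({ queryKey: ['exportRules'] })
    },
  })
}
```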
### Query Keys

```typescript
// Batch status polling
['batchStatus', batchId]

// OCR result for specific file
['ocrResult', fileId]

// Export rules list
['exportRules']

// CSS templates list
['cssTemplates']
```

---

## Route Configuration

| Route | Component | Access Level | Description | Protected |
|-------|-----------|--------------|-------------|-----------|
| `/login` | LoginPage | Public | User authentication | No |
| `/` | Layout (redirect to /upload) | Private | Main layout wrapper | Yes |
| `/upload` | UploadPage | Private | File upload interface | Yes |
| `/processing` | ProcessingPage | Private | OCR processing status | Yes |
| `/results` | ResultsPage | Private | View OCR results | Yes |
| `/export` | ExportPage | Private | Export configuration | Yes |
| `/settings` | SettingsPage | Private | User settings | Yes |

### Protected Route Implementation

```typescript
function ProtectedRoute({ children }: { children: React.ReactNode }) {
  const isAuthenticated = useAuthStore((state) => state.isAuthenticated)

  if (!isAuthenticated) {
    return <Navigate to="/login" replace />
  }

  return <>{children}</>
}
```

---

## API Integration Patterns

### API Client Configuration

**Base URL**: `http://localhost:12010/api/v1`

**Request Interceptor**: Adds JWT token to Authorization header

```typescript
this.client.interceptors.request.use((config) => {
  if (this.token) {
    config.headers.Authorization = `Bearer ${this.token}`
  }
  return config
})
```

**Response Interceptor**: Handles 401 errors and redirects to login

```typescript
this.client.interceptors.response.use(
  (response) => response,
  (error: AxiosError<ApiError>) => {
    if (error.response?.status === 401) {
      this.clearToken()
      window.location.href = '/login'
    }
    return Promise.reject(error)
  }
)
```

### Authentication Flow

```typescript
// 1. Login
const response = await apiClient.login({ username, password })
// Response: { access_token, token_type, expires_in }

// 2. Store token
localStorage.setItem('auth_token', response.access_token)

// 3. Set user in store
setUser({ id: 1, username })

// 4. Navigate to /upload
navigate('/upload')
```

### File Upload Flow

```typescript
// 1. Prepare FormData
const formData = new FormData()
files.forEach((file) => formData.append('files', file))

// 2. Upload files
const response = await apiClient.uploadFiles(files)
// Response: { batch_id, files: FileInfo[] }

// 3. Store batch info
setBatchId(response.batch_id)
setFiles(response.files)

// 4. Navigate to /processing
navigate('/processing')
```

### OCR Processing Flow

```typescript
// 1. Start OCR processing
await apiClient.processOCR({ batch_id, lang: 'ch', detect_layout: true })
// Response: { message, batch_id, total_files, status }

// 2. Poll batch status every 2 seconds
const { data: batchStatus } = useQuery({
  queryKey: ['batchStatus', batchId],
  queryFn: () => apiClient.getBatchStatus(batchId),
  refetchInterval: (query) => {
    const status = query.state.data?.batch.status
    if (status === 'completed' || status === 'failed') return false
    return 2000 // Poll every 2 seconds
  },
})

// 3. Auto-redirect when completed
useEffect(() => {
  if (batchStatus?.batch.status === 'completed') {
    navigate('/results')
  }
}, [batchStatus?.batch.status])
```

### Results Viewing Flow

```typescript
// 1. Load batch status
const { data: batchStatus } = useQuery({
  queryKey: ['batchStatus', batchId],
  queryFn: () => apiClient.getBatchStatus(batchId),
})

// 2. Select a file
setSelectedFileId(fileId)

// 3. Load OCR result for selected file
const { data: ocrResult } = useQuery({
  queryKey: ['ocrResult', selectedFileId],
  queryFn: () => apiClient.getOCRResult(selectedFileId),
  enabled: !!selectedFileId,
})

// 4. Display in Markdown or JSON format
<Tabs>
  <TabsContent value="markdown">
    <ReactMarkdown>{ocrResult.markdown_content}</ReactMarkdown>
  </TabsContent>
  <TabsContent value="json">
    <pre>{JSON.stringify(ocrResult.json_data, null, 2)}</pre>
  </TabsContent>
</Tabs>
```

### Export Flow

```typescript
// 1. Select export format and options
const exportData = {
  batch_id: batchId,
  format: 'pdf',
  rule_id: selectedRuleId,
  css_template: 'academic',
  options: { include_metadata: true }
}

// 2. Request export
const blob = await apiClient.exportResults(exportData)

// 3. Trigger download
downloadBlob(blob, `ocr-results-${batchId}.pdf`)
```
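The `downloadBlob` helper used above is not shown in this document; a minimal sketch of such a helper (an assumption, not the project's actual implementation):

```typescript
// Trigger a browser download for a Blob returned by the export API.
function downloadBlob(blob: Blob, filename: string): void {
  const url = URL.createObjectURL(blob)
  const link = document.createElement('a')
  link.href = url
  link.download = filename
  document.body.appendChild(link)
  link.click()
  link.remove()
  URL.revokeObjectURL(url) // release the object URL once the click fires
}
```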
---

## UI/UX Design System

### Color Palette (CSS Variables)

```css
/* Primary - Professional Blue */
--primary: 217 91% 60%;          /* #3b82f6 */
--primary-foreground: 0 0% 100%;

/* Secondary - Gray-Blue */
--secondary: 220 15% 95%;
--secondary-foreground: 220 15% 25%;

/* Accent - Vibrant Teal */
--accent: 173 80% 50%;
--accent-foreground: 0 0% 100%;

/* Success */
--success: 142 72% 45%;          /* #16a34a */
--success-foreground: 0 0% 100%;

/* Destructive */
--destructive: 0 85% 60%;        /* #ef4444 */
--destructive-foreground: 0 0% 100%;

/* Warning */
--warning: 38 92% 50%;
--warning-foreground: 0 0% 100%;

/* Background */
--background: 220 15% 97%;       /* #fafafa */
--card: 0 0% 100%;               /* #ffffff */
--sidebar: 220 25% 12%;          /* Dark blue-gray */

/* Borders */
--border: 220 13% 88%;
--radius: 0.5rem;
```

### Typography

- **Font Family**: System font stack (native)
- **Page Title**: 1.875rem (30px), font-weight: 700
- **Section Title**: 1.125rem (18px), font-weight: 600
- **Body Text**: 0.875rem (14px), font-weight: 400
- **Small Text**: 0.75rem (12px)

### Spacing Scale

```css
--spacing-xs: 0.25rem;  /* 4px */
--spacing-sm: 0.5rem;   /* 8px */
--spacing-md: 1rem;     /* 16px */
--spacing-lg: 1.5rem;   /* 24px */
--spacing-xl: 2rem;     /* 32px */
```

### Component Variants

**Button Variants**:
- `default`: Primary blue background
- `outline`: Border only
- `secondary`: Muted background
- `destructive`: Red for delete actions
- `ghost`: No background, hover effect

**Alert Variants**:
- `default`: Neutral gray
- `info`: Blue
- `success`: Green
- `warning`: Yellow
- `destructive`: Red

**Badge Variants**:
- `default`: Gray
- `success`: Green
- `warning`: Yellow
- `destructive`: Red
- `secondary`: Muted

### Responsive Breakpoints

```typescript
// Tailwind breakpoints
sm: '640px',   // Mobile landscape
md: '768px',   // Tablet
lg: '1024px',  // Desktop (primary support)
xl: '1280px',  // Large desktop
2xl: '1536px'  // Extra large
```

**Primary Support**: Desktop (>= 1024px)
**Secondary Support**: Tablet (768px - 1023px)
**Optional**: Mobile (< 768px)

---

## Error Handling Patterns

### Global Error Boundary

```typescript
class ErrorBoundary extends Component<Props, State> {
  static getDerivedStateFromError(error: Error): State {
    return { hasError: true, error }
  }

  componentDidCatch(error: Error, errorInfo: ErrorInfo) {
    console.error('Uncaught error:', error, errorInfo)
  }

  render() {
    if (this.state.hasError) {
      return <ErrorFallbackUI error={this.state.error} />
    }
    return this.props.children
  }
}
```

### API Error Handling

```typescript
try {
  await apiClient.uploadFiles(files)
} catch (err: any) {
  const errorDetail = err.response?.data?.detail

  toast({
    title: t('upload.uploadError'),
    description: Array.isArray(errorDetail)
      ? errorDetail.map(e => e.msg || e.message).join(', ')
      : errorDetail || t('errors.networkError'),
    variant: 'destructive',
  })
}
```

### Form Validation

```typescript
// Client-side validation
if (selectedFiles.length === 0) {
  toast({
    title: t('errors.validationError'),
    description: '請選擇至少一個檔案', // "Please select at least one file"
    variant: 'destructive',
  })
  return
}

// Backend validation errors
if (err.response?.status === 422) {
  const errors = err.response.data.detail
  // Display validation errors to user
}
```

### Loading States

```typescript
// Query loading state
const { data, isLoading, error } = useQuery({
  queryKey: ['batchStatus', batchId],
  queryFn: () => apiClient.getBatchStatus(batchId),
})

if (isLoading) return <LoadingSpinner />
if (error) return <ErrorAlert error={error} />
if (!data) return <EmptyState />

// Mutation loading state
const mutation = useMutation({
  mutationFn: apiClient.uploadFiles,
  onSuccess: () => { /* success */ },
  onError: () => { /* error */ },
})

// Button label '上傳' means "Upload"
<Button disabled={mutation.isPending}>
  {mutation.isPending ? <Loader2 className="animate-spin" /> : '上傳'}
</Button>
```

---

## Deployment Configuration

### Environment Variables

```bash
# .env.production
VITE_API_BASE_URL=http://localhost:12010
VITE_APP_NAME=Tool_OCR
VITE_APP_VERSION=0.1.0
```

### Build Configuration

**vite.config.ts**:
```typescript
export default defineConfig({
  plugins: [react()],
  server: {
    port: 12011,
    proxy: {
      '/api': {
        target: 'http://localhost:12010',
        changeOrigin: true,
      },
    },
  },
  build: {
    outDir: 'dist',
    sourcemap: false,
    rollupOptions: {
      output: {
        manualChunks: {
          vendor: ['react', 'react-dom', 'react-router-dom'],
          ui: ['@tanstack/react-query', 'zustand', 'lucide-react'],
        },
      },
    },
  },
})
```

### Build Commands

```bash
# Development
npm run dev

# Production build
npm run build

# Preview production build
npm run preview
```

### Nginx Configuration

```nginx
server {
    listen 80;
    server_name tool-ocr.example.com;
    root /path/to/Tool_OCR/frontend/dist;

    # Frontend static files
    location / {
        try_files $uri $uri/ /index.html;
    }

    # API reverse proxy
    location /api {
        proxy_pass http://127.0.0.1:12010;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }

    # Static assets caching
    location ~* \.(js|css|png|jpg|jpeg|gif|ico|svg|woff|woff2)$ {
        expires 1y;
        add_header Cache-Control "public, immutable";
    }
}
```

---

## Performance Optimization

### Code Splitting

- **Vendor Bundle**: React, React Router, React Query (separate chunk)
- **UI Bundle**: Zustand, Lucide React, UI components
- **Route-based Splitting**: Lazy load pages with `React.lazy()` (see the sketch after this list)
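A minimal sketch of route-based splitting with `React.lazy()`; the page paths follow the structure above, and the wiring assumes each page module provides a default export (the pages may actually use named exports):

```typescript
import { lazy, Suspense } from 'react'
import { Route, Routes } from 'react-router-dom'

// Each lazy page becomes its own chunk, fetched on first navigation.
// React.lazy() requires a default export from the imported module.
const UploadPage = lazy(() => import('@/pages/UploadPage'))
const ResultsPage = lazy(() => import('@/pages/ResultsPage'))

function AppRoutes() {
  return (
    <Suspense fallback={<div>Loading…</div>}>
      <Routes>
        <Route path="/upload" element={<UploadPage />} />
        <Route path="/results" element={<ResultsPage />} />
      </Routes>
    </Suspense>
  )
}
```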
### Caching Strategy

- **React Query Cache**: 5 minutes stale time for most queries
- **Polling Interval**: 2 seconds during OCR processing
- **Infinite Cache**: Export rules (rarely change)

### Asset Optimization

- **Images**: Convert to WebP format, use appropriate sizes
- **Fonts**: System font stack (no custom fonts)
- **Icons**: Lucide React (tree-shakeable)

---

## Testing Strategy

### Component Testing (Planned)

```typescript
// Example: UploadPage.test.tsx
import { render, screen, fireEvent } from '@testing-library/react'
import { UploadPage } from '@/pages/UploadPage'

describe('UploadPage', () => {
  it('should display file upload area', () => {
    render(<UploadPage />)
    // '拖放檔案' matches the "drag and drop files" UI text
    expect(screen.getByText(/拖放檔案/i)).toBeInTheDocument()
  })

  it('should allow file selection', async () => {
    render(<UploadPage />)
    const file = new File(['content'], 'test.pdf', { type: 'application/pdf' })
    // Test file upload
  })
})
```

### API Integration Testing

- **Mock API Responses**: Use MSW (Mock Service Worker); a handler sketch follows this list
- **Error Scenarios**: Test 401, 404, 500 responses
- **Loading States**: Test skeleton/spinner display
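A minimal MSW handler sketch for the batch-status endpoint, assuming MSW v2's `http`/`HttpResponse` API; the response shape follows the contract documented above:

```typescript
import { http, HttpResponse } from 'msw'
import { setupServer } from 'msw/node'

export const server = setupServer(
  // Happy path: canned batch status for any batch ID.
  http.get('/api/v1/batch/:batchId/status', ({ params }) =>
    HttpResponse.json({
      batch: { id: Number(params.batchId), status: 'completed', total_files: 1 },
      files: [],
    })
  ),
  // Error scenario: simulate a rejected login (401).
  http.post('/api/v1/auth/login', () => new HttpResponse(null, { status: 401 }))
)

// Test setup: call server.listen() before all tests,
// server.resetHandlers() after each, and server.close() after all.
```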
---

## Accessibility Standards

### WCAG 2.1 AA Compliance

- **Keyboard Navigation**: All interactive elements accessible via keyboard
- **Focus Indicators**: Visible focus states on all inputs and buttons
- **ARIA Labels**: Proper labels for screen readers
- **Color Contrast**: Minimum 4.5:1 ratio for text
- **Alt Text**: All images have descriptive alt attributes

### Semantic HTML

```typescript
// Use semantic elements
<nav>      // Navigation
<main>     // Main content
<aside>    // Sidebar
<article>  // Independent content
<section>  // Grouped content
```

---

## Browser Compatibility

### Minimum Supported Versions

- **Chrome**: 90+
- **Firefox**: 88+
- **Edge**: 90+
- **Safari**: 14+

### Polyfills Required

- None (modern build target: ES2020)

---

## Development Workflow

### Local Development

```bash
# 1. Install dependencies
npm install

# 2. Start dev server
npm run dev
# Frontend: http://localhost:12011
# API Proxy: http://localhost:12011/api -> http://localhost:12010/api

# 3. Build for production
npm run build

# 4. Preview production build
npm run preview
```

### Code Style

- **Formatter**: Prettier (automatic on save)
- **Linter**: ESLint
- **Type Checking**: TypeScript strict mode

---

## Known Issues & Limitations

### Current Limitations

1. **No Real-time WebSocket**: Uses HTTP polling for progress updates
2. **No Offline Support**: Requires active internet connection
3. **No Mobile Optimization**: Primarily designed for desktop/tablet
4. **Translation Feature Stub**: Planned for Phase 5
5. **File Size Limit**: Frontend validates 50MB per file, backend may differ (a validation sketch follows this list)
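A minimal sketch of the client-side 50MB limit with react-dropzone; `accept`, `maxSize`, and `onDropAccepted` are real react-dropzone options, while the hook name and callback are illustrative:

```typescript
import { useDropzone } from 'react-dropzone'

const MAX_FILE_SIZE = 50 * 1024 * 1024 // 50MB, enforced client-side only

function useOcrDropzone(onFiles: (files: File[]) => void) {
  return useDropzone({
    accept: {
      'image/png': ['.png'],
      'image/jpeg': ['.jpg', '.jpeg'],
      'application/pdf': ['.pdf'],
    },
    maxSize: MAX_FILE_SIZE, // oversized files land in fileRejections
    onDropAccepted: onFiles,
  })
}
```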
### Future Improvements

- [ ] Implement WebSocket for real-time updates
- [ ] Add dark mode toggle
- [ ] Mobile responsive design
- [ ] Implement translation feature
- [ ] Add E2E tests with Playwright
- [ ] PWA support for offline capability

---

## Maintenance & Updates

### Update Checklist

When updating API contracts:
1. Update TypeScript types in `@/types/api.ts`
2. Update API client methods in `@/services/api.ts`
3. Update this documentation (FRONTEND_API.md)
4. Update corresponding page components
5. Test integration thoroughly

### Dependency Updates

```bash
# Check for updates
npm outdated

# Update dependencies
npm update

# Update to latest (breaking changes possible)
npm install <package>@latest
```

---

## Contact & Support

**Frontend Developer**: Claude Code
**Documentation Version**: 0.1.0
**Last Updated**: 2025-01-13

For API questions, refer to:
- `API_REFERENCE.md` - Complete API documentation
- `backend_api.md` - Backend implementation details
- FastAPI Swagger UI: `http://localhost:12010/docs`

---

**End of Documentation**
258 TESTING.md
@@ -1,258 +0,0 @@
# Tool_OCR Testing Guide

## Test Architecture

The project ships with a full test suite, including unit tests and integration tests.

---

## Backend Tests

### Install test dependencies

```bash
cd backend
pip install pytest pytest-cov httpx
```

### Run all tests

```bash
# Run all tests
pytest

# Run with verbose output
pytest -v

# Run and generate a coverage report
pytest --cov=app --cov-report=html
```

### Run specific tests

```bash
# Unit tests only
pytest tests/test_auth.py
pytest tests/test_tasks.py
pytest tests/test_admin.py

# Integration tests only
pytest tests/test_integration.py

# A specific test class
pytest tests/test_tasks.py::TestTasks

# A specific test method
pytest tests/test_tasks.py::TestTasks::test_create_task
```

### Test coverage

**Unit tests** (`tests/test_*.py`):
- `test_auth.py` - authentication endpoint tests
  - login success/failure
  - token validation
  - logout
- `test_tasks.py` - task management tests
  - task CRUD operations
  - user isolation checks
  - statistics
- `test_admin.py` - admin feature tests
  - system statistics
  - user listing
  - audit logs

**Integration tests** (`tests/test_integration.py`):
- full authentication and task flow
- admin workflow
- task lifecycle

---

## Test Database

Tests use an in-memory SQLite database that is cleaned up automatically after each test:
- does not touch the development or production database
- fast to run
- fully isolated

---

## Fixtures

Defined in `conftest.py` (a sketch follows this list):

- `db` - test database session
- `client` - FastAPI test client
- `test_user` - regular test user
- `admin_user` - admin test user
- `auth_token` - auth token for the test user
- `admin_token` - auth token for the admin
- `test_task` - a test task
---
|
||||
|
||||
## 測試範例
|
||||
|
||||
### 編寫新的單元測試
|
||||
|
||||
```python
|
||||
# tests/test_my_feature.py
|
||||
|
||||
import pytest
|
||||
|
||||
|
||||
class TestMyFeature:
|
||||
"""Test my new feature"""
|
||||
|
||||
def test_feature_works(self, client, auth_token):
|
||||
"""Test that feature works correctly"""
|
||||
response = client.get(
|
||||
'/api/v2/my-endpoint',
|
||||
headers={'Authorization': f'Bearer {auth_token}'}
|
||||
)
|
||||
|
||||
assert response.status_code == 200
|
||||
data = response.json()
|
||||
assert 'expected_field' in data
|
||||
```
|
||||
|
||||
### 編寫新的集成測試
|
||||
|
||||
```python
|
||||
# tests/test_integration.py
|
||||
|
||||
class TestIntegration:
|
||||
|
||||
def test_complete_workflow(self, client, db):
|
||||
"""Test complete user workflow"""
|
||||
# Step 1: Login
|
||||
# Step 2: Perform actions
|
||||
# Step 3: Verify results
|
||||
pass
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## CI/CD 整合
|
||||
|
||||
### GitHub Actions 範例
|
||||
|
||||
```yaml
|
||||
name: Tests
|
||||
|
||||
on: [push, pull_request]
|
||||
|
||||
jobs:
|
||||
test:
|
||||
runs-on: ubuntu-latest
|
||||
|
||||
steps:
|
||||
- uses: actions/checkout@v2
|
||||
|
||||
- name: Set up Python
|
||||
uses: actions/setup-python@v2
|
||||
with:
|
||||
python-version: 3.11
|
||||
|
||||
- name: Install dependencies
|
||||
run: |
|
||||
cd backend
|
||||
pip install -r requirements.txt
|
||||
pip install pytest pytest-cov
|
||||
|
||||
- name: Run tests
|
||||
run: |
|
||||
cd backend
|
||||
pytest --cov=app --cov-report=xml
|
||||
|
||||
- name: Upload coverage
|
||||
uses: codecov/codecov-action@v2
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 前端測試 (未來計劃)
|
||||
|
||||
### 建議測試框架
|
||||
- **單元測試**: Vitest
|
||||
- **元件測試**: React Testing Library
|
||||
- **E2E 測試**: Playwright
|
||||
|
||||
### 範例配置
|
||||
|
||||
```bash
|
||||
# 安裝測試依賴
|
||||
npm install --save-dev vitest @testing-library/react @testing-library/jest-dom
|
||||
|
||||
# 運行測試
|
||||
npm test
|
||||
|
||||
# 運行 E2E 測試
|
||||
npm run test:e2e
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 測試最佳實踐
|
||||
|
||||
### 1. 測試命名規範
|
||||
- 使用描述性名稱: `test_user_can_create_task`
|
||||
- 遵循 AAA 模式: Arrange, Act, Assert
|
||||
|
||||
### 2. 測試隔離
|
||||
- 每個測試獨立執行
|
||||
- 使用 fixtures 提供測試數據
|
||||
- 不依賴其他測試的狀態
|
||||
|
||||
### 3. Mock 外部服務
|
||||
- Mock 外部 API 呼叫
|
||||
- Mock 檔案系統操作
|
||||
- Mock 第三方服務
|
||||
|
||||
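
As a concrete instance of the first point, the sketch below follows the pattern used in this project's `test_auth.py`: the external authentication service is patched so the test never makes a network call. It assumes the `client` fixture from `conftest.py`; the username/password values are arbitrary test data.

```python
from unittest.mock import patch


def test_login_rejects_bad_credentials(client):
    """Sketch: stub the external auth service instead of calling it."""
    target = 'app.routers.auth.external_auth_service.authenticate_user'
    with patch(target) as mock_auth:
        # (success, auth_response, error_message) mirrors the return shape
        # used by the mocks in this project's test_auth.py
        mock_auth.return_value = (False, None, 'Invalid credentials')

        response = client.post('/api/v2/auth/login', json={
            'username': 'someone@example.com',
            'password': 'wrong',
        })

    assert response.status_code == 401
    mock_auth.assert_called_once()
```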

### 4. Coverage Targets
- Core business logic: >90%
- API endpoints: >80%
- Utility functions: >70%

---

## Troubleshooting

### Common Issues

**Problem**: `ImportError: cannot import name 'XXX'`
**Solution**: Make sure PYTHONPATH is set correctly
```bash
export PYTHONPATH=$PYTHONPATH:$(pwd)
```

**Problem**: Database connection errors
**Solution**: Tests use an in-memory database; no real database connection is required

**Problem**: Token validation failures
**Solution**: Check the JWT secret settings and use the provided test fixtures

---

## Test Reports

Reports produced by a test run:

1. **Terminal output**: Overview of the test results
2. **HTML report**: `htmlcov/index.html` (requires --cov-report=html)
3. **Coverage report**: Highlights untested lines of code

---

## Continuous Improvement

- Run the test suite regularly
- New features must include tests
- Keep test coverage above 80%
- Add a regression test with every bug fix

---

**Last Updated**: 2025-11-16
**Maintainer**: Development Team
File diff suppressed because it is too large
@@ -1,62 +0,0 @@
"""
Test script to verify ReportLab and Chinese font rendering
"""
from reportlab.pdfgen import canvas
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont
from pathlib import Path
import sys

def test_chinese_rendering():
    """Test if Chinese characters can be rendered in PDF"""

    # Font path
    font_path = "/home/egg/project/Tool_OCR/backend/fonts/NotoSansSC-Regular.ttf"

    # Check if font file exists
    if not Path(font_path).exists():
        print(f"❌ Font file not found: {font_path}")
        return False

    print(f"✓ Font file found: {font_path}")

    try:
        # Register Chinese font
        pdfmetrics.registerFont(TTFont('NotoSansSC', font_path))
        print("✓ Font registered successfully")

        # Create test PDF
        test_pdf = "/tmp/test_chinese.pdf"
        c = canvas.Canvas(test_pdf)

        # Set Chinese font
        c.setFont('NotoSansSC', 14)

        # Draw test text
        c.drawString(100, 750, "測試中文字符渲染 - Test Chinese Character Rendering")
        c.drawString(100, 730, "HTD-S1 技術數據表")
        c.drawString(100, 710, "這是一個 PDF 生成測試")

        c.save()
        print(f"✓ Test PDF created: {test_pdf}")

        # Check file size
        file_size = Path(test_pdf).stat().st_size
        print(f"✓ PDF file size: {file_size} bytes")

        if file_size > 0:
            print("\n✅ Chinese font rendering test PASSED")
            return True
        else:
            print("\n❌ PDF file is empty")
            return False

    except Exception as e:
        print(f"❌ Error during testing: {e}")
        import traceback
        traceback.print_exc()
        return False

if __name__ == "__main__":
    success = test_chinese_rendering()
    sys.exit(0 if success else 1)
@@ -1,286 +0,0 @@
#!/usr/bin/env python3
"""
Tool_OCR - Service Layer Integration Test
Tests core services before API implementation
"""

import sys
import logging
from pathlib import Path
from datetime import datetime

# Add backend to path
sys.path.insert(0, str(Path(__file__).parent))

from app.core.config import settings
from app.core.database import engine, SessionLocal, Base
from app.models.user import User
from app.models.ocr import OCRBatch, OCRFile, OCRResult, FileStatus, BatchStatus
from app.services.preprocessor import DocumentPreprocessor
from app.services.ocr_service import OCRService
from app.services.pdf_generator import PDFGenerator
from app.services.file_manager import FileManager


# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
)
logger = logging.getLogger(__name__)


class ServiceTester:
    """Service layer integration tester"""

    def __init__(self):
        """Initialize tester"""
        self.db = SessionLocal()
        self.preprocessor = DocumentPreprocessor()
        self.ocr_service = OCRService()
        self.pdf_generator = PDFGenerator()
        self.file_manager = FileManager()
        self.test_results = {
            "database": False,
            "preprocessor": False,
            "ocr_engine": False,
            "pdf_generator": False,
            "file_manager": False,
        }

    def cleanup(self):
        """Cleanup resources"""
        self.db.close()

    def test_database_connection(self) -> bool:
        """Test 1: Database connection and models"""
        try:
            logger.info("=" * 80)
            logger.info("TEST 1: Database Connection")
            logger.info("=" * 80)

            # Test connection
            from sqlalchemy import text
            self.db.execute(text("SELECT 1"))
            logger.info("✓ Database connection successful")

            # Check if tables exist
            from sqlalchemy import inspect
            inspector = inspect(engine)
            tables = inspector.get_table_names()

            required_tables = [
                'paddle_ocr_users',
                'paddle_ocr_batches',
                'paddle_ocr_files',
                'paddle_ocr_results',
                'paddle_ocr_export_rules',
                'paddle_ocr_translation_configs'
            ]

            missing_tables = [t for t in required_tables if t not in tables]
            if missing_tables:
                logger.error(f"✗ Missing tables: {missing_tables}")
                return False

            logger.info(f"✓ All required tables exist: {', '.join(required_tables)}")

            # Test creating a test user (will rollback)
            test_user = User(
                username=f"test_user_{datetime.now().timestamp()}",
                email=f"test_{datetime.now().timestamp()}@example.com",
                password_hash="test_hash_123",
                is_active=True,
                is_admin=False
            )
            self.db.add(test_user)
            self.db.flush()
            logger.info(f"✓ Test user created with ID: {test_user.id}")

            self.db.rollback()  # Don't actually save test user
            logger.info("✓ Database test completed successfully\n")

            self.test_results["database"] = True
            return True

        except Exception as e:
            logger.error(f"✗ Database test failed: {e}\n")
            return False

    def test_preprocessor(self) -> bool:
        """Test 2: Document preprocessor"""
        try:
            logger.info("=" * 80)
            logger.info("TEST 2: Document Preprocessor")
            logger.info("=" * 80)

            # Check supported formats
            formats = ['.png', '.jpg', '.jpeg', '.pdf']
            logger.info(f"✓ Supported formats: {formats}")

            # Check max file size
            max_size_mb = settings.max_upload_size / (1024 * 1024)
            logger.info(f"✓ Max upload size: {max_size_mb} MB")

            logger.info("✓ Preprocessor initialized successfully\n")

            self.test_results["preprocessor"] = True
            return True

        except Exception as e:
            logger.error(f"✗ Preprocessor test failed: {e}\n")
            return False

    def test_ocr_engine(self) -> bool:
        """Test 3: OCR engine initialization"""
        try:
            logger.info("=" * 80)
            logger.info("TEST 3: OCR Engine (PaddleOCR)")
            logger.info("=" * 80)

            # Test OCR engine lazy loading
            logger.info("Initializing PaddleOCR engine (this may take a moment)...")
            ocr_engine = self.ocr_service.get_ocr_engine(lang='ch')
            logger.info("✓ PaddleOCR engine initialized for Chinese")

            # Test structure engine
            logger.info("Initializing PP-Structure engine...")
            structure_engine = self.ocr_service.get_structure_engine()
            logger.info("✓ PP-Structure engine initialized")

            # Check confidence threshold
            logger.info(f"✓ Confidence threshold: {self.ocr_service.confidence_threshold}")

            logger.info("✓ OCR engine test completed successfully\n")

            self.test_results["ocr_engine"] = True
            return True

        except Exception as e:
            logger.error(f"✗ OCR engine test failed: {e}")
            logger.error("  Make sure PaddleOCR models are downloaded:")
            logger.error("  - PaddleOCR will auto-download on first use (~900MB)")
            logger.error("  - Requires stable internet connection")
            logger.error("")
            return False

    def test_pdf_generator(self) -> bool:
        """Test 4: PDF generator"""
        try:
            logger.info("=" * 80)
            logger.info("TEST 4: PDF Generator")
            logger.info("=" * 80)

            # Check Pandoc availability
            pandoc_available = self.pdf_generator.check_pandoc_available()
            if pandoc_available:
                logger.info("✓ Pandoc is installed and available")
            else:
                logger.warning("⚠ Pandoc not found - will use WeasyPrint fallback")

            # Check available templates
            templates = self.pdf_generator.get_available_templates()
            logger.info(f"✓ Available CSS templates: {', '.join(templates.keys())}")

            logger.info("✓ PDF generator test completed successfully\n")

            self.test_results["pdf_generator"] = True
            return True

        except Exception as e:
            logger.error(f"✗ PDF generator test failed: {e}\n")
            return False

    def test_file_manager(self) -> bool:
        """Test 5: File manager"""
        try:
            logger.info("=" * 80)
            logger.info("TEST 5: File Manager")
            logger.info("=" * 80)

            # Check upload directory
            upload_dir = Path(settings.upload_dir)
            if upload_dir.exists():
                logger.info(f"✓ Upload directory exists: {upload_dir}")
            else:
                upload_dir.mkdir(parents=True, exist_ok=True)
                logger.info(f"✓ Created upload directory: {upload_dir}")

            # Test batch directory creation
            test_batch_id = 99999  # Use high number to avoid conflicts
            batch_dir = self.file_manager.create_batch_directory(test_batch_id)
            logger.info(f"✓ Created test batch directory: {batch_dir}")

            # Check subdirectories
            subdirs = ["inputs", "outputs/markdown", "outputs/json", "outputs/images", "exports"]
            for subdir in subdirs:
                subdir_path = batch_dir / subdir
                if subdir_path.exists():
                    logger.info(f"  ✓ {subdir}")
                else:
                    logger.error(f"  ✗ Missing: {subdir}")
                    return False

            # Cleanup test directory
            import shutil
            shutil.rmtree(batch_dir.parent, ignore_errors=True)
            logger.info("✓ Cleaned up test batch directory")

            logger.info("✓ File manager test completed successfully\n")

            self.test_results["file_manager"] = True
            return True

        except Exception as e:
            logger.error(f"✗ File manager test failed: {e}\n")
            return False

    def run_all_tests(self):
        """Run all service tests"""
        logger.info("\n" + "=" * 80)
        logger.info("Tool_OCR Service Layer Integration Test")
        logger.info("=" * 80 + "\n")

        try:
            # Run tests in order
            self.test_database_connection()
            self.test_preprocessor()
            self.test_ocr_engine()
            self.test_pdf_generator()
            self.test_file_manager()

            # Print summary
            logger.info("=" * 80)
            logger.info("TEST SUMMARY")
            logger.info("=" * 80)

            total_tests = len(self.test_results)
            passed_tests = sum(1 for result in self.test_results.values() if result)

            for test_name, result in self.test_results.items():
                status = "✓ PASS" if result else "✗ FAIL"
                logger.info(f"{status:8} - {test_name}")

            logger.info("-" * 80)
            logger.info(f"Total: {passed_tests}/{total_tests} tests passed")

            if passed_tests == total_tests:
                logger.info("\n🎉 All service layer tests passed! Ready to implement API endpoints.")
                return 0
            else:
                logger.error(f"\n❌ {total_tests - passed_tests} test(s) failed. Please fix issues before proceeding.")
                return 1

        finally:
            self.cleanup()


def main():
    """Main test entry point"""
    tester = ServiceTester()
    exit_code = tester.run_all_tests()
    sys.exit(exit_code)


if __name__ == "__main__":
    main()
@@ -1,3 +0,0 @@
"""
Tool_OCR - Unit Tests Package
"""
@@ -1,138 +0,0 @@
"""
V2 API Test Configuration and Fixtures
Provides test fixtures for authentication, database, and API testing
"""

import pytest
from fastapi.testclient import TestClient
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.pool import StaticPool

# IMPORTANT: Monkey patch database module BEFORE importing app
# This prevents the app from connecting to production database
import app.core.database as db_module

# Create a test engine for the entire test session
_test_engine = create_engine(
    "sqlite:///:memory:",
    connect_args={"check_same_thread": False},
    poolclass=StaticPool,
)

# Replace the global engine and SessionLocal
db_module.engine = _test_engine
db_module.SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=_test_engine)

# Now safely import app (it will use our test database)
from app.main import app
from app.core.database import Base, get_db
from app.core.security import create_access_token
from app.models.user import User
from app.models.task import Task


@pytest.fixture(scope="function")
def engine():
    """Get test database engine and reset tables for each test"""
    Base.metadata.drop_all(bind=_test_engine)
    Base.metadata.create_all(bind=_test_engine)
    yield _test_engine
    # Tables will be dropped at the start of next test


@pytest.fixture(scope="function")
def db(engine):
    """Create test database session"""
    TestingSessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
    db = TestingSessionLocal()
    try:
        yield db
    finally:
        db.close()


@pytest.fixture(scope="function")
def client(db):
    """Create FastAPI test client with test database"""
    # Override get_db to use the same session as the test
    def override_get_db():
        try:
            yield db
        finally:
            # Don't close the session, it's managed by the db fixture
            pass

    app.dependency_overrides[get_db] = override_get_db
    with TestClient(app) as test_client:
        yield test_client
    app.dependency_overrides.clear()


@pytest.fixture
def test_user(db):
    """Create a test user"""
    # Ensure test_user is always created first by checking if it exists
    user = db.query(User).filter(User.email == "test@example.com").first()
    if not user:
        user = User(
            email="test@example.com",
            display_name="Test User",
            is_active=True
        )
        db.add(user)
        db.commit()
        db.refresh(user)
    return user


@pytest.fixture
def admin_user(db):
    """Create an admin user"""
    user = db.query(User).filter(User.email == "ymirliu@panjit.com.tw").first()
    if not user:
        user = User(
            email="ymirliu@panjit.com.tw",
            display_name="Admin User",
            is_active=True
        )
        db.add(user)
        db.commit()
        db.refresh(user)
    return user


@pytest.fixture
def auth_token(test_user):
    """Create authentication token for test user"""
    token_data = {
        "sub": str(test_user.id),
        "email": test_user.email
    }
    return create_access_token(token_data)


@pytest.fixture
def admin_token(admin_user):
    """Create authentication token for admin user"""
    token_data = {
        "sub": str(admin_user.id),
        "email": admin_user.email
    }
    return create_access_token(token_data)


@pytest.fixture
def test_task(test_user, db):
    """Create a test task (depends on test_user to ensure user exists first)"""
    task = Task(
        user_id=test_user.id,
        task_id="test-task-123",
        filename="test.pdf",
        file_type="application/pdf",
        status="pending"
    )
    db.add(task)
    db.commit()
    db.refresh(task)
    return task
@@ -1,179 +0,0 @@
"""
Tool_OCR - Pytest Fixtures and Configuration
Shared fixtures for all tests
"""

import pytest
import tempfile
import shutil
from pathlib import Path
from PIL import Image
import io

from app.services.preprocessor import DocumentPreprocessor


@pytest.fixture
def temp_dir():
    """Create a temporary directory for test files"""
    temp_path = Path(tempfile.mkdtemp())
    yield temp_path
    # Cleanup after test
    shutil.rmtree(temp_path, ignore_errors=True)


@pytest.fixture
def sample_image_path(temp_dir):
    """Create a valid PNG image file for testing"""
    image_path = temp_dir / "test_image.png"

    # Create a simple 100x100 white image
    img = Image.new('RGB', (100, 100), color='white')
    img.save(image_path, 'PNG')

    return image_path


@pytest.fixture
def sample_jpg_path(temp_dir):
    """Create a valid JPG image file for testing"""
    image_path = temp_dir / "test_image.jpg"

    # Create a simple 100x100 white image
    img = Image.new('RGB', (100, 100), color='white')
    img.save(image_path, 'JPEG')

    return image_path


@pytest.fixture
def sample_pdf_path(temp_dir):
    """Create a valid PDF file for testing"""
    pdf_path = temp_dir / "test_document.pdf"

    # Create minimal valid PDF
    pdf_content = b"""%PDF-1.4
1 0 obj
<<
/Type /Catalog
/Pages 2 0 R
>>
endobj
2 0 obj
<<
/Type /Pages
/Kids [3 0 R]
/Count 1
>>
endobj
3 0 obj
<<
/Type /Page
/Parent 2 0 R
/MediaBox [0 0 612 792]
/Contents 4 0 R
/Resources <<
/Font <<
/F1 <<
/Type /Font
/Subtype /Type1
/BaseFont /Helvetica
>>
>>
>>
>>
endobj
4 0 obj
<<
/Length 44
>>
stream
BT
/F1 12 Tf
100 700 Td
(Test PDF) Tj
ET
endstream
endobj
xref
0 5
0000000000 65535 f
0000000009 00000 n
0000000058 00000 n
0000000115 00000 n
0000000317 00000 n
trailer
<<
/Size 5
/Root 1 0 R
>>
startxref
410
%%EOF
"""

    with open(pdf_path, 'wb') as f:
        f.write(pdf_content)

    return pdf_path


@pytest.fixture
def corrupted_image_path(temp_dir):
    """Create a corrupted image file for testing"""
    image_path = temp_dir / "corrupted.png"

    # Write invalid PNG data
    with open(image_path, 'wb') as f:
        f.write(b'\x89PNG\r\n\x1a\n\x00\x00\x00corrupted data')

    return image_path


@pytest.fixture
def large_file_path(temp_dir):
    """Create a valid PNG file larger than the upload limit"""
    file_path = temp_dir / "large_file.png"

    # Create a large PNG image with random data (to prevent compression)
    # 15000x15000 with random pixels should be > 20MB
    import numpy as np
    random_data = np.random.randint(0, 256, (15000, 15000, 3), dtype=np.uint8)
    img = Image.fromarray(random_data, 'RGB')
    img.save(file_path, 'PNG', compress_level=0)  # No compression

    # Verify it's actually large
    file_size = file_path.stat().st_size
    assert file_size > 20 * 1024 * 1024, f"File only {file_size / (1024*1024):.2f} MB"

    return file_path


@pytest.fixture
def unsupported_file_path(temp_dir):
    """Create a file with unsupported format"""
    file_path = temp_dir / "test.txt"

    with open(file_path, 'w') as f:
        f.write("This is a text file, not an image")

    return file_path


@pytest.fixture
def preprocessor():
    """Create a DocumentPreprocessor instance"""
    return DocumentPreprocessor()


@pytest.fixture
def sample_image_with_text():
    """Return path to a real image with text from demo_docs for OCR testing"""
    # Use the english.png sample from demo_docs
    demo_image_path = Path(__file__).parent.parent.parent / "demo_docs" / "basic" / "english.png"

    # Check if demo image exists, otherwise skip the test
    if not demo_image_path.exists():
        pytest.skip(f"Demo image not found at {demo_image_path}")

    return demo_image_path
@@ -1,60 +0,0 @@
"""
Unit tests for admin endpoints
"""

import pytest


class TestAdmin:
    """Test admin endpoints"""

    def test_get_system_stats(self, client, admin_token):
        """Test get system statistics"""
        response = client.get(
            '/api/v2/admin/stats',
            headers={'Authorization': f'Bearer {admin_token}'}
        )

        assert response.status_code == 200
        data = response.json()
        # API returns nested structure
        assert 'users' in data
        assert 'tasks' in data
        assert 'sessions' in data
        assert 'activity' in data
        assert 'total' in data['users']
        assert 'total' in data['tasks']

    def test_get_system_stats_non_admin(self, client, auth_token):
        """Test that non-admin cannot access admin endpoints"""
        response = client.get(
            '/api/v2/admin/stats',
            headers={'Authorization': f'Bearer {auth_token}'}
        )

        assert response.status_code == 403

    def test_list_users(self, client, admin_token):
        """Test list all users"""
        response = client.get(
            '/api/v2/admin/users',
            headers={'Authorization': f'Bearer {admin_token}'}
        )

        assert response.status_code == 200
        data = response.json()
        assert 'users' in data
        assert 'total' in data

    def test_get_audit_logs(self, client, admin_token):
        """Test get audit logs"""
        response = client.get(
            '/api/v2/admin/audit-logs',
            headers={'Authorization': f'Bearer {admin_token}'}
        )

        assert response.status_code == 200
        data = response.json()
        assert 'logs' in data
        assert 'total' in data
        assert 'page' in data
@@ -1,687 +0,0 @@
"""
Tool_OCR - API Integration Tests
Tests all API endpoints with database integration
"""

import pytest
import tempfile
import shutil
from pathlib import Path
from io import BytesIO
from datetime import datetime
from unittest.mock import patch, Mock

from fastapi.testclient import TestClient
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from PIL import Image

from app.main import app
from app.core.database import Base
from app.core.deps import get_db, get_current_active_user
from app.core.security import create_access_token, get_password_hash
from app.models.user import User
from app.models.ocr import OCRBatch, OCRFile, OCRResult, BatchStatus, FileStatus
from app.models.export import ExportRule


# ============================================================================
# Test Database Setup
# ============================================================================

@pytest.fixture(scope="function")
def test_db():
    """Create test database using SQLite in-memory"""
    # Import all models to ensure they are registered with Base.metadata
    # This triggers SQLAlchemy to register table definitions
    from app.models import User, OCRBatch, OCRFile, OCRResult, ExportRule, TranslationConfig

    # Create in-memory SQLite database
    engine = create_engine("sqlite:///:memory:", connect_args={"check_same_thread": False})
    TestingSessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)

    # Create all tables
    Base.metadata.create_all(bind=engine)

    db = TestingSessionLocal()
    try:
        yield db
    finally:
        db.close()
        Base.metadata.drop_all(bind=engine)


@pytest.fixture(scope="function")
def test_user(test_db):
    """Create test user in database"""
    user = User(
        username="testuser",
        email="test@example.com",
        password_hash=get_password_hash("password123"),
        is_active=True,
        is_admin=False
    )
    test_db.add(user)
    test_db.commit()
    test_db.refresh(user)
    return user


@pytest.fixture(scope="function")
def inactive_user(test_db):
    """Create inactive test user"""
    user = User(
        username="inactive",
        email="inactive@example.com",
        password_hash=get_password_hash("password123"),
        is_active=False,
        is_admin=False
    )
    test_db.add(user)
    test_db.commit()
    test_db.refresh(user)
    return user


@pytest.fixture(scope="function")
def auth_token(test_user):
    """Generate JWT token for test user"""
    token = create_access_token(data={"sub": test_user.id, "username": test_user.username})
    return token


@pytest.fixture(scope="function")
def auth_headers(auth_token):
    """Generate authorization headers"""
    return {"Authorization": f"Bearer {auth_token}"}


# ============================================================================
# Test Client Setup
# ============================================================================

@pytest.fixture(scope="function")
def client(test_db, test_user):
    """Create FastAPI test client with overridden dependencies"""

    def override_get_db():
        try:
            yield test_db
        finally:
            pass

    def override_get_current_active_user():
        return test_user

    app.dependency_overrides[get_db] = override_get_db
    app.dependency_overrides[get_current_active_user] = override_get_current_active_user

    client = TestClient(app)
    yield client

    # Clean up overrides
    app.dependency_overrides.clear()


# ============================================================================
# Test Data Fixtures
# ============================================================================

@pytest.fixture
def temp_upload_dir():
    """Create temporary upload directory"""
    temp_dir = Path(tempfile.mkdtemp())
    yield temp_dir
    shutil.rmtree(temp_dir, ignore_errors=True)


@pytest.fixture
def sample_image_file():
    """Create sample image file for upload"""
    img = Image.new('RGB', (100, 100), color='white')
    img_bytes = BytesIO()
    img.save(img_bytes, format='PNG')
    img_bytes.seek(0)
    return ("test.png", img_bytes, "image/png")


@pytest.fixture
def test_batch(test_db, test_user):
    """Create test batch in database"""
    batch = OCRBatch(
        user_id=test_user.id,
        batch_name="Test Batch",
        status=BatchStatus.PENDING,
        total_files=0,
        completed_files=0,
        failed_files=0
    )
    test_db.add(batch)
    test_db.commit()
    test_db.refresh(batch)
    return batch


@pytest.fixture
def test_ocr_file(test_db, test_batch):
    """Create test OCR file in database"""
    ocr_file = OCRFile(
        batch_id=test_batch.id,
        filename="test.png",
        original_filename="test.png",
        file_path="/tmp/test.png",
        file_size=1024,
        file_format="png",
        status=FileStatus.COMPLETED
    )
    test_db.add(ocr_file)
    test_db.commit()
    test_db.refresh(ocr_file)
    return ocr_file


@pytest.fixture
def test_ocr_result(test_db, test_ocr_file, temp_upload_dir):
    """Create test OCR result in database"""
    # Create test markdown file
    markdown_path = temp_upload_dir / "result.md"
    markdown_path.write_text("# Test Result\n\nTest content", encoding="utf-8")

    result = OCRResult(
        file_id=test_ocr_file.id,
        markdown_path=str(markdown_path),
        json_path=str(temp_upload_dir / "result.json"),
        detected_language="ch",
        total_text_regions=5,
        average_confidence=0.95,
        layout_data={"regions": []},
        images_metadata=[]
    )
    test_db.add(result)
    test_db.commit()
    test_db.refresh(result)
    return result


@pytest.fixture
def test_export_rule(test_db, test_user):
    """Create test export rule in database"""
    rule = ExportRule(
        user_id=test_user.id,
        rule_name="Test Rule",
        description="Test export rule",
        config_json={
            "filters": {"confidence_threshold": 0.8},
            "formatting": {"add_line_numbers": True}
        }
    )
    test_db.add(rule)
    test_db.commit()
    test_db.refresh(rule)
    return rule


# ============================================================================
# Authentication Router Tests
# ============================================================================

@pytest.mark.integration
class TestAuthRouter:
    """Test authentication endpoints"""

    def test_login_success(self, client, test_user):
        """Test successful login"""
        response = client.post(
            "/api/v1/auth/login",
            json={
                "username": "testuser",
                "password": "password123"
            }
        )

        assert response.status_code == 200
        data = response.json()
        assert "access_token" in data
        assert data["token_type"] == "bearer"
        assert "expires_in" in data
        assert data["expires_in"] > 0

    def test_login_invalid_username(self, client):
        """Test login with invalid username"""
        response = client.post(
            "/api/v1/auth/login",
            json={
                "username": "nonexistent",
                "password": "password123"
            }
        )

        assert response.status_code == 401
        assert "Incorrect username or password" in response.json()["detail"]

    def test_login_invalid_password(self, client, test_user):
        """Test login with invalid password"""
        response = client.post(
            "/api/v1/auth/login",
            json={
                "username": "testuser",
                "password": "wrongpassword"
            }
        )

        assert response.status_code == 401
        assert "Incorrect username or password" in response.json()["detail"]

    def test_login_inactive_user(self, client, inactive_user):
        """Test login with inactive user account"""
        response = client.post(
            "/api/v1/auth/login",
            json={
                "username": "inactive",
                "password": "password123"
            }
        )

        assert response.status_code == 403
        assert "inactive" in response.json()["detail"].lower()


# ============================================================================
# OCR Router Tests
# ============================================================================

@pytest.mark.integration
class TestOCRRouter:
    """Test OCR processing endpoints"""

    @patch('app.services.file_manager.FileManager.create_batch')
    @patch('app.services.file_manager.FileManager.add_files_to_batch')
    def test_upload_files_success(self, mock_add_files, mock_create_batch,
                                  client, auth_headers, test_batch, sample_image_file):
        """Test successful file upload"""
        # Mock the file manager methods
        mock_create_batch.return_value = test_batch
        mock_add_files.return_value = []

        response = client.post(
            "/api/v1/upload",
            files={"files": sample_image_file},
            data={"batch_name": "Test Upload"},
            headers=auth_headers
        )

        assert response.status_code == 200
        data = response.json()
        assert "id" in data
        assert data["batch_name"] == "Test Batch"

    def test_upload_no_files(self, client, auth_headers):
        """Test upload with no files"""
        response = client.post(
            "/api/v1/upload",
            headers=auth_headers
        )

        assert response.status_code == 422  # Validation error

    def test_upload_unauthorized(self, client, sample_image_file):
        """Test upload without authentication"""
        # Override to remove authentication
        app.dependency_overrides.clear()

        response = client.post(
            "/api/v1/upload",
            files={"files": sample_image_file}
        )

        assert response.status_code == 403  # Forbidden (no auth)

    @patch('app.services.background_tasks.process_batch_files_with_retry')
    def test_process_ocr_success(self, mock_process, client, auth_headers,
                                 test_batch, test_db):
        """Test triggering OCR processing"""
        response = client.post(
            "/api/v1/ocr/process",
            json={
                "batch_id": test_batch.id,
                "lang": "ch",
                "detect_layout": True
            },
            headers=auth_headers
        )

        assert response.status_code == 200
        data = response.json()
        assert data["message"] == "OCR processing started"
        assert data["batch_id"] == test_batch.id
        assert data["status"] == "processing"

    def test_process_ocr_batch_not_found(self, client, auth_headers):
        """Test OCR processing with non-existent batch"""
        response = client.post(
            "/api/v1/ocr/process",
            json={
                "batch_id": 99999,
                "lang": "ch",
                "detect_layout": True
            },
            headers=auth_headers
        )

        assert response.status_code == 404
        assert "not found" in response.json()["detail"].lower()

    def test_process_ocr_already_processing(self, client, auth_headers,
                                            test_batch, test_db):
        """Test OCR processing when batch is already processing"""
        # Update batch status
        test_batch.status = BatchStatus.PROCESSING
        test_db.commit()

        response = client.post(
            "/api/v1/ocr/process",
            json={
                "batch_id": test_batch.id,
                "lang": "ch",
                "detect_layout": True
            },
            headers=auth_headers
        )

        assert response.status_code == 400
        assert "already" in response.json()["detail"].lower()

    def test_get_batch_status_success(self, client, auth_headers, test_batch,
                                      test_ocr_file):
        """Test getting batch status"""
        response = client.get(
            f"/api/v1/batch/{test_batch.id}/status",
            headers=auth_headers
        )

        assert response.status_code == 200
        data = response.json()
        assert "batch" in data
        assert "files" in data
        assert data["batch"]["id"] == test_batch.id
        assert len(data["files"]) >= 0

    def test_get_batch_status_not_found(self, client, auth_headers):
        """Test getting status for non-existent batch"""
        response = client.get(
            "/api/v1/batch/99999/status",
            headers=auth_headers
        )

        assert response.status_code == 404

    def test_get_ocr_result_success(self, client, auth_headers, test_ocr_file,
                                    test_ocr_result):
        """Test getting OCR result"""
        response = client.get(
            f"/api/v1/ocr/result/{test_ocr_file.id}",
            headers=auth_headers
        )

        assert response.status_code == 200
        data = response.json()
        assert "file" in data
        assert "result" in data
        assert data["file"]["id"] == test_ocr_file.id

    def test_get_ocr_result_not_found(self, client, auth_headers):
        """Test getting result for non-existent file"""
        response = client.get(
            "/api/v1/ocr/result/99999",
            headers=auth_headers
        )

        assert response.status_code == 404


# ============================================================================
# Export Router Tests
# ============================================================================

@pytest.mark.integration
class TestExportRouter:
    """Test export endpoints"""

    @pytest.mark.skip(reason="FileResponse validation requires actual file paths, tested in unit tests")
    @patch('app.services.export_service.ExportService.export_to_txt')
    def test_export_txt_success(self, mock_export, client, auth_headers,
                                test_batch, test_ocr_file, test_ocr_result,
                                temp_upload_dir):
        """Test exporting results to TXT format"""
        # NOTE: This test is skipped because FastAPI's FileResponse validates
        # the file path exists, making it difficult to mock properly.
        # The export service functionality is thoroughly tested in unit tests.
        # End-to-end tests would be more appropriate for testing the full flow.
        pass

    def test_export_batch_not_found(self, client, auth_headers):
        """Test export with non-existent batch"""
        response = client.post(
            "/api/v1/export",
            json={
                "batch_id": 99999,
                "format": "txt"
            },
            headers=auth_headers
        )

        assert response.status_code == 404

    def test_export_no_results(self, client, auth_headers, test_batch):
        """Test export when no completed results exist"""
        response = client.post(
            "/api/v1/export",
            json={
                "batch_id": test_batch.id,
                "format": "txt"
            },
            headers=auth_headers
        )

        assert response.status_code == 404
        assert "no completed results" in response.json()["detail"].lower()

    def test_export_unsupported_format(self, client, auth_headers, test_batch):
        """Test export with unsupported format"""
        response = client.post(
            "/api/v1/export",
            json={
                "batch_id": test_batch.id,
                "format": "invalid_format"
            },
            headers=auth_headers
        )

        # Should fail at validation or business logic level
        assert response.status_code in [400, 404]

    @pytest.mark.skip(reason="FileResponse validation requires actual file paths, tested in unit tests")
    @patch('app.services.export_service.ExportService.export_to_pdf')
    def test_generate_pdf_success(self, mock_export, client, auth_headers,
                                  test_ocr_file, test_ocr_result, temp_upload_dir):
        """Test generating PDF for single file"""
        # NOTE: This test is skipped because FastAPI's FileResponse validates
        # the file path exists, making it difficult to mock properly.
        # The PDF generation functionality is thoroughly tested in unit tests.
        pass

    def test_generate_pdf_file_not_found(self, client, auth_headers):
        """Test PDF generation for non-existent file"""
        response = client.get(
            "/api/v1/export/pdf/99999",
            headers=auth_headers
        )

        assert response.status_code == 404

    def test_generate_pdf_no_result(self, client, auth_headers, test_ocr_file):
        """Test PDF generation when no OCR result exists"""
        response = client.get(
            f"/api/v1/export/pdf/{test_ocr_file.id}",
            headers=auth_headers
        )

        assert response.status_code == 404

    def test_list_export_rules(self, client, auth_headers, test_export_rule):
        """Test listing export rules"""
        response = client.get(
            "/api/v1/export/rules",
            headers=auth_headers
        )

        assert response.status_code == 200
        data = response.json()
        assert isinstance(data, list)
        assert len(data) >= 0

    @pytest.mark.skip(reason="SQLite session isolation issue with in-memory DB, tested in unit tests")
    def test_create_export_rule(self, client, auth_headers):
        """Test creating export rule"""
        # NOTE: This test fails due to SQLite in-memory database session isolation
        # The create operation works but db.refresh() fails to query the new record
        # Export rule CRUD is thoroughly tested in unit tests
        pass

    @pytest.mark.skip(reason="SQLite session isolation issue with in-memory DB, tested in unit tests")
    def test_update_export_rule(self, client, auth_headers, test_export_rule):
        """Test updating export rule"""
        # NOTE: This test fails due to SQLite in-memory database session isolation
        # The update operation works but db.refresh() fails to query the updated record
        # Export rule CRUD is thoroughly tested in unit tests
        pass

    def test_update_export_rule_not_found(self, client, auth_headers):
        """Test updating non-existent export rule"""
        response = client.put(
            "/api/v1/export/rules/99999",
            json={
                "rule_name": "Updated Rule"
            },
            headers=auth_headers
        )

        assert response.status_code == 404

    def test_delete_export_rule(self, client, auth_headers, test_export_rule):
        """Test deleting export rule"""
        response = client.delete(
            f"/api/v1/export/rules/{test_export_rule.id}",
            headers=auth_headers
        )

        assert response.status_code == 200
        assert "deleted successfully" in response.json()["message"].lower()

    def test_delete_export_rule_not_found(self, client, auth_headers):
        """Test deleting non-existent export rule"""
        response = client.delete(
            "/api/v1/export/rules/99999",
            headers=auth_headers
        )

        assert response.status_code == 404

    def test_list_css_templates(self, client):
        """Test listing CSS templates (no auth required)"""
        response = client.get("/api/v1/export/css-templates")

        assert response.status_code == 200
        data = response.json()
        assert isinstance(data, list)
        assert len(data) > 0
        assert all("name" in item and "description" in item for item in data)


# ============================================================================
# Translation Router Tests (Stub Endpoints)
# ============================================================================

@pytest.mark.integration
class TestTranslationRouter:
    """Test translation stub endpoints"""

    def test_get_translation_status(self, client):
        """Test getting translation feature status (stub)"""
        response = client.get("/api/v1/translate/status")

        assert response.status_code == 200
        data = response.json()
        assert "status" in data
        assert data["status"].lower() == "reserved"  # Case-insensitive check

    def test_get_supported_languages(self, client):
        """Test getting supported languages (stub)"""
        response = client.get("/api/v1/translate/languages")

        assert response.status_code == 200
        data = response.json()
        assert isinstance(data, list)

    def test_translate_document_not_implemented(self, client, auth_headers):
        """Test translate document endpoint returns 501"""
        response = client.post(
            "/api/v1/translate/document",
            json={
                "file_id": 1,
                "source_lang": "zh",
                "target_lang": "en",
                "engine_type": "offline"
            },
            headers=auth_headers
        )

        assert response.status_code == 501
        data = response.json()
        assert "not implemented" in str(data["detail"]).lower()

    def test_get_translation_task_status_not_implemented(self, client, auth_headers):
        """Test translation task status endpoint returns 501"""
        response = client.get(
            "/api/v1/translate/task/1",
            headers=auth_headers
        )

        assert response.status_code == 501

    def test_cancel_translation_task_not_implemented(self, client, auth_headers):
        """Test cancel translation task endpoint returns 501"""
        response = client.delete(
            "/api/v1/translate/task/1",
            headers=auth_headers
        )

        assert response.status_code == 501


# ============================================================================
# Application Health Tests
# ============================================================================

@pytest.mark.integration
class TestApplicationHealth:
    """Test application health and root endpoints"""

    def test_health_check(self, client):
        """Test health check endpoint"""
        response = client.get("/health")

        assert response.status_code == 200
        data = response.json()
        assert data["status"] == "healthy"
        assert data["service"] == "Tool_OCR"

    def test_root_endpoint(self, client):
        """Test root endpoint"""
        response = client.get("/")

        assert response.status_code == 200
        data = response.json()
        assert "message" in data
        assert "Tool_OCR" in data["message"]
        assert "docs_url" in data
@@ -1,87 +0,0 @@
"""
Unit tests for authentication endpoints
"""

import pytest
from unittest.mock import patch, MagicMock


class TestAuth:
    """Test authentication endpoints"""

    def test_login_success(self, client, db):
        """Test successful login"""
        # Mock external auth service with proper Pydantic models
        from app.services.external_auth_service import AuthResponse, UserInfo

        user_info = UserInfo(
            id="test-id-123",
            name="Test User",
            email="test@example.com"
        )
        auth_response = AuthResponse(
            access_token="test-token",
            id_token="test-id-token",
            expires_in=3600,
            token_type="Bearer",
            user_info=user_info,
            issued_at="2025-11-16T10:00:00Z",
            expires_at="2025-11-16T11:00:00Z"
        )

        with patch('app.routers.auth.external_auth_service.authenticate_user') as mock_auth:
            mock_auth.return_value = (True, auth_response, None)

            response = client.post('/api/v2/auth/login', json={
                'username': 'test@example.com',
                'password': 'password123'
            })

            assert response.status_code == 200
            data = response.json()
            assert 'access_token' in data
            assert data['token_type'] == 'bearer'
            assert 'user' in data

    def test_login_invalid_credentials(self, client):
        """Test login with invalid credentials"""
        with patch('app.routers.auth.external_auth_service.authenticate_user') as mock_auth:
            mock_auth.return_value = (False, None, 'Invalid credentials')

            response = client.post('/api/v2/auth/login', json={
                'username': 'test@example.com',
                'password': 'wrongpassword'
            })

            assert response.status_code == 401
            assert 'detail' in response.json()

    def test_get_me(self, client, auth_token):
        """Test get current user info"""
        response = client.get(
            '/api/v2/auth/me',
            headers={'Authorization': f'Bearer {auth_token}'}
        )

        assert response.status_code == 200
        data = response.json()
        assert 'email' in data
        assert 'display_name' in data

    def test_get_me_unauthorized(self, client):
        """Test get current user without token"""
        response = client.get('/api/v2/auth/me')
        assert response.status_code == 403

    def test_logout(self, client, auth_token):
        """Test logout"""
        response = client.post(
            '/api/v2/auth/logout',
            headers={'Authorization': f'Bearer {auth_token}'}
        )

        assert response.status_code == 200
        data = response.json()
        # When no session_id is provided, logs out all sessions
        assert 'message' in data
        assert 'Logged out' in data['message']
@@ -1,637 +0,0 @@
|
||||
"""
|
||||
Tool_OCR - Export Service Unit Tests
|
||||
Tests for app/services/export_service.py
|
||||
"""
|
||||
|
||||
import pytest
|
||||
import json
|
||||
import zipfile
|
||||
from pathlib import Path
|
||||
from unittest.mock import Mock, patch, MagicMock
|
||||
from datetime import datetime
|
||||
|
||||
import pandas as pd
|
||||
|
||||
from app.services.export_service import ExportService, ExportError
|
||||
from app.models.ocr import FileStatus
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def export_service():
|
||||
"""Create an ExportService instance"""
|
||||
return ExportService()
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def mock_ocr_result(temp_dir):
|
||||
"""Create a mock OCRResult with markdown file"""
|
||||
# Create mock markdown file
|
||||
md_file = temp_dir / "test_result.md"
|
||||
md_file.write_text("# Test Document\n\nThis is test content.", encoding="utf-8")
|
||||
|
||||
# Create mock result
|
||||
result = Mock()
|
||||
result.id = 1
|
||||
result.markdown_path = str(md_file)
|
||||
result.json_path = None
|
||||
result.detected_language = "zh"
|
||||
result.total_text_regions = 10
|
||||
result.average_confidence = 0.95
|
||||
result.layout_data = {"elements": [{"type": "text"}]}
|
||||
result.images_metadata = []
|
||||
|
||||
# Mock file
|
||||
result.file = Mock()
|
||||
result.file.id = 1
|
    result.file.original_filename = "test.png"
    result.file.file_format = "png"
    result.file.file_size = 1024
    result.file.processing_time = 2.5

    return result


@pytest.fixture
def mock_db():
    """Create a mock database session"""
    return Mock()


@pytest.mark.unit
class TestExportServiceInit:
    """Test ExportService initialization"""

    def test_init(self, export_service):
        """Test export service initialization"""
        assert export_service is not None
        assert export_service.pdf_generator is not None


@pytest.mark.unit
class TestApplyFilters:
    """Test filter application"""

    def test_apply_filters_confidence_threshold(self, export_service):
        """Test confidence threshold filter"""
        result1 = Mock()
        result1.average_confidence = 0.95
        result1.file = Mock()
        result1.file.original_filename = "test1.png"

        result2 = Mock()
        result2.average_confidence = 0.75
        result2.file = Mock()
        result2.file.original_filename = "test2.png"

        result3 = Mock()
        result3.average_confidence = 0.85
        result3.file = Mock()
        result3.file.original_filename = "test3.png"

        results = [result1, result2, result3]
        filters = {"confidence_threshold": 0.80}

        filtered = export_service.apply_filters(results, filters)

        assert len(filtered) == 2
        assert result1 in filtered
        assert result3 in filtered
        assert result2 not in filtered

    def test_apply_filters_filename_pattern(self, export_service):
        """Test filename pattern filter"""
        result1 = Mock()
        result1.average_confidence = 0.95
        result1.file = Mock()
        result1.file.original_filename = "invoice_2024.png"

        result2 = Mock()
        result2.average_confidence = 0.95
        result2.file = Mock()
        result2.file.original_filename = "receipt.png"

        results = [result1, result2]
        filters = {"filename_pattern": "invoice"}

        filtered = export_service.apply_filters(results, filters)

        assert len(filtered) == 1
        assert result1 in filtered

    def test_apply_filters_language(self, export_service):
        """Test language filter"""
        result1 = Mock()
        result1.detected_language = "zh"
        result1.average_confidence = 0.95
        result1.file = Mock()
        result1.file.original_filename = "chinese.png"

        result2 = Mock()
        result2.detected_language = "en"
        result2.average_confidence = 0.95
        result2.file = Mock()
        result2.file.original_filename = "english.png"

        results = [result1, result2]
        filters = {"language": "zh"}

        filtered = export_service.apply_filters(results, filters)

        assert len(filtered) == 1
        assert result1 in filtered

    def test_apply_filters_combined(self, export_service):
        """Test multiple filters combined"""
        result1 = Mock()
        result1.detected_language = "zh"
        result1.average_confidence = 0.95
        result1.file = Mock()
        result1.file.original_filename = "invoice_chinese.png"

        result2 = Mock()
        result2.detected_language = "zh"
        result2.average_confidence = 0.75
        result2.file = Mock()
        result2.file.original_filename = "invoice_low.png"

        result3 = Mock()
        result3.detected_language = "en"
        result3.average_confidence = 0.95
        result3.file = Mock()
        result3.file.original_filename = "invoice_english.png"

        results = [result1, result2, result3]
        filters = {
            "confidence_threshold": 0.80,
            "language": "zh",
            "filename_pattern": "invoice"
        }

        filtered = export_service.apply_filters(results, filters)

        assert len(filtered) == 1
        assert result1 in filtered

    def test_apply_filters_no_filters(self, export_service):
        """Test with no filters applied"""
        results = [Mock(), Mock(), Mock()]
        filtered = export_service.apply_filters(results, {})

        assert len(filtered) == len(results)
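

# The five tests above pin down the filter semantics: the confidence threshold
# is inclusive (>=), a None confidence fails the threshold (see TestEdgeCases
# at the end of this file), filename matching is a substring check, and an
# empty filter dict passes everything through. A hypothetical reference sketch
# consistent with these assertions (the real code lives in
# app/services/export_service.py and may differ):
def _apply_filters_sketch(results, filters):
    """Hypothetical filter behaviour implied by TestApplyFilters."""
    threshold = filters.get("confidence_threshold")
    language = filters.get("language")
    pattern = filters.get("filename_pattern")
    filtered = []
    for result in results:
        if threshold is not None and (
            result.average_confidence is None
            or result.average_confidence < threshold
        ):
            continue  # missing or low confidence fails the threshold
        if language is not None and result.detected_language != language:
            continue
        if pattern is not None and pattern not in result.file.original_filename:
            continue
        filtered.append(result)
    return filtered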


@pytest.mark.unit
class TestExportToTXT:
    """Test TXT export"""

    def test_export_to_txt_basic(self, export_service, mock_ocr_result, temp_dir):
        """Test basic TXT export"""
        output_path = temp_dir / "output.txt"

        result_path = export_service.export_to_txt([mock_ocr_result], output_path)

        assert result_path.exists()
        content = result_path.read_text(encoding="utf-8")
        assert "Test Document" in content
        assert "test content" in content

    def test_export_to_txt_with_line_numbers(self, export_service, mock_ocr_result, temp_dir):
        """Test TXT export with line numbers"""
        output_path = temp_dir / "output.txt"
        formatting = {"add_line_numbers": True}

        result_path = export_service.export_to_txt(
            [mock_ocr_result],
            output_path,
            formatting=formatting
        )

        content = result_path.read_text(encoding="utf-8")
        assert "|" in content  # Line number separator

    def test_export_to_txt_with_metadata(self, export_service, mock_ocr_result, temp_dir):
        """Test TXT export with metadata headers"""
        output_path = temp_dir / "output.txt"
        formatting = {"include_metadata": True}

        result_path = export_service.export_to_txt(
            [mock_ocr_result],
            output_path,
            formatting=formatting
        )

        content = result_path.read_text(encoding="utf-8")
        assert "文件:" in content
        assert "test.png" in content
        assert "信心度:" in content

    def test_export_to_txt_with_grouping(self, export_service, mock_ocr_result, temp_dir):
        """Test TXT export with file grouping"""
        output_path = temp_dir / "output.txt"
        formatting = {"group_by_filename": True}

        result_path = export_service.export_to_txt(
            [mock_ocr_result, mock_ocr_result],
            output_path,
            formatting=formatting
        )

        content = result_path.read_text(encoding="utf-8")
        assert "-" * 80 in content  # Separator

    def test_export_to_txt_missing_markdown(self, export_service, temp_dir):
        """Test TXT export with missing markdown file"""
        result = Mock()
        result.id = 1
        result.markdown_path = "/nonexistent/path.md"
        result.file = Mock()
        result.file.original_filename = "test.png"

        output_path = temp_dir / "output.txt"

        # Should not fail, just skip the file
        result_path = export_service.export_to_txt([result], output_path)
        assert result_path.exists()

    def test_export_to_txt_creates_parent_directories(self, export_service, mock_ocr_result, temp_dir):
        """Test that export creates necessary parent directories"""
        output_path = temp_dir / "subdir" / "output.txt"

        result_path = export_service.export_to_txt([mock_ocr_result], output_path)

        assert result_path.exists()
        assert result_path.parent.exists()


@pytest.mark.unit
class TestExportToJSON:
    """Test JSON export"""

    def test_export_to_json_basic(self, export_service, mock_ocr_result, temp_dir):
        """Test basic JSON export"""
        output_path = temp_dir / "output.json"

        result_path = export_service.export_to_json([mock_ocr_result], output_path)

        assert result_path.exists()
        data = json.loads(result_path.read_text(encoding="utf-8"))

        assert "export_time" in data
        assert data["total_files"] == 1
        assert len(data["results"]) == 1
        assert data["results"][0]["filename"] == "test.png"
        assert data["results"][0]["average_confidence"] == 0.95

    def test_export_to_json_with_layout(self, export_service, mock_ocr_result, temp_dir):
        """Test JSON export with layout data"""
        output_path = temp_dir / "output.json"

        result_path = export_service.export_to_json(
            [mock_ocr_result],
            output_path,
            include_layout=True
        )

        data = json.loads(result_path.read_text(encoding="utf-8"))
        assert "layout_data" in data["results"][0]

    def test_export_to_json_without_layout(self, export_service, mock_ocr_result, temp_dir):
        """Test JSON export without layout data"""
        output_path = temp_dir / "output.json"

        result_path = export_service.export_to_json(
            [mock_ocr_result],
            output_path,
            include_layout=False
        )

        data = json.loads(result_path.read_text(encoding="utf-8"))
        assert "layout_data" not in data["results"][0]

    def test_export_to_json_multiple_results(self, export_service, mock_ocr_result, temp_dir):
        """Test JSON export with multiple results"""
        output_path = temp_dir / "output.json"

        result_path = export_service.export_to_json(
            [mock_ocr_result, mock_ocr_result],
            output_path
        )

        data = json.loads(result_path.read_text(encoding="utf-8"))
        assert data["total_files"] == 2
        assert len(data["results"]) == 2


@pytest.mark.unit
class TestExportToExcel:
    """Test Excel export"""

    def test_export_to_excel_basic(self, export_service, mock_ocr_result, temp_dir):
        """Test basic Excel export"""
        output_path = temp_dir / "output.xlsx"

        result_path = export_service.export_to_excel([mock_ocr_result], output_path)

        assert result_path.exists()
        df = pd.read_excel(result_path)
        assert len(df) == 1
        assert "文件名" in df.columns
        assert df.iloc[0]["文件名"] == "test.png"

    def test_export_to_excel_with_confidence(self, export_service, mock_ocr_result, temp_dir):
        """Test Excel export with confidence scores"""
        output_path = temp_dir / "output.xlsx"

        result_path = export_service.export_to_excel(
            [mock_ocr_result],
            output_path,
            include_confidence=True
        )

        df = pd.read_excel(result_path)
        assert "平均信心度" in df.columns

    def test_export_to_excel_without_processing_time(self, export_service, mock_ocr_result, temp_dir):
        """Test Excel export without processing time"""
        output_path = temp_dir / "output.xlsx"

        result_path = export_service.export_to_excel(
            [mock_ocr_result],
            output_path,
            include_processing_time=False
        )

        df = pd.read_excel(result_path)
        assert "處理時間(秒)" not in df.columns

    def test_export_to_excel_long_content_truncation(self, export_service, temp_dir):
        """Test that long content is truncated in Excel"""
        # Create result with long content
        md_file = temp_dir / "long.md"
        md_file.write_text("x" * 2000, encoding="utf-8")

        result = Mock()
        result.id = 1
        result.markdown_path = str(md_file)
        result.detected_language = "zh"
        result.total_text_regions = 10
        result.average_confidence = 0.95
        result.file = Mock()
        result.file.original_filename = "long.png"
        result.file.file_format = "png"
        result.file.file_size = 1024
        result.file.processing_time = 1.0

        output_path = temp_dir / "output.xlsx"
        result_path = export_service.export_to_excel([result], output_path)

        df = pd.read_excel(result_path)
        content = df.iloc[0]["提取內容"]
        assert "..." in content
        assert len(content) <= 1004  # 1000 + "..."


@pytest.mark.unit
class TestExportToMarkdown:
    """Test Markdown export"""

    def test_export_to_markdown_combined(self, export_service, mock_ocr_result, temp_dir):
        """Test combined Markdown export"""
        output_path = temp_dir / "combined.md"

        result_path = export_service.export_to_markdown(
            [mock_ocr_result],
            output_path,
            combine=True
        )

        assert result_path.exists()
        assert result_path.is_file()
        content = result_path.read_text(encoding="utf-8")
        assert "test.png" in content
        assert "Test Document" in content

    def test_export_to_markdown_separate(self, export_service, mock_ocr_result, temp_dir):
        """Test separate Markdown export"""
        output_dir = temp_dir / "markdown_files"

        result_path = export_service.export_to_markdown(
            [mock_ocr_result],
            output_dir,
            combine=False
        )

        assert result_path.exists()
        assert result_path.is_dir()
        files = list(result_path.glob("*.md"))
        assert len(files) == 1

    def test_export_to_markdown_multiple_files(self, export_service, mock_ocr_result, temp_dir):
        """Test Markdown export with multiple files"""
        output_path = temp_dir / "combined.md"

        result_path = export_service.export_to_markdown(
            [mock_ocr_result, mock_ocr_result],
            output_path,
            combine=True
        )

        content = result_path.read_text(encoding="utf-8")
        assert content.count("---") >= 1  # Separators


@pytest.mark.unit
class TestExportToPDF:
    """Test PDF export"""

    @patch.object(ExportService, '__init__', lambda self: None)
    def test_export_to_pdf_success(self, mock_ocr_result, temp_dir):
        """Test successful PDF export"""
        from app.services.pdf_generator import PDFGenerator

        service = ExportService()
        service.pdf_generator = Mock(spec=PDFGenerator)
        service.pdf_generator.generate_pdf = Mock(return_value=temp_dir / "output.pdf")

        output_path = temp_dir / "output.pdf"

        result_path = service.export_to_pdf(mock_ocr_result, output_path)

        service.pdf_generator.generate_pdf.assert_called_once()
        call_kwargs = service.pdf_generator.generate_pdf.call_args[1]
        assert call_kwargs["css_template"] == "default"

    @patch.object(ExportService, '__init__', lambda self: None)
    def test_export_to_pdf_with_custom_template(self, mock_ocr_result, temp_dir):
        """Test PDF export with custom CSS template"""
        from app.services.pdf_generator import PDFGenerator

        service = ExportService()
        service.pdf_generator = Mock(spec=PDFGenerator)
        service.pdf_generator.generate_pdf = Mock(return_value=temp_dir / "output.pdf")

        output_path = temp_dir / "output.pdf"

        service.export_to_pdf(mock_ocr_result, output_path, css_template="academic")

        call_kwargs = service.pdf_generator.generate_pdf.call_args[1]
        assert call_kwargs["css_template"] == "academic"

    @patch.object(ExportService, '__init__', lambda self: None)
    def test_export_to_pdf_missing_markdown(self, temp_dir):
        """Test PDF export with missing markdown file"""
        from app.services.pdf_generator import PDFGenerator

        result = Mock()
        result.id = 1
        result.markdown_path = None
        result.file = Mock()

        service = ExportService()
        service.pdf_generator = Mock(spec=PDFGenerator)

        output_path = temp_dir / "output.pdf"

        with pytest.raises(ExportError) as exc_info:
            service.export_to_pdf(result, output_path)

        assert "not found" in str(exc_info.value).lower()


@pytest.mark.unit
class TestGetExportFormats:
    """Test getting available export formats"""

    def test_get_export_formats(self, export_service):
        """Test getting export formats"""
        formats = export_service.get_export_formats()

        assert isinstance(formats, dict)
        assert "txt" in formats
        assert "json" in formats
        assert "excel" in formats
        assert "markdown" in formats
        assert "pdf" in formats
        assert "zip" in formats

        # Check descriptions are in Chinese
        for desc in formats.values():
            assert isinstance(desc, str)
            assert len(desc) > 0


@pytest.mark.unit
class TestApplyExportRule:
    """Test export rule application"""

    def test_apply_export_rule_success(self, export_service, mock_db):
        """Test applying export rule"""
        # Create mock rule
        rule = Mock()
        rule.id = 1
        rule.config_json = {
            "filters": {
                "confidence_threshold": 0.80
            }
        }

        mock_db.query.return_value.filter.return_value.first.return_value = rule

        # Create mock results
        result1 = Mock()
        result1.average_confidence = 0.95
        result1.file = Mock()
        result1.file.original_filename = "test1.png"

        result2 = Mock()
        result2.average_confidence = 0.70
        result2.file = Mock()
        result2.file.original_filename = "test2.png"

        results = [result1, result2]

        filtered = export_service.apply_export_rule(mock_db, results, rule_id=1)

        assert len(filtered) == 1
        assert result1 in filtered

    def test_apply_export_rule_not_found(self, export_service, mock_db):
        """Test applying non-existent rule"""
        mock_db.query.return_value.filter.return_value.first.return_value = None

        with pytest.raises(ExportError) as exc_info:
            export_service.apply_export_rule(mock_db, [], rule_id=999)

        assert "not found" in str(exc_info.value).lower()
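

# The two tests above define apply_export_rule's contract: look the rule up by
# id, raise ExportError when it is missing, and otherwise feed the rule's
# stored filters through apply_filters. A hypothetical sketch (rule_model
# stands in for the ORM class, which is not imported in this test module):
def _apply_export_rule_sketch(service, db, rule_model, results, rule_id):
    """Hypothetical flow implied by TestApplyExportRule."""
    rule = db.query(rule_model).filter(rule_model.id == rule_id).first()
    if rule is None:
        raise ExportError(f"Export rule {rule_id} not found")
    return service.apply_filters(results, rule.config_json.get("filters", {}))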


@pytest.mark.unit
class TestEdgeCases:
    """Test edge cases and error handling"""

    def test_export_to_txt_empty_results(self, export_service, temp_dir):
        """Test TXT export with empty results list"""
        output_path = temp_dir / "output.txt"

        result_path = export_service.export_to_txt([], output_path)

        assert result_path.exists()
        content = result_path.read_text(encoding="utf-8")
        assert content == ""

    def test_export_to_json_empty_results(self, export_service, temp_dir):
        """Test JSON export with empty results list"""
        output_path = temp_dir / "output.json"

        result_path = export_service.export_to_json([], output_path)

        data = json.loads(result_path.read_text(encoding="utf-8"))
        assert data["total_files"] == 0
        assert len(data["results"]) == 0

    def test_export_with_unicode_content(self, export_service, temp_dir):
        """Test export with Unicode/Chinese content"""
        md_file = temp_dir / "chinese.md"
        md_file.write_text("# 測試文檔\n\n這是中文內容。", encoding="utf-8")

        result = Mock()
        result.id = 1
        result.markdown_path = str(md_file)
        result.json_path = None
        result.detected_language = "zh"
        result.total_text_regions = 10
        result.average_confidence = 0.95
        result.layout_data = None  # Use None instead of Mock for JSON serialization
        result.images_metadata = None  # Use None instead of Mock
        result.file = Mock()
        result.file.id = 1
        result.file.original_filename = "中文測試.png"
        result.file.file_format = "png"
        result.file.file_size = 1024
        result.file.processing_time = 1.0

        # Test TXT export
        txt_path = temp_dir / "output.txt"
        export_service.export_to_txt([result], txt_path)
        assert "測試文檔" in txt_path.read_text(encoding="utf-8")

        # Test JSON export
        json_path = temp_dir / "output.json"
        export_service.export_to_json([result], json_path)
        data = json.loads(json_path.read_text(encoding="utf-8"))
        assert data["results"][0]["filename"] == "中文測試.png"

    def test_apply_filters_with_none_values(self, export_service):
        """Test filters with None values in results"""
        result = Mock()
        result.average_confidence = None
        result.detected_language = None
        result.file = Mock()
        result.file.original_filename = "test.png"

        filters = {"confidence_threshold": 0.80}

        filtered = export_service.apply_filters([result], filters)

        # Should filter out result with None confidence
        assert len(filtered) == 0
@@ -1,520 +0,0 @@
"""
Tool_OCR - File Manager Unit Tests
Tests for app/services/file_manager.py
"""

import pytest
import shutil
from pathlib import Path
from unittest.mock import Mock, patch, MagicMock
from datetime import datetime, timedelta
from io import BytesIO

from fastapi import UploadFile

from app.services.file_manager import FileManager, FileManagementError
from app.models.ocr import OCRBatch, OCRFile, FileStatus, BatchStatus


@pytest.fixture
def file_manager(temp_dir):
    """Create a FileManager instance with temp directory"""
    with patch('app.services.file_manager.settings') as mock_settings:
        mock_settings.upload_dir = str(temp_dir)
        mock_settings.max_upload_size = 20 * 1024 * 1024  # 20MB
        mock_settings.allowed_extensions_list = ['png', 'jpg', 'jpeg', 'pdf']
        manager = FileManager()
        return manager


@pytest.fixture
def mock_upload_file():
    """Create a mock UploadFile"""
    def create_file(filename="test.png", content=b"test content", size=None):
        file_obj = BytesIO(content)
        if size is None:
            size = len(content)

        upload_file = UploadFile(filename=filename, file=file_obj)
        # The underlying stream carries the real size; just make sure reads
        # start from the beginning (the `size` parameter is kept for call
        # symmetry but is not applied here)
        upload_file.file.seek(0, 2)  # seek to end
        upload_file.file.seek(0)  # reset to start
        return upload_file

    return create_file


@pytest.fixture
def mock_db():
    """Create a mock database session"""
    return Mock()


@pytest.mark.unit
class TestFileManagerInit:
    """Test FileManager initialization"""

    def test_init(self, file_manager, temp_dir):
        """Test file manager initialization"""
        assert file_manager is not None
        assert file_manager.preprocessor is not None
        assert file_manager.base_upload_dir == temp_dir
        assert file_manager.base_upload_dir.exists()


@pytest.mark.unit
class TestBatchDirectoryManagement:
    """Test batch directory creation and management"""

    def test_create_batch_directory(self, file_manager):
        """Test creating batch directory structure"""
        batch_id = 123
        batch_dir = file_manager.create_batch_directory(batch_id)

        assert batch_dir.exists()
        assert (batch_dir / "inputs").exists()
        assert (batch_dir / "outputs" / "markdown").exists()
        assert (batch_dir / "outputs" / "json").exists()
        assert (batch_dir / "outputs" / "images").exists()
        assert (batch_dir / "exports").exists()

    def test_create_batch_directory_multiple_times(self, file_manager):
        """Test creating same batch directory multiple times (should not error)"""
        batch_id = 123

        batch_dir1 = file_manager.create_batch_directory(batch_id)
        batch_dir2 = file_manager.create_batch_directory(batch_id)

        assert batch_dir1 == batch_dir2
        assert batch_dir1.exists()

    def test_get_batch_directory(self, file_manager):
        """Test getting batch directory path"""
        batch_id = 456
        batch_dir = file_manager.get_batch_directory(batch_id)

        expected_path = file_manager.base_upload_dir / "batches" / "456"
        assert batch_dir == expected_path


@pytest.mark.unit
class TestUploadValidation:
    """Test file upload validation"""

    def test_validate_upload_valid_file(self, file_manager, mock_upload_file):
        """Test validation of valid upload"""
        upload = mock_upload_file("test.png", b"valid content")

        is_valid, error = file_manager.validate_upload(upload)

        assert is_valid is True
        assert error is None

    def test_validate_upload_empty_filename(self, file_manager):
        """Test validation with empty filename"""
        upload = Mock()
        upload.filename = ""

        is_valid, error = file_manager.validate_upload(upload)

        assert is_valid is False
        assert "文件名不能為空" in error

    def test_validate_upload_empty_file(self, file_manager, mock_upload_file):
        """Test validation of empty file"""
        upload = mock_upload_file("test.png", b"")

        is_valid, error = file_manager.validate_upload(upload)

        assert is_valid is False
        assert "文件為空" in error

    @pytest.mark.skip(reason="File size mock is complex with UploadFile, covered by integration test")
    def test_validate_upload_file_too_large(self, file_manager):
        """Test validation of file exceeding size limit"""
        # Note: This functionality is tested in integration tests where actual
        # files can be created. Mocking UploadFile's size behavior is complex
        # (see the sketch after this class for one possible workaround).
        pass

    def test_validate_upload_unsupported_format(self, file_manager, mock_upload_file):
        """Test validation of unsupported file format"""
        upload = mock_upload_file("test.txt", b"text content")

        is_valid, error = file_manager.validate_upload(upload)

        assert is_valid is False
        assert "不支持的文件格式" in error

    def test_validate_upload_supported_formats(self, file_manager, mock_upload_file):
        """Test validation of all supported formats"""
        supported_formats = ["test.png", "test.jpg", "test.jpeg", "test.pdf"]

        for filename in supported_formats:
            upload = mock_upload_file(filename, b"content")
            is_valid, error = file_manager.validate_upload(upload)
            assert is_valid is True, f"Failed for {filename}"
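

# The skip note in TestUploadValidation observes that faking the reported size
# of an UploadFile is awkward. One hypothetical workaround, assuming
# validate_upload consults the upload's `size` attribute (the real check lives
# in app/services/file_manager.py), is a spec'd Mock that reports an oversized
# file without allocating any data:
def _oversized_upload_sketch():
    """Hypothetical helper: a mock upload that claims to exceed the 20MB cap."""
    upload = Mock(spec=UploadFile)
    upload.filename = "big.png"
    upload.size = 25 * 1024 * 1024  # reported size only; no bytes are written
    return upload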


@pytest.mark.unit
class TestFileSaving:
    """Test file saving operations"""

    def test_save_upload_success(self, file_manager, mock_upload_file):
        """Test successful file saving"""
        batch_id = 1
        file_manager.create_batch_directory(batch_id)

        upload = mock_upload_file("test.png", b"test content")

        file_path, original_filename = file_manager.save_upload(upload, batch_id)

        assert file_path.exists()
        assert file_path.read_bytes() == b"test content"
        assert original_filename == "test.png"
        assert file_path.parent.name == "inputs"

    def test_save_upload_unique_filename(self, file_manager, mock_upload_file):
        """Test that saved files get unique filenames"""
        batch_id = 1
        file_manager.create_batch_directory(batch_id)

        upload1 = mock_upload_file("test.png", b"content1")
        upload2 = mock_upload_file("test.png", b"content2")

        path1, _ = file_manager.save_upload(upload1, batch_id)
        path2, _ = file_manager.save_upload(upload2, batch_id)

        assert path1 != path2
        assert path1.exists() and path2.exists()
        assert path1.read_bytes() == b"content1"
        assert path2.read_bytes() == b"content2"

    def test_save_upload_validation_failure(self, file_manager, mock_upload_file):
        """Test save upload with validation failure"""
        batch_id = 1
        file_manager.create_batch_directory(batch_id)

        # Empty file should fail validation
        upload = mock_upload_file("test.png", b"")

        with pytest.raises(FileManagementError) as exc_info:
            file_manager.save_upload(upload, batch_id, validate=True)

        assert "文件為空" in str(exc_info.value)

    def test_save_upload_skip_validation(self, file_manager, mock_upload_file):
        """Test saving with validation skipped"""
        batch_id = 1
        file_manager.create_batch_directory(batch_id)

        # Empty file but validation skipped
        upload = mock_upload_file("test.txt", b"")

        # Should succeed when validation is disabled
        file_path, _ = file_manager.save_upload(upload, batch_id, validate=False)
        assert file_path.exists()

    def test_save_upload_preserves_extension(self, file_manager, mock_upload_file):
        """Test that file extension is preserved"""
        batch_id = 1
        file_manager.create_batch_directory(batch_id)

        upload = mock_upload_file("document.pdf", b"pdf content")

        file_path, _ = file_manager.save_upload(upload, batch_id)

        assert file_path.suffix == ".pdf"


@pytest.mark.unit
class TestValidateSavedFile:
    """Test validation of saved files"""

    @patch.object(FileManager, '__init__', lambda self: None)
    def test_validate_saved_file(self, sample_image_path):
        """Test validating a saved file"""
        from app.services.preprocessor import DocumentPreprocessor

        manager = FileManager()
        manager.preprocessor = DocumentPreprocessor()

        # validate_file returns (is_valid, file_format, error_message)
        is_valid, file_format, error = manager.validate_saved_file(sample_image_path)

        assert is_valid is True
        assert file_format == 'png'
        assert error is None


@pytest.mark.unit
class TestBatchCreation:
    """Test batch creation"""

    def test_create_batch(self, file_manager, mock_db):
        """Test creating a new batch"""
        user_id = 1

        # Mock database operations
        mock_batch = Mock()
        mock_batch.id = 123
        mock_db.add = Mock()
        mock_db.commit = Mock()
        mock_db.refresh = Mock(side_effect=lambda x: setattr(x, 'id', 123))

        with patch.object(FileManager, 'create_batch_directory'):
            batch = file_manager.create_batch(mock_db, user_id)

        assert mock_db.add.called
        assert mock_db.commit.called

    def test_create_batch_with_custom_name(self, file_manager, mock_db):
        """Test creating batch with custom name"""
        user_id = 1
        batch_name = "My Custom Batch"

        mock_db.add = Mock()
        mock_db.commit = Mock()
        mock_db.refresh = Mock(side_effect=lambda x: setattr(x, 'id', 123))

        with patch.object(FileManager, 'create_batch_directory'):
            batch = file_manager.create_batch(mock_db, user_id, batch_name)

        # Verify batch was created with correct name
        call_args = mock_db.add.call_args[0][0]
        assert hasattr(call_args, 'batch_name')


@pytest.mark.unit
class TestGetFilePaths:
    """Test file path retrieval"""

    def test_get_file_paths(self, file_manager):
        """Test getting file paths for a batch"""
        batch_id = 1
        file_id = 42

        paths = file_manager.get_file_paths(batch_id, file_id)

        assert "input_dir" in paths
        assert "output_dir" in paths
        assert "markdown_dir" in paths
        assert "json_dir" in paths
        assert "images_dir" in paths
        assert "export_dir" in paths

        # Verify images_dir includes file_id
        assert str(file_id) in str(paths["images_dir"])
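

# Together with TestBatchDirectoryManagement above, this pins down the on-disk
# layout: everything lives under <upload_dir>/batches/<batch_id>, with a
# per-file image directory. A hypothetical sketch of the mapping (the real
# method is FileManager.get_file_paths and may differ in detail):
def _get_file_paths_sketch(base_upload_dir: Path, batch_id: int, file_id: int) -> dict:
    """Hypothetical path map implied by the directory-structure assertions."""
    batch_dir = base_upload_dir / "batches" / str(batch_id)
    return {
        "input_dir": batch_dir / "inputs",
        "output_dir": batch_dir / "outputs",
        "markdown_dir": batch_dir / "outputs" / "markdown",
        "json_dir": batch_dir / "outputs" / "json",
        "images_dir": batch_dir / "outputs" / "images" / str(file_id),  # per-file
        "export_dir": batch_dir / "exports",
    }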


@pytest.mark.unit
class TestCleanupExpiredBatches:
    """Test cleanup of expired batches"""

    def test_cleanup_expired_batches(self, file_manager, mock_db, temp_dir):
        """Test cleaning up expired batches"""
        # Create mock expired batch
        expired_batch = Mock()
        expired_batch.id = 1
        expired_batch.created_at = datetime.utcnow() - timedelta(hours=48)

        # Create batch directory
        batch_dir = file_manager.create_batch_directory(1)
        assert batch_dir.exists()

        # Mock database query
        mock_db.query.return_value.filter.return_value.all.return_value = [expired_batch]
        mock_db.delete = Mock()
        mock_db.commit = Mock()

        # Run cleanup
        cleaned = file_manager.cleanup_expired_batches(mock_db, retention_hours=24)

        assert cleaned == 1
        assert not batch_dir.exists()
        mock_db.delete.assert_called_once_with(expired_batch)
        mock_db.commit.assert_called_once()

    def test_cleanup_no_expired_batches(self, file_manager, mock_db):
        """Test cleanup when no batches are expired"""
        # Mock database query returning empty list
        mock_db.query.return_value.filter.return_value.all.return_value = []

        cleaned = file_manager.cleanup_expired_batches(mock_db, retention_hours=24)

        assert cleaned == 0

    def test_cleanup_handles_missing_directory(self, file_manager, mock_db):
        """Test cleanup handles missing batch directory gracefully"""
        expired_batch = Mock()
        expired_batch.id = 999  # Directory doesn't exist
        expired_batch.created_at = datetime.utcnow() - timedelta(hours=48)

        mock_db.query.return_value.filter.return_value.all.return_value = [expired_batch]
        mock_db.delete = Mock()
        mock_db.commit = Mock()

        # Should not raise error
        cleaned = file_manager.cleanup_expired_batches(mock_db, retention_hours=24)

        assert cleaned == 1
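

# The tests above pin down the cleanup contract: remove rows and directories
# for batches older than the retention window, tolerate a missing directory,
# and return the number of batches cleaned. A hypothetical sketch, assuming
# the query runs against the imported OCRBatch model (the real loop is
# FileManager.cleanup_expired_batches and its query may be shaped differently):
def _cleanup_expired_sketch(db, base_upload_dir: Path, retention_hours: int = 24) -> int:
    """Hypothetical cleanup loop implied by TestCleanupExpiredBatches."""
    cutoff = datetime.utcnow() - timedelta(hours=retention_hours)
    expired = db.query(OCRBatch).filter(OCRBatch.created_at < cutoff).all()
    cleaned = 0
    for batch in expired:
        batch_dir = base_upload_dir / "batches" / str(batch.id)
        if batch_dir.exists():
            shutil.rmtree(batch_dir)  # a missing directory is simply skipped
        db.delete(batch)
        cleaned += 1
    db.commit()
    return cleaned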


@pytest.mark.unit
class TestFileOwnershipVerification:
    """Test file ownership verification"""

    def test_verify_file_ownership_success(self, file_manager, mock_db):
        """Test successful ownership verification"""
        user_id = 1
        batch_id = 123

        # Mock batch owned by user
        mock_batch = Mock()
        mock_db.query.return_value.filter.return_value.first.return_value = mock_batch

        is_owner = file_manager.verify_file_ownership(mock_db, user_id, batch_id)

        assert is_owner is True

    def test_verify_file_ownership_failure(self, file_manager, mock_db):
        """Test ownership verification failure"""
        user_id = 1
        batch_id = 123

        # Mock no batch found (wrong owner)
        mock_db.query.return_value.filter.return_value.first.return_value = None

        is_owner = file_manager.verify_file_ownership(mock_db, user_id, batch_id)

        assert is_owner is False


@pytest.mark.unit
class TestBatchStatistics:
    """Test batch statistics retrieval"""

    def test_get_batch_statistics(self, file_manager, mock_db):
        """Test getting batch statistics"""
        batch_id = 1

        # Create mock batch with files
        mock_file1 = Mock()
        mock_file1.file_size = 1000

        mock_file2 = Mock()
        mock_file2.file_size = 2000

        mock_batch = Mock()
        mock_batch.id = batch_id
        mock_batch.batch_name = "Test Batch"
        mock_batch.status = BatchStatus.COMPLETED
        mock_batch.total_files = 2
        mock_batch.completed_files = 2
        mock_batch.failed_files = 0
        mock_batch.progress_percentage = 100.0
        mock_batch.files = [mock_file1, mock_file2]
        mock_batch.created_at = datetime(2025, 1, 1, 10, 0, 0)
        mock_batch.started_at = datetime(2025, 1, 1, 10, 1, 0)
        mock_batch.completed_at = datetime(2025, 1, 1, 10, 5, 0)

        mock_db.query.return_value.filter.return_value.first.return_value = mock_batch

        stats = file_manager.get_batch_statistics(mock_db, batch_id)

        assert stats['batch_id'] == batch_id
        assert stats['batch_name'] == "Test Batch"
        assert stats['total_files'] == 2
        assert stats['total_file_size'] == 3000
        assert stats['total_file_size_mb'] == 0.0  # Small files
        assert stats['processing_time'] == 240.0  # 4 minutes
        assert stats['pending_files'] == 0

    def test_get_batch_statistics_not_found(self, file_manager, mock_db):
        """Test getting statistics for non-existent batch"""
        batch_id = 999

        mock_db.query.return_value.filter.return_value.first.return_value = None

        stats = file_manager.get_batch_statistics(mock_db, batch_id)

        assert stats == {}

    def test_get_batch_statistics_no_completion_time(self, file_manager, mock_db):
        """Test statistics for batch without completion time"""
        mock_batch = Mock()
        mock_batch.id = 1
        mock_batch.batch_name = "Pending Batch"
        mock_batch.status = BatchStatus.PROCESSING
        mock_batch.total_files = 5
        mock_batch.completed_files = 2
        mock_batch.failed_files = 0
        mock_batch.progress_percentage = 40.0
        mock_batch.files = []
        mock_batch.created_at = datetime(2025, 1, 1)
        mock_batch.started_at = datetime(2025, 1, 1)
        mock_batch.completed_at = None

        mock_db.query.return_value.filter.return_value.first.return_value = mock_batch

        stats = file_manager.get_batch_statistics(mock_db, 1)

        assert stats['processing_time'] is None
        assert stats['pending_files'] == 3


@pytest.mark.unit
class TestEdgeCases:
    """Test edge cases and error handling"""

    def test_save_upload_creates_parent_directories(self, file_manager, mock_upload_file):
        """Test that save_upload creates necessary directories"""
        batch_id = 999  # Directory doesn't exist yet

        upload = mock_upload_file("test.png", b"content")

        file_path, _ = file_manager.save_upload(upload, batch_id)

        assert file_path.exists()
        assert file_path.parent.exists()

    def test_cleanup_continues_on_error(self, file_manager, mock_db):
        """Test that cleanup continues even if one batch fails"""
        batch1 = Mock()
        batch1.id = 1
        batch1.created_at = datetime.utcnow() - timedelta(hours=48)

        batch2 = Mock()
        batch2.id = 2
        batch2.created_at = datetime.utcnow() - timedelta(hours=48)

        # Create only batch2 directory
        file_manager.create_batch_directory(2)

        mock_db.query.return_value.filter.return_value.all.return_value = [batch1, batch2]
        mock_db.delete = Mock()
        mock_db.commit = Mock()

        # Should not fail, should clean batch2 even if batch1 fails
        cleaned = file_manager.cleanup_expired_batches(mock_db, retention_hours=24)

        assert cleaned > 0

    def test_validate_upload_with_unicode_filename(self, file_manager, mock_upload_file):
        """Test validation with Unicode filename"""
        upload = mock_upload_file("測試文件.png", b"content")

        is_valid, error = file_manager.validate_upload(upload)

        assert is_valid is True

    def test_save_upload_preserves_unicode_filename(self, file_manager, mock_upload_file):
        """Test that Unicode filenames are handled correctly"""
        batch_id = 1
        file_manager.create_batch_directory(batch_id)

        upload = mock_upload_file("中文文檔.pdf", b"content")

        file_path, original_filename = file_manager.save_upload(upload, batch_id)

        assert original_filename == "中文文檔.pdf"
        assert file_path.exists()
@@ -1,182 +0,0 @@
"""
Integration tests for Tool_OCR
Tests the complete flow of authentication, task creation, and file operations
"""

import pytest
from unittest.mock import patch


class TestIntegration:
    """Integration tests for end-to-end workflows"""

    def test_complete_auth_and_task_flow(self, client, db):
        """Test complete flow: login -> create task -> get task -> delete task"""

        # Step 1: Login
        from app.services.external_auth_service import AuthResponse, UserInfo

        user_info = UserInfo(
            id="integration-id-123",
            name="Integration Test User",
            email="integration@example.com"
        )
        auth_response = AuthResponse(
            access_token="test-token",
            id_token="test-id-token",
            expires_in=3600,
            token_type="Bearer",
            user_info=user_info,
            issued_at="2025-11-16T10:00:00Z",
            expires_at="2025-11-16T11:00:00Z"
        )

        with patch('app.routers.auth.external_auth_service.authenticate_user') as mock_auth:
            mock_auth.return_value = (True, auth_response, None)

            login_response = client.post('/api/v2/auth/login', json={
                'username': 'integration@example.com',
                'password': 'password123'
            })

        assert login_response.status_code == 200
        token = login_response.json()['access_token']
        headers = {'Authorization': f'Bearer {token}'}

        # Step 2: Create task
        create_response = client.post(
            '/api/v2/tasks/',
            headers=headers,
            json={
                'filename': 'integration_test.pdf',
                'file_type': 'application/pdf'
            }
        )

        assert create_response.status_code == 201
        task_data = create_response.json()
        task_id = task_data['task_id']

        # Step 3: Get task
        get_response = client.get(
            f'/api/v2/tasks/{task_id}',
            headers=headers
        )

        assert get_response.status_code == 200
        assert get_response.json()['task_id'] == task_id

        # Step 4: List tasks
        list_response = client.get(
            '/api/v2/tasks/',
            headers=headers
        )

        assert list_response.status_code == 200
        assert len(list_response.json()['tasks']) > 0

        # Step 5: Get stats
        stats_response = client.get(
            '/api/v2/tasks/stats',
            headers=headers
        )

        assert stats_response.status_code == 200
        stats = stats_response.json()
        assert stats['total'] > 0
        assert stats['pending'] > 0

        # Step 6: Delete task
        delete_response = client.delete(
            f'/api/v2/tasks/{task_id}',
            headers=headers
        )

        # DELETE returns 204 No Content (standard for successful deletion)
        assert delete_response.status_code == 204

        # Step 7: Verify deletion
        get_after_delete = client.get(
            f'/api/v2/tasks/{task_id}',
            headers=headers
        )

        assert get_after_delete.status_code == 404

    def test_admin_workflow(self, client, db):
        """Test admin workflow: login as admin -> access admin endpoints"""

        # Login as admin
        from app.services.external_auth_service import AuthResponse, UserInfo

        user_info = UserInfo(
            id="admin-id-123",
            name="Admin User",
            email="ymirliu@panjit.com.tw"
        )
        auth_response = AuthResponse(
            access_token="admin-token",
            id_token="admin-id-token",
            expires_in=3600,
            token_type="Bearer",
            user_info=user_info,
            issued_at="2025-11-16T10:00:00Z",
            expires_at="2025-11-16T11:00:00Z"
        )

        with patch('app.routers.auth.external_auth_service.authenticate_user') as mock_auth:
            mock_auth.return_value = (True, auth_response, None)

            login_response = client.post('/api/v2/auth/login', json={
                'username': 'ymirliu@panjit.com.tw',
                'password': 'adminpass'
            })

        assert login_response.status_code == 200
        token = login_response.json()['access_token']
        headers = {'Authorization': f'Bearer {token}'}

        # Access admin endpoints
        stats_response = client.get('/api/v2/admin/stats', headers=headers)
        assert stats_response.status_code == 200

        users_response = client.get('/api/v2/admin/users', headers=headers)
        assert users_response.status_code == 200

        logs_response = client.get('/api/v2/admin/audit-logs', headers=headers)
        assert logs_response.status_code == 200

    def test_task_lifecycle(self, client, auth_token, test_task, db):
        """Test complete task lifecycle: pending -> processing -> completed"""

        headers = {'Authorization': f'Bearer {auth_token}'}

        # Check initial status
        response = client.get(f'/api/v2/tasks/{test_task.task_id}', headers=headers)
        assert response.json()['status'] == 'pending'

        # Start task
        start_response = client.post(
            f'/api/v2/tasks/{test_task.task_id}/start',
            headers=headers
        )
        assert start_response.status_code == 200
        assert start_response.json()['status'] == 'processing'

        # Update task to completed
        update_response = client.patch(
            f'/api/v2/tasks/{test_task.task_id}',
            headers=headers,
            json={
                'status': 'completed',
                'processing_time_ms': 1500
            }
        )
        assert update_response.status_code == 200
        assert update_response.json()['status'] == 'completed'

        # Verify final state
        final_response = client.get(f'/api/v2/tasks/{test_task.task_id}', headers=headers)
        final_data = final_response.json()
        assert final_data['status'] == 'completed'
        assert final_data['processing_time_ms'] == 1500
@@ -1,528 +0,0 @@
"""
Tool_OCR - OCR Service Unit Tests
Tests for app/services/ocr_service.py
"""

import pytest
import json
from pathlib import Path
from unittest.mock import Mock, patch, MagicMock

from app.services.ocr_service import OCRService


@pytest.mark.unit
class TestOCRServiceInit:
    """Test OCR service initialization"""

    def test_init(self):
        """Test OCR service initialization"""
        service = OCRService()

        assert service is not None
        assert service.ocr_engines == {}
        assert service.structure_engine is None
        assert service.confidence_threshold > 0
        assert len(service.ocr_languages) > 0

    def test_supported_languages(self):
        """Test that supported languages are configured"""
        service = OCRService()

        # Should have at least Chinese and English
        assert 'ch' in service.ocr_languages or 'en' in service.ocr_languages


@pytest.mark.unit
class TestOCREngineLazyLoading:
    """Test OCR engine lazy loading"""

    @patch('app.services.ocr_service.PaddleOCR')
    def test_get_ocr_engine_creates_new_engine(self, mock_paddle_ocr):
        """Test that get_ocr_engine creates engine on first call"""
        mock_engine = Mock()
        mock_paddle_ocr.return_value = mock_engine

        service = OCRService()
        engine = service.get_ocr_engine(lang='en')

        assert engine == mock_engine
        mock_paddle_ocr.assert_called_once()
        assert 'en' in service.ocr_engines

    @patch('app.services.ocr_service.PaddleOCR')
    def test_get_ocr_engine_reuses_existing_engine(self, mock_paddle_ocr):
        """Test that get_ocr_engine reuses existing engine"""
        mock_engine = Mock()
        mock_paddle_ocr.return_value = mock_engine

        service = OCRService()

        # First call creates engine
        engine1 = service.get_ocr_engine(lang='en')
        # Second call should reuse
        engine2 = service.get_ocr_engine(lang='en')

        assert engine1 == engine2
        mock_paddle_ocr.assert_called_once()

    @patch('app.services.ocr_service.PaddleOCR')
    def test_get_ocr_engine_different_languages(self, mock_paddle_ocr):
        """Test that different languages get different engines"""
        mock_paddle_ocr.return_value = Mock()

        service = OCRService()

        engine_en = service.get_ocr_engine(lang='en')
        engine_ch = service.get_ocr_engine(lang='ch')

        assert 'en' in service.ocr_engines
        assert 'ch' in service.ocr_engines
        assert mock_paddle_ocr.call_count == 2
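

# The three tests above specify the lazy-loading contract: one engine per
# language, created on first use and cached afterwards. A hypothetical sketch
# of that pattern (engine_factory stands in for the real PaddleOCR constructor,
# which is not imported in this test module):
def _lazy_engine_cache_sketch(service, lang, engine_factory):
    """Hypothetical per-language engine cache implied by TestOCREngineLazyLoading."""
    if lang not in service.ocr_engines:
        # Model loading is expensive, so construct at most once per language.
        service.ocr_engines[lang] = engine_factory(lang=lang)
    return service.ocr_engines[lang]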


@pytest.mark.unit
class TestStructureEngineLazyLoading:
    """Test structure engine lazy loading"""

    @patch('app.services.ocr_service.PPStructureV3')
    def test_get_structure_engine_creates_new_engine(self, mock_structure):
        """Test that get_structure_engine creates engine on first call"""
        mock_engine = Mock()
        mock_structure.return_value = mock_engine

        service = OCRService()
        engine = service.get_structure_engine()

        assert engine == mock_engine
        mock_structure.assert_called_once()
        assert service.structure_engine == mock_engine

    @patch('app.services.ocr_service.PPStructureV3')
    def test_get_structure_engine_reuses_existing_engine(self, mock_structure):
        """Test that get_structure_engine reuses existing engine"""
        mock_engine = Mock()
        mock_structure.return_value = mock_engine

        service = OCRService()

        # First call creates engine
        engine1 = service.get_structure_engine()
        # Second call should reuse
        engine2 = service.get_structure_engine()

        assert engine1 == engine2
        mock_structure.assert_called_once()


@pytest.mark.unit
class TestProcessImageMocked:
    """Test image processing with mocked OCR engines"""

    @patch('app.services.ocr_service.PaddleOCR')
    def test_process_image_success(self, mock_paddle_ocr, sample_image_path):
        """Test successful image processing"""
        # Mock OCR results - PaddleOCR 3.x format
        mock_ocr_results = [{
            'rec_texts': ['Hello World', 'Test Text'],
            'rec_scores': [0.95, 0.88],
            'rec_polys': [
                [[10, 10], [100, 10], [100, 30], [10, 30]],
                [[10, 40], [100, 40], [100, 60], [10, 60]]
            ]
        }]

        mock_engine = Mock()
        mock_engine.ocr.return_value = mock_ocr_results
        mock_paddle_ocr.return_value = mock_engine

        service = OCRService()
        result = service.process_image(sample_image_path, detect_layout=False)

        assert result['status'] == 'success'
        assert result['file_name'] == sample_image_path.name
        assert result['language'] == 'ch'
        assert result['total_text_regions'] == 2
        assert result['average_confidence'] > 0.8
        assert len(result['text_regions']) == 2
        assert 'markdown_content' in result
        assert 'processing_time' in result

    @patch('app.services.ocr_service.PaddleOCR')
    def test_process_image_filters_low_confidence(self, mock_paddle_ocr, sample_image_path):
        """Test that low confidence results are filtered"""
        # Mock OCR results with varying confidence - PaddleOCR 3.x format
        mock_ocr_results = [{
            'rec_texts': ['High Confidence', 'Low Confidence'],
            'rec_scores': [0.95, 0.50],
            'rec_polys': [
                [[10, 10], [100, 10], [100, 30], [10, 30]],
                [[10, 40], [100, 40], [100, 60], [10, 60]]
            ]
        }]

        mock_engine = Mock()
        mock_engine.ocr.return_value = mock_ocr_results
        mock_paddle_ocr.return_value = mock_engine

        service = OCRService()
        result = service.process_image(
            sample_image_path,
            detect_layout=False,
            confidence_threshold=0.80
        )

        assert result['status'] == 'success'
        assert result['total_text_regions'] == 1  # Only high confidence
        assert result['text_regions'][0]['text'] == 'High Confidence'

    @patch('app.services.ocr_service.PaddleOCR')
    def test_process_image_empty_results(self, mock_paddle_ocr, sample_image_path):
        """Test processing image with no text detected"""
        mock_ocr_results = [[]]

        mock_engine = Mock()
        mock_engine.ocr.return_value = mock_ocr_results
        mock_paddle_ocr.return_value = mock_engine

        service = OCRService()
        result = service.process_image(sample_image_path, detect_layout=False)

        assert result['status'] == 'success'
        assert result['total_text_regions'] == 0
        assert result['average_confidence'] == 0.0

    @patch('app.services.ocr_service.PaddleOCR')
    def test_process_image_error_handling(self, mock_paddle_ocr, sample_image_path):
        """Test error handling during OCR processing"""
        mock_engine = Mock()
        mock_engine.ocr.side_effect = Exception("OCR engine error")
        mock_paddle_ocr.return_value = mock_engine

        service = OCRService()
        result = service.process_image(sample_image_path, detect_layout=False)

        assert result['status'] == 'error'
        assert 'error_message' in result
        assert 'OCR engine error' in result['error_message']

    @patch('app.services.ocr_service.PaddleOCR')
    def test_process_image_different_languages(self, mock_paddle_ocr, sample_image_path):
        """Test processing with different languages"""
        mock_ocr_results = [[
            [[[10, 10], [100, 10], [100, 30], [10, 30]], ('Text', 0.95)]
        ]]

        mock_engine = Mock()
        mock_engine.ocr.return_value = mock_ocr_results
        mock_paddle_ocr.return_value = mock_engine

        service = OCRService()

        # Test English
        result_en = service.process_image(sample_image_path, lang='en', detect_layout=False)
        assert result_en['language'] == 'en'

        # Test Chinese
        result_ch = service.process_image(sample_image_path, lang='ch', detect_layout=False)
        assert result_ch['language'] == 'ch'
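

# The mocks above document the PaddleOCR 3.x page-result format: parallel
# 'rec_texts' / 'rec_scores' / 'rec_polys' lists per page. A hypothetical
# parsing helper consistent with the filtering behaviour asserted above
# (assumes the 3.x dict format; the real parsing lives in
# OCRService.process_image):
def _parse_paddle3_results_sketch(pages, confidence_threshold):
    """Hypothetical parse of PaddleOCR 3.x dict pages into text regions."""
    regions = []
    for page in pages or []:  # None results yield no regions
        if not page:  # empty page, as in the no-text-detected test
            continue
        for text, score, poly in zip(page['rec_texts'], page['rec_scores'], page['rec_polys']):
            if score >= confidence_threshold:  # drop low-confidence detections
                regions.append({'text': text, 'confidence': score, 'bbox': poly})
    return regions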


@pytest.mark.unit
class TestLayoutAnalysisMocked:
    """Test layout analysis with mocked structure engine"""

    @patch('app.services.ocr_service.PPStructureV3')
    def test_analyze_layout_success(self, mock_structure, sample_image_path):
        """Test successful layout analysis"""
        # Create mock page result with markdown attribute (PP-StructureV3 format)
        mock_page_result = Mock()
        mock_page_result.markdown = {
            'markdown_texts': 'Document Title\n\nParagraph content',
            'markdown_images': {}
        }

        # PP-Structure predict() returns a list of page results
        mock_engine = Mock()
        mock_engine.predict.return_value = [mock_page_result]
        mock_structure.return_value = mock_engine

        service = OCRService()
        layout_data, images_metadata = service.analyze_layout(sample_image_path)

        assert layout_data is not None
        assert layout_data['total_elements'] == 1
        assert len(layout_data['elements']) == 1
        assert layout_data['elements'][0]['type'] == 'text'
        assert 'Document Title' in layout_data['elements'][0]['content']

    @patch('app.services.ocr_service.PPStructureV3')
    def test_analyze_layout_with_table(self, mock_structure, sample_image_path):
        """Test layout analysis with table element"""
        # Create mock page result with table in markdown (PP-StructureV3 format)
        mock_page_result = Mock()
        mock_page_result.markdown = {
            'markdown_texts': '<table><tr><td>Cell 1</td></tr></table>',
            'markdown_images': {}
        }

        # PP-Structure predict() returns a list of page results
        mock_engine = Mock()
        mock_engine.predict.return_value = [mock_page_result]
        mock_structure.return_value = mock_engine

        service = OCRService()
        layout_data, images_metadata = service.analyze_layout(sample_image_path)

        assert layout_data is not None
        assert layout_data['elements'][0]['type'] == 'table'
        # Content should contain the HTML table
        assert '<table>' in layout_data['elements'][0]['content']

    @patch('app.services.ocr_service.PPStructureV3')
    def test_analyze_layout_error_handling(self, mock_structure, sample_image_path):
        """Test error handling in layout analysis"""
        mock_engine = Mock()
        mock_engine.side_effect = Exception("Structure analysis error")
        mock_structure.return_value = mock_engine

        service = OCRService()
        layout_data, images_metadata = service.analyze_layout(sample_image_path)

        assert layout_data is None
        assert images_metadata == []


@pytest.mark.unit
class TestMarkdownGeneration:
    """Test Markdown generation"""

    def test_generate_markdown_from_text_regions(self):
        """Test Markdown generation from text regions only"""
        service = OCRService()

        text_regions = [
            {'text': 'First line', 'bbox': [[10, 10], [100, 10], [100, 30], [10, 30]]},
            {'text': 'Second line', 'bbox': [[10, 40], [100, 40], [100, 60], [10, 60]]},
            {'text': 'Third line', 'bbox': [[10, 70], [100, 70], [100, 90], [10, 90]]},
        ]

        markdown = service.generate_markdown(text_regions)

        assert 'First line' in markdown
        assert 'Second line' in markdown
        assert 'Third line' in markdown

    def test_generate_markdown_with_layout(self):
        """Test Markdown generation with layout information"""
        service = OCRService()

        text_regions = []
        layout_data = {
            'elements': [
                {'type': 'title', 'content': 'Document Title'},
                {'type': 'text', 'content': 'Paragraph text'},
                {'type': 'figure', 'element_id': 0},
            ]
        }

        markdown = service.generate_markdown(text_regions, layout_data)

        assert '# Document Title' in markdown
        assert 'Paragraph text' in markdown
        assert '![Figure 0]' in markdown

    def test_generate_markdown_with_table(self):
        """Test Markdown generation with table"""
        service = OCRService()

        layout_data = {
            'elements': [
                {
                    'type': 'table',
                    'content': '<table><tr><td>Cell</td></tr></table>'
                }
            ]
        }

        markdown = service.generate_markdown([], layout_data)

        assert '<table>' in markdown

    def test_generate_markdown_empty_input(self):
        """Test Markdown generation with empty input"""
        service = OCRService()

        markdown = service.generate_markdown([])

        assert markdown == ""

    def test_generate_markdown_sorts_by_position(self):
        """Test that text regions are sorted by vertical position"""
        service = OCRService()

        # Create text regions in reverse order
        text_regions = [
            {'text': 'Bottom', 'bbox': [[10, 90], [100, 90], [100, 110], [10, 110]]},
            {'text': 'Top', 'bbox': [[10, 10], [100, 10], [100, 30], [10, 30]]},
            {'text': 'Middle', 'bbox': [[10, 50], [100, 50], [100, 70], [10, 70]]},
        ]

        markdown = service.generate_markdown(text_regions)
        lines = markdown.strip().split('\n')

        # Should be sorted top to bottom
        assert lines[0] == 'Top'
        assert lines[1] == 'Middle'
        assert lines[2] == 'Bottom'
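

# The sorting test above fixes the reading order: regions are emitted top to
# bottom by their bounding box. A hypothetical one-liner consistent with it
# (bbox is [[x1, y1], [x2, y1], [x2, y2], [x1, y2]], so bbox[0][1] is the
# top edge; the real ordering is done inside OCRService.generate_markdown):
def _sort_regions_sketch(text_regions):
    """Hypothetical vertical ordering implied by test_generate_markdown_sorts_by_position."""
    return sorted(text_regions, key=lambda region: region['bbox'][0][1])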


@pytest.mark.unit
class TestSaveResults:
    """Test saving OCR results"""

    def test_save_results_success(self, temp_dir):
        """Test successful saving of results"""
        service = OCRService()

        result = {
            'status': 'success',
            'file_name': 'test.png',
            'text_regions': [{'text': 'Hello', 'confidence': 0.95}],
            'markdown_content': '# Hello\n\nTest content',
        }

        json_path, md_path = service.save_results(result, temp_dir, 'test123')

        assert json_path is not None
        assert md_path is not None
        assert json_path.exists()
        assert md_path.exists()

        # Verify JSON content
        with open(json_path, 'r') as f:
            saved_result = json.load(f)
        assert saved_result['file_name'] == 'test.png'

        # Verify Markdown content
        md_content = md_path.read_text()
        assert 'Hello' in md_content

    def test_save_results_creates_directory(self, temp_dir):
        """Test that save_results creates output directory if needed"""
        service = OCRService()
        output_dir = temp_dir / "subdir" / "results"

        result = {
            'status': 'success',
            'markdown_content': 'Test',
        }

        json_path, md_path = service.save_results(result, output_dir, 'test')

        assert output_dir.exists()
        assert json_path.exists()

    def test_save_results_handles_unicode(self, temp_dir):
        """Test saving results with Unicode characters"""
        service = OCRService()

        result = {
            'status': 'success',
            'text_regions': [{'text': '你好世界', 'confidence': 0.95}],
            'markdown_content': '# 你好世界\n\n测试内容',
        }

        json_path, md_path = service.save_results(result, temp_dir, 'unicode_test')

        # Verify Unicode is preserved
        with open(json_path, 'r', encoding='utf-8') as f:
            saved_result = json.load(f)
        assert saved_result['text_regions'][0]['text'] == '你好世界'

        md_content = md_path.read_text(encoding='utf-8')
        assert '你好世界' in md_content


@pytest.mark.unit
class TestEdgeCases:
    """Test edge cases and error handling"""

    @patch('app.services.ocr_service.PaddleOCR')
    def test_process_image_with_none_results(self, mock_paddle_ocr, sample_image_path):
        """Test processing when OCR returns None"""
        mock_engine = Mock()
        mock_engine.ocr.return_value = None
        mock_paddle_ocr.return_value = mock_engine

        service = OCRService()
        result = service.process_image(sample_image_path, detect_layout=False)

        assert result['status'] == 'success'
        assert result['total_text_regions'] == 0

    @patch('app.services.ocr_service.PaddleOCR')
    def test_process_image_with_custom_threshold(self, mock_paddle_ocr, sample_image_path):
        """Test processing with custom confidence threshold"""
        # PaddleOCR 3.x format
        mock_ocr_results = [{
            'rec_texts': ['Text'],
            'rec_scores': [0.85],
            'rec_polys': [[[10, 10], [100, 10], [100, 30], [10, 30]]]
        }]

        mock_engine = Mock()
        mock_engine.ocr.return_value = mock_ocr_results
        mock_paddle_ocr.return_value = mock_engine

        service = OCRService()

        # With high threshold - should filter out
        result_high = service.process_image(
            sample_image_path,
            detect_layout=False,
            confidence_threshold=0.90
        )
        assert result_high['total_text_regions'] == 0

        # With low threshold - should include
        result_low = service.process_image(
            sample_image_path,
            detect_layout=False,
            confidence_threshold=0.80
        )
        assert result_low['total_text_regions'] == 1


# Integration tests that require actual PaddleOCR models
@pytest.mark.requires_models
@pytest.mark.slow
class TestOCRServiceIntegration:
    """
    Integration tests that require actual PaddleOCR models
    These tests will download models (~900MB) on first run
    Run with: pytest -m requires_models
    """

    def test_real_ocr_engine_initialization(self):
        """Test real PaddleOCR engine initialization"""
        service = OCRService()
        engine = service.get_ocr_engine(lang='en')

        assert engine is not None
|
||||
assert hasattr(engine, 'ocr')
|
||||
|
||||
def test_real_structure_engine_initialization(self):
|
||||
"""Test real PP-Structure engine initialization"""
|
||||
service = OCRService()
|
||||
engine = service.get_structure_engine()
|
||||
|
||||
assert engine is not None
|
||||
|
||||
def test_real_image_processing(self, sample_image_with_text):
|
||||
"""Test processing real image with text"""
|
||||
service = OCRService()
|
||||
result = service.process_image(sample_image_with_text, lang='en')
|
||||
|
||||
assert result['status'] == 'success'
|
||||
assert result['total_text_regions'] > 0
|
||||
@@ -1,559 +0,0 @@
"""
Tool_OCR - PDF Generator Unit Tests
Tests for app/services/pdf_generator.py
"""

import pytest
from pathlib import Path
from unittest.mock import Mock, patch, MagicMock
import subprocess

from app.services.pdf_generator import PDFGenerator, PDFGenerationError


@pytest.mark.unit
class TestPDFGeneratorInit:
    """Test PDF generator initialization"""

    def test_init(self):
        """Test PDF generator initialization"""
        generator = PDFGenerator()

        assert generator is not None
        assert hasattr(generator, 'css_templates')
        assert len(generator.css_templates) == 3
        assert 'default' in generator.css_templates
        assert 'academic' in generator.css_templates
        assert 'business' in generator.css_templates

    def test_css_templates_have_content(self):
        """Test that CSS templates contain content"""
        generator = PDFGenerator()

        for template_name, css_content in generator.css_templates.items():
            assert isinstance(css_content, str)
            assert len(css_content) > 100
            assert '@page' in css_content
            assert 'body' in css_content


@pytest.mark.unit
class TestPandocAvailability:
    """Test Pandoc availability checking"""

    @patch('subprocess.run')
    def test_check_pandoc_available_success(self, mock_run):
        """Test Pandoc availability check when pandoc is installed"""
        mock_run.return_value = Mock(returncode=0, stdout="pandoc 2.x")

        generator = PDFGenerator()
        is_available = generator.check_pandoc_available()

        assert is_available is True
        mock_run.assert_called_once()
        assert mock_run.call_args[0][0] == ["pandoc", "--version"]

    @patch('subprocess.run')
    def test_check_pandoc_available_not_found(self, mock_run):
        """Test Pandoc availability check when pandoc is not installed"""
        mock_run.side_effect = FileNotFoundError()

        generator = PDFGenerator()
        is_available = generator.check_pandoc_available()

        assert is_available is False

    @patch('subprocess.run')
    def test_check_pandoc_available_timeout(self, mock_run):
        """Test Pandoc availability check when command times out"""
        mock_run.side_effect = subprocess.TimeoutExpired("pandoc", 5)

        generator = PDFGenerator()
        is_available = generator.check_pandoc_available()

        assert is_available is False


@pytest.mark.unit
class TestPandocPDFGeneration:
    """Test PDF generation using Pandoc"""

    @pytest.fixture
    def sample_markdown(self, temp_dir):
        """Create a sample Markdown file"""
        md_file = temp_dir / "sample.md"
        md_file.write_text("# Test Document\n\nThis is a test.", encoding="utf-8")
        return md_file

    @patch('subprocess.run')
    def test_generate_pdf_pandoc_success(self, mock_run, sample_markdown, temp_dir):
        """Test successful PDF generation with Pandoc"""
        output_path = temp_dir / "output.pdf"
        mock_run.return_value = Mock(returncode=0, stderr="")

        # Create the output file to simulate successful generation
        output_path.touch()

        generator = PDFGenerator()
        result = generator.generate_pdf_pandoc(sample_markdown, output_path)

        assert result == output_path
        assert output_path.exists()
        mock_run.assert_called_once()

        # Verify pandoc command structure
        cmd_args = mock_run.call_args[0][0]
        assert "pandoc" in cmd_args
        assert str(sample_markdown) in cmd_args
        assert str(output_path) in cmd_args
        assert "--pdf-engine=weasyprint" in cmd_args

    @patch('subprocess.run')
    def test_generate_pdf_pandoc_with_metadata(self, mock_run, sample_markdown, temp_dir):
        """Test Pandoc PDF generation with metadata"""
        output_path = temp_dir / "output.pdf"
        mock_run.return_value = Mock(returncode=0, stderr="")
        output_path.touch()

        metadata = {
            "title": "Test Title",
            "author": "Test Author",
            "date": "2025-01-01"
        }

        generator = PDFGenerator()
        result = generator.generate_pdf_pandoc(
            sample_markdown,
            output_path,
            metadata=metadata
        )

        assert result == output_path

        # Verify metadata in command
        cmd_args = mock_run.call_args[0][0]
        assert "--metadata" in cmd_args
        assert "title=Test Title" in cmd_args
        assert "author=Test Author" in cmd_args
        assert "date=2025-01-01" in cmd_args

    @patch('subprocess.run')
    def test_generate_pdf_pandoc_with_custom_css(self, mock_run, sample_markdown, temp_dir):
        """Test Pandoc PDF generation with custom CSS template"""
        output_path = temp_dir / "output.pdf"
        mock_run.return_value = Mock(returncode=0, stderr="")
        output_path.touch()

        generator = PDFGenerator()
        result = generator.generate_pdf_pandoc(
            sample_markdown,
            output_path,
            css_template="academic"
        )

        assert result == output_path
        mock_run.assert_called_once()

    @patch('subprocess.run')
    def test_generate_pdf_pandoc_command_failed(self, mock_run, sample_markdown, temp_dir):
        """Test Pandoc PDF generation when command fails"""
        output_path = temp_dir / "output.pdf"
        mock_run.return_value = Mock(returncode=1, stderr="Pandoc error message")

        generator = PDFGenerator()

        with pytest.raises(PDFGenerationError) as exc_info:
            generator.generate_pdf_pandoc(sample_markdown, output_path)

        assert "Pandoc failed" in str(exc_info.value)
        assert "Pandoc error message" in str(exc_info.value)

    @patch('subprocess.run')
    def test_generate_pdf_pandoc_timeout(self, mock_run, sample_markdown, temp_dir):
        """Test Pandoc PDF generation timeout"""
        output_path = temp_dir / "output.pdf"
        mock_run.side_effect = subprocess.TimeoutExpired("pandoc", 60)

        generator = PDFGenerator()

        with pytest.raises(PDFGenerationError) as exc_info:
            generator.generate_pdf_pandoc(sample_markdown, output_path)

        assert "timed out" in str(exc_info.value).lower()

    @patch('subprocess.run')
    def test_generate_pdf_pandoc_output_not_created(self, mock_run, sample_markdown, temp_dir):
        """Test when Pandoc command succeeds but output file not created"""
        output_path = temp_dir / "output.pdf"
        mock_run.return_value = Mock(returncode=0, stderr="")
        # Don't create output file

        generator = PDFGenerator()

        with pytest.raises(PDFGenerationError) as exc_info:
            generator.generate_pdf_pandoc(sample_markdown, output_path)

        assert "PDF file not created" in str(exc_info.value)


@pytest.mark.unit
class TestWeasyPrintPDFGeneration:
    """Test PDF generation using WeasyPrint directly"""

    @pytest.fixture
    def sample_markdown(self, temp_dir):
        """Create a sample Markdown file"""
        md_file = temp_dir / "sample.md"
        md_file.write_text("# Test Document\n\nThis is a test.", encoding="utf-8")
        return md_file

    @patch('app.services.pdf_generator.HTML')
    @patch('app.services.pdf_generator.CSS')
    def test_generate_pdf_weasyprint_success(self, mock_css, mock_html, sample_markdown, temp_dir):
        """Test successful PDF generation with WeasyPrint"""
        output_path = temp_dir / "output.pdf"

        # Mock HTML and CSS objects
        mock_html_instance = Mock()
        mock_html_instance.write_pdf = Mock()
        mock_html.return_value = mock_html_instance

        # Create output file to simulate successful generation
        def create_pdf(*args, **kwargs):
            output_path.touch()

        mock_html_instance.write_pdf.side_effect = create_pdf

        generator = PDFGenerator()
        result = generator.generate_pdf_weasyprint(sample_markdown, output_path)

        assert result == output_path
        assert output_path.exists()
        mock_html.assert_called_once()
        mock_css.assert_called_once()
        mock_html_instance.write_pdf.assert_called_once()

    @patch('app.services.pdf_generator.HTML')
    @patch('app.services.pdf_generator.CSS')
    def test_generate_pdf_weasyprint_with_metadata(self, mock_css, mock_html, sample_markdown, temp_dir):
        """Test WeasyPrint PDF generation with metadata"""
        output_path = temp_dir / "output.pdf"

        mock_html_instance = Mock()
        mock_html_instance.write_pdf = Mock()
        mock_html.return_value = mock_html_instance

        def create_pdf(*args, **kwargs):
            output_path.touch()

        mock_html_instance.write_pdf.side_effect = create_pdf

        metadata = {
            "title": "Test Title",
            "author": "Test Author"
        }

        generator = PDFGenerator()
        result = generator.generate_pdf_weasyprint(
            sample_markdown,
            output_path,
            metadata=metadata
        )

        assert result == output_path

        # Check that HTML string includes title
        html_call_args = mock_html.call_args
        assert html_call_args[1]['string'] is not None
        assert "Test Title" in html_call_args[1]['string']

    @patch('app.services.pdf_generator.HTML')
    def test_generate_pdf_weasyprint_markdown_conversion(self, mock_html, sample_markdown, temp_dir):
        """Test that Markdown is properly converted to HTML"""
        output_path = temp_dir / "output.pdf"

        captured_html = None

        def capture_html(string, **kwargs):
            nonlocal captured_html
            captured_html = string
            mock_instance = Mock()
            mock_instance.write_pdf = Mock(side_effect=lambda *args, **kwargs: output_path.touch())
            return mock_instance

        mock_html.side_effect = capture_html

        generator = PDFGenerator()
        generator.generate_pdf_weasyprint(sample_markdown, output_path)

        # Verify HTML structure
        assert captured_html is not None
        assert "<!DOCTYPE html>" in captured_html
        assert "<h1>Test Document</h1>" in captured_html
        assert "<p>This is a test.</p>" in captured_html

    @patch('app.services.pdf_generator.HTML')
    @patch('app.services.pdf_generator.CSS')
    def test_generate_pdf_weasyprint_with_template(self, mock_css, mock_html, sample_markdown, temp_dir):
        """Test WeasyPrint PDF generation with different templates"""
        output_path = temp_dir / "output.pdf"

        mock_html_instance = Mock()
        mock_html_instance.write_pdf = Mock()
        mock_html.return_value = mock_html_instance

        def create_pdf(*args, **kwargs):
            output_path.touch()

        mock_html_instance.write_pdf.side_effect = create_pdf

        generator = PDFGenerator()

        # Test academic template
        generator.generate_pdf_weasyprint(
            sample_markdown,
            output_path,
            css_template="academic"
        )

        # Verify CSS was called with academic template content
        css_call_args = mock_css.call_args
        assert css_call_args[1]['string'] is not None
        assert "Times New Roman" in css_call_args[1]['string']

    @patch('app.services.pdf_generator.HTML')
    def test_generate_pdf_weasyprint_error_handling(self, mock_html, sample_markdown, temp_dir):
        """Test WeasyPrint error handling"""
        output_path = temp_dir / "output.pdf"

        mock_html.side_effect = Exception("WeasyPrint rendering error")

        generator = PDFGenerator()

        with pytest.raises(PDFGenerationError) as exc_info:
            generator.generate_pdf_weasyprint(sample_markdown, output_path)

        assert "WeasyPrint PDF generation failed" in str(exc_info.value)


@pytest.mark.unit
class TestUnifiedPDFGeneration:
    """Test unified PDF generation with automatic fallback"""

    @pytest.fixture
    def sample_markdown(self, temp_dir):
        """Create a sample Markdown file"""
        md_file = temp_dir / "sample.md"
        md_file.write_text("# Test Document\n\nTest content.", encoding="utf-8")
        return md_file

    def test_generate_pdf_nonexistent_markdown(self, temp_dir):
        """Test error when Markdown file doesn't exist"""
        nonexistent = temp_dir / "nonexistent.md"
        output_path = temp_dir / "output.pdf"

        generator = PDFGenerator()

        with pytest.raises(PDFGenerationError) as exc_info:
            generator.generate_pdf(nonexistent, output_path)

        assert "not found" in str(exc_info.value).lower()

    @patch.object(PDFGenerator, 'check_pandoc_available')
    @patch.object(PDFGenerator, 'generate_pdf_pandoc')
    def test_generate_pdf_prefers_pandoc(self, mock_pandoc_gen, mock_check, sample_markdown, temp_dir):
        """Test that Pandoc is preferred when available"""
        output_path = temp_dir / "output.pdf"
        output_path.touch()

        mock_check.return_value = True
        mock_pandoc_gen.return_value = output_path

        generator = PDFGenerator()
        result = generator.generate_pdf(sample_markdown, output_path, prefer_pandoc=True)

        assert result == output_path
        mock_check.assert_called_once()
        mock_pandoc_gen.assert_called_once()

    @patch.object(PDFGenerator, 'check_pandoc_available')
    @patch.object(PDFGenerator, 'generate_pdf_weasyprint')
    def test_generate_pdf_uses_weasyprint_when_pandoc_unavailable(
        self, mock_weasy_gen, mock_check, sample_markdown, temp_dir
    ):
        """Test fallback to WeasyPrint when Pandoc unavailable"""
        output_path = temp_dir / "output.pdf"
        output_path.touch()

        mock_check.return_value = False
        mock_weasy_gen.return_value = output_path

        generator = PDFGenerator()
        result = generator.generate_pdf(sample_markdown, output_path, prefer_pandoc=True)

        assert result == output_path
        mock_check.assert_called_once()
        mock_weasy_gen.assert_called_once()

    @patch.object(PDFGenerator, 'check_pandoc_available')
    @patch.object(PDFGenerator, 'generate_pdf_pandoc')
    @patch.object(PDFGenerator, 'generate_pdf_weasyprint')
    def test_generate_pdf_fallback_on_pandoc_failure(
        self, mock_weasy_gen, mock_pandoc_gen, mock_check, sample_markdown, temp_dir
    ):
        """Test automatic fallback to WeasyPrint when Pandoc fails"""
        output_path = temp_dir / "output.pdf"
        output_path.touch()

        mock_check.return_value = True
        mock_pandoc_gen.side_effect = PDFGenerationError("Pandoc failed")
        mock_weasy_gen.return_value = output_path

        generator = PDFGenerator()
        result = generator.generate_pdf(sample_markdown, output_path, prefer_pandoc=True)

        assert result == output_path
        mock_pandoc_gen.assert_called_once()
        mock_weasy_gen.assert_called_once()

    @patch.object(PDFGenerator, 'check_pandoc_available')
    @patch.object(PDFGenerator, 'generate_pdf_weasyprint')
    def test_generate_pdf_creates_output_directory(
        self, mock_weasy_gen, mock_check, sample_markdown, temp_dir
    ):
        """Test that output directory is created if needed"""
        output_dir = temp_dir / "subdir" / "outputs"
        output_path = output_dir / "output.pdf"
        output_path.parent.mkdir(parents=True, exist_ok=True)
        output_path.touch()

        mock_check.return_value = False
        mock_weasy_gen.return_value = output_path

        generator = PDFGenerator()
        result = generator.generate_pdf(sample_markdown, output_path)

        assert output_dir.exists()
        assert result == output_path


@pytest.mark.unit
class TestTemplateManagement:
    """Test CSS template management"""

    def test_get_available_templates(self):
        """Test retrieving available templates"""
        generator = PDFGenerator()
        templates = generator.get_available_templates()

        assert isinstance(templates, dict)
        assert len(templates) == 3
        assert "default" in templates
        assert "academic" in templates
        assert "business" in templates

        # Check descriptions are in Chinese
        for desc in templates.values():
            assert isinstance(desc, str)
            assert len(desc) > 0

    def test_save_custom_template(self):
        """Test saving a custom CSS template"""
        generator = PDFGenerator()

        custom_css = "@page { size: A4; }"
        generator.save_custom_template("custom", custom_css)

        assert "custom" in generator.css_templates
        assert generator.css_templates["custom"] == custom_css

    def test_save_custom_template_overwrites_existing(self):
        """Test that saving custom template can overwrite existing"""
        generator = PDFGenerator()

        new_css = "@page { size: Letter; }"
        generator.save_custom_template("default", new_css)

        assert generator.css_templates["default"] == new_css


@pytest.mark.unit
class TestEdgeCases:
    """Test edge cases and error handling"""

    @pytest.fixture
    def sample_markdown(self, temp_dir):
        """Create a sample Markdown file"""
        md_file = temp_dir / "sample.md"
        md_file.write_text("# Test", encoding="utf-8")
        return md_file

    @patch('app.services.pdf_generator.HTML')
    @patch('app.services.pdf_generator.CSS')
    def test_generate_with_unicode_content(self, mock_css, mock_html, temp_dir):
        """Test PDF generation with Unicode/Chinese content"""
        md_file = temp_dir / "unicode.md"
        md_file.write_text("# 測試文檔\n\n這是中文內容。", encoding="utf-8")
        output_path = temp_dir / "output.pdf"

        captured_html = None

        def capture_html(string, **kwargs):
            nonlocal captured_html
            captured_html = string
            mock_instance = Mock()
            mock_instance.write_pdf = Mock(side_effect=lambda *args, **kwargs: output_path.touch())
            return mock_instance

        mock_html.side_effect = capture_html

        generator = PDFGenerator()
        result = generator.generate_pdf_weasyprint(md_file, output_path)

        assert result == output_path
        assert "測試文檔" in captured_html
        assert "中文內容" in captured_html

    @patch('app.services.pdf_generator.HTML')
    @patch('app.services.pdf_generator.CSS')
    def test_generate_with_table_markdown(self, mock_css, mock_html, temp_dir):
        """Test PDF generation with Markdown tables"""
        md_file = temp_dir / "table.md"
        md_content = """
# Document with Table

| Column 1 | Column 2 |
|----------|----------|
| Data 1   | Data 2   |
"""
        md_file.write_text(md_content, encoding="utf-8")
        output_path = temp_dir / "output.pdf"

        captured_html = None

        def capture_html(string, **kwargs):
            nonlocal captured_html
            captured_html = string
            mock_instance = Mock()
            mock_instance.write_pdf = Mock(side_effect=lambda *args, **kwargs: output_path.touch())
            return mock_instance

        mock_html.side_effect = capture_html

        generator = PDFGenerator()
        result = generator.generate_pdf_weasyprint(md_file, output_path)

        assert result == output_path
        # Markdown tables should be converted to HTML tables
        assert "<table>" in captured_html
        assert "<th>" in captured_html or "<td>" in captured_html

    def test_custom_css_string_not_in_templates(self, sample_markdown, temp_dir):
        """Test using custom CSS string that's not a template name"""
        generator = PDFGenerator()

        # This should work - treat as custom CSS string
        custom_css = "body { font-size: 20pt; }"

        # When CSS template is not in templates dict, it should be used as-is
        assert custom_css not in generator.css_templates.values()
@@ -1,350 +0,0 @@
"""
Tool_OCR - Document Preprocessor Unit Tests
Tests for app/services/preprocessor.py
"""

import pytest
from pathlib import Path
from PIL import Image

from app.services.preprocessor import DocumentPreprocessor


@pytest.mark.unit
class TestDocumentPreprocessor:
    """Test suite for DocumentPreprocessor"""

    def test_init(self, preprocessor):
        """Test preprocessor initialization"""
        assert preprocessor is not None
        assert preprocessor.max_file_size > 0
        assert len(preprocessor.allowed_extensions) > 0
        assert 'png' in preprocessor.allowed_extensions
        assert 'jpg' in preprocessor.allowed_extensions
        assert 'pdf' in preprocessor.allowed_extensions

    def test_supported_formats(self, preprocessor):
        """Test that all expected formats are supported"""
        expected_image_formats = ['png', 'jpg', 'jpeg', 'bmp', 'tiff', 'tif']
        expected_pdf_format = ['pdf']

        for fmt in expected_image_formats:
            assert fmt in preprocessor.SUPPORTED_IMAGE_FORMATS

        for fmt in expected_pdf_format:
            assert fmt in preprocessor.SUPPORTED_PDF_FORMAT

        all_formats = expected_image_formats + expected_pdf_format
        assert set(preprocessor.ALL_SUPPORTED_FORMATS) == set(all_formats)


@pytest.mark.unit
class TestFileValidation:
    """Test file validation methods"""

    def test_validate_valid_png(self, preprocessor, sample_image_path):
        """Test validation of a valid PNG file"""
        is_valid, file_format, error = preprocessor.validate_file(sample_image_path)

        assert is_valid is True
        assert file_format == 'png'
        assert error is None

    def test_validate_valid_jpg(self, preprocessor, sample_jpg_path):
        """Test validation of a valid JPG file"""
        is_valid, file_format, error = preprocessor.validate_file(sample_jpg_path)

        assert is_valid is True
        assert file_format == 'jpg'
        assert error is None

    def test_validate_valid_pdf(self, preprocessor, sample_pdf_path):
        """Test validation of a valid PDF file"""
        is_valid, file_format, error = preprocessor.validate_file(sample_pdf_path)

        assert is_valid is True
        assert file_format == 'pdf'
        assert error is None

    def test_validate_nonexistent_file(self, preprocessor, temp_dir):
        """Test validation of a non-existent file"""
        fake_path = temp_dir / "nonexistent.png"
        is_valid, file_format, error = preprocessor.validate_file(fake_path)

        assert is_valid is False
        assert file_format is None
        assert "not found" in error.lower()

    def test_validate_large_file(self, preprocessor, large_file_path):
        """Test validation of a file exceeding size limit"""
        is_valid, file_format, error = preprocessor.validate_file(large_file_path)

        assert is_valid is False
        assert file_format is None
        assert "too large" in error.lower()

    def test_validate_unsupported_format(self, preprocessor, unsupported_file_path):
        """Test validation of unsupported file format"""
        is_valid, file_format, error = preprocessor.validate_file(unsupported_file_path)

        assert is_valid is False
        assert "not allowed" in error.lower() or "unsupported" in error.lower()

    def test_validate_corrupted_image(self, preprocessor, corrupted_image_path):
        """Test validation of a corrupted image file"""
        is_valid, file_format, error = preprocessor.validate_file(corrupted_image_path)

        assert is_valid is False
        assert error is not None
        # Corrupted files may be detected as unsupported type or corrupted
        assert ("corrupted" in error.lower() or
                "unsupported" in error.lower() or
                "not allowed" in error.lower())


@pytest.mark.unit
class TestMimeTypeMapping:
    """Test MIME type to format mapping"""

    def test_mime_to_format_png(self, preprocessor):
        """Test PNG MIME type mapping"""
        assert preprocessor._mime_to_format('image/png') == 'png'

    def test_mime_to_format_jpeg(self, preprocessor):
        """Test JPEG MIME type mapping"""
        assert preprocessor._mime_to_format('image/jpeg') == 'jpg'
        assert preprocessor._mime_to_format('image/jpg') == 'jpg'

    def test_mime_to_format_pdf(self, preprocessor):
        """Test PDF MIME type mapping"""
        assert preprocessor._mime_to_format('application/pdf') == 'pdf'

    def test_mime_to_format_tiff(self, preprocessor):
        """Test TIFF MIME type mapping"""
        assert preprocessor._mime_to_format('image/tiff') == 'tiff'
        assert preprocessor._mime_to_format('image/x-tiff') == 'tiff'

    def test_mime_to_format_bmp(self, preprocessor):
        """Test BMP MIME type mapping"""
        assert preprocessor._mime_to_format('image/bmp') == 'bmp'

    def test_mime_to_format_unknown(self, preprocessor):
        """Test unknown MIME type returns None"""
        assert preprocessor._mime_to_format('unknown/type') is None
        assert preprocessor._mime_to_format('text/plain') is None


@pytest.mark.unit
class TestIntegrityValidation:
    """Test file integrity validation"""

    def test_validate_integrity_valid_png(self, preprocessor, sample_image_path):
        """Test integrity check for valid PNG"""
        is_valid, error = preprocessor._validate_integrity(sample_image_path, 'png')

        assert is_valid is True
        assert error is None

    def test_validate_integrity_valid_jpg(self, preprocessor, sample_jpg_path):
        """Test integrity check for valid JPG"""
        is_valid, error = preprocessor._validate_integrity(sample_jpg_path, 'jpg')

        assert is_valid is True
        assert error is None

    def test_validate_integrity_valid_pdf(self, preprocessor, sample_pdf_path):
        """Test integrity check for valid PDF"""
        is_valid, error = preprocessor._validate_integrity(sample_pdf_path, 'pdf')

        assert is_valid is True
        assert error is None

    def test_validate_integrity_corrupted_image(self, preprocessor, corrupted_image_path):
        """Test integrity check for corrupted image"""
        is_valid, error = preprocessor._validate_integrity(corrupted_image_path, 'png')

        assert is_valid is False
        assert error is not None

    def test_validate_integrity_invalid_pdf_header(self, preprocessor, temp_dir):
        """Test integrity check for PDF with invalid header"""
        invalid_pdf = temp_dir / "invalid.pdf"
        with open(invalid_pdf, 'wb') as f:
            f.write(b'Not a PDF file')

        is_valid, error = preprocessor._validate_integrity(invalid_pdf, 'pdf')

        assert is_valid is False
        assert "invalid" in error.lower() or "header" in error.lower()

    def test_validate_integrity_unknown_format(self, preprocessor, temp_dir):
        """Test integrity check for unknown format"""
        test_file = temp_dir / "test.xyz"
        test_file.write_text("test")

        is_valid, error = preprocessor._validate_integrity(test_file, 'xyz')

        assert is_valid is False
        assert error is not None


@pytest.mark.unit
class TestImagePreprocessing:
    """Test image preprocessing functionality"""

    def test_preprocess_image_without_enhancement(self, preprocessor, sample_image_path):
        """Test preprocessing without enhancement (returns original)"""
        success, output_path, error = preprocessor.preprocess_image(
            sample_image_path,
            enhance=False
        )

        assert success is True
        assert output_path == sample_image_path
        assert error is None

    def test_preprocess_image_with_enhancement(self, preprocessor, sample_image_with_text, temp_dir):
        """Test preprocessing with enhancement"""
        output_path = temp_dir / "processed.png"

        success, result_path, error = preprocessor.preprocess_image(
            sample_image_with_text,
            enhance=True,
            output_path=output_path
        )

        assert success is True
        assert result_path == output_path
        assert result_path.exists()
        assert error is None

        # Verify the output is a valid image
        with Image.open(result_path) as img:
            assert img.size[0] > 0
            assert img.size[1] > 0

    def test_preprocess_image_auto_output_path(self, preprocessor, sample_image_with_text):
        """Test preprocessing with automatic output path"""
        success, result_path, error = preprocessor.preprocess_image(
            sample_image_with_text,
            enhance=True
        )

        assert success is True
        assert result_path is not None
        assert result_path.exists()
        assert "processed_" in result_path.name
        assert error is None

    def test_preprocess_nonexistent_image(self, preprocessor, temp_dir):
        """Test preprocessing with non-existent image"""
        fake_path = temp_dir / "nonexistent.png"

        success, result_path, error = preprocessor.preprocess_image(
            fake_path,
            enhance=True
        )

        assert success is False
        assert result_path is None
        assert error is not None

    def test_preprocess_corrupted_image(self, preprocessor, corrupted_image_path):
        """Test preprocessing with corrupted image"""
        success, result_path, error = preprocessor.preprocess_image(
            corrupted_image_path,
            enhance=True
        )

        assert success is False
        assert result_path is None
        assert error is not None


@pytest.mark.unit
class TestFileInfo:
    """Test file information retrieval"""

    def test_get_file_info_png(self, preprocessor, sample_image_path):
        """Test getting file info for PNG"""
        info = preprocessor.get_file_info(sample_image_path)

        assert info['name'] == sample_image_path.name
        assert info['path'] == str(sample_image_path)
        assert info['size'] > 0
        assert info['size_mb'] > 0
        assert info['mime_type'] == 'image/png'
        assert info['format'] == 'png'
        assert 'created_at' in info
        assert 'modified_at' in info

    def test_get_file_info_jpg(self, preprocessor, sample_jpg_path):
        """Test getting file info for JPG"""
        info = preprocessor.get_file_info(sample_jpg_path)

        assert info['name'] == sample_jpg_path.name
        assert info['mime_type'] == 'image/jpeg'
        assert info['format'] == 'jpg'

    def test_get_file_info_pdf(self, preprocessor, sample_pdf_path):
        """Test getting file info for PDF"""
        info = preprocessor.get_file_info(sample_pdf_path)

        assert info['name'] == sample_pdf_path.name
        assert info['mime_type'] == 'application/pdf'
        assert info['format'] == 'pdf'

    def test_get_file_info_size_calculation(self, preprocessor, sample_image_path):
        """Test that file size is correctly calculated"""
        info = preprocessor.get_file_info(sample_image_path)

        actual_size = sample_image_path.stat().st_size
        assert info['size'] == actual_size
        assert abs(info['size_mb'] - (actual_size / (1024 * 1024))) < 0.001


@pytest.mark.unit
class TestEdgeCases:
    """Test edge cases and error handling"""

    def test_validate_empty_file(self, preprocessor, temp_dir):
        """Test validation of empty file"""
        empty_file = temp_dir / "empty.png"
        empty_file.touch()

        is_valid, file_format, error = preprocessor.validate_file(empty_file)

        # Should fail because empty file has no valid MIME type or is corrupted
        assert is_valid is False

    def test_validate_file_with_wrong_extension(self, preprocessor, temp_dir):
        """Test validation of file with misleading extension"""
        # Create a PNG file but name it .txt
        misleading_file = temp_dir / "image.txt"
        img = Image.new('RGB', (10, 10), color='white')
        img.save(misleading_file, 'PNG')

        # Validation uses MIME detection, not extension
        # So a PNG file named .txt should pass if PNG is in allowed_extensions
        is_valid, file_format, error = preprocessor.validate_file(misleading_file)

        # Should succeed because MIME detection finds it's a PNG
        # (preprocessor uses magic number detection, not file extension)
        assert is_valid is True
        assert file_format == 'png'

    def test_preprocess_very_small_image(self, preprocessor, temp_dir):
        """Test preprocessing of very small image"""
        small_image = temp_dir / "small.png"
        img = Image.new('RGB', (5, 5), color='white')
        img.save(small_image, 'PNG')

        success, result_path, error = preprocessor.preprocess_image(
            small_image,
            enhance=True
        )

        # Should succeed even with very small image
        assert success is True
        assert result_path is not None
        assert result_path.exists()
@@ -1,106 +0,0 @@
"""
Unit tests for task management endpoints
"""

import pytest
from app.models.task import Task


class TestTasks:
    """Test task management endpoints"""

    def test_create_task(self, client, auth_token):
        """Test task creation"""
        response = client.post(
            '/api/v2/tasks/',
            headers={'Authorization': f'Bearer {auth_token}'},
            json={
                'filename': 'test.pdf',
                'file_type': 'application/pdf'
            }
        )

        assert response.status_code == 201
        data = response.json()
        assert 'task_id' in data
        assert data['filename'] == 'test.pdf'
        assert data['status'] == 'pending'

    def test_list_tasks(self, client, auth_token, test_task):
        """Test listing user tasks"""
        response = client.get(
            '/api/v2/tasks/',
            headers={'Authorization': f'Bearer {auth_token}'}
        )

        assert response.status_code == 200
        data = response.json()
        assert 'tasks' in data
        assert 'total' in data
        assert len(data['tasks']) > 0

    def test_get_task(self, client, auth_token, test_task):
        """Test get single task"""
        response = client.get(
            f'/api/v2/tasks/{test_task.task_id}',
            headers={'Authorization': f'Bearer {auth_token}'}
        )

        assert response.status_code == 200
        data = response.json()
        assert data['task_id'] == test_task.task_id

    def test_get_task_stats(self, client, auth_token, test_task):
        """Test get task statistics"""
        response = client.get(
            '/api/v2/tasks/stats',
            headers={'Authorization': f'Bearer {auth_token}'}
        )

        assert response.status_code == 200
        data = response.json()
        assert 'total' in data
        assert 'pending' in data
        assert 'processing' in data
        assert 'completed' in data
        assert 'failed' in data

    def test_delete_task(self, client, auth_token, test_task):
        """Test task deletion"""
        response = client.delete(
            f'/api/v2/tasks/{test_task.task_id}',
            headers={'Authorization': f'Bearer {auth_token}'}
        )

        # DELETE should return 204 No Content (standard for successful deletion)
        assert response.status_code == 204

    def test_user_isolation(self, client, db, test_user):
        """Test that users can only access their own tasks"""
        # Create another user
        from app.models.user import User
        other_user = User(email="other@example.com", display_name="Other User")
        db.add(other_user)
        db.commit()

        # Create task for other user
        other_task = Task(
            user_id=other_user.id,
            task_id="other-task-123",
            filename="other.pdf",
            status="pending"
        )
        db.add(other_task)
        db.commit()

        # Create token for test_user
        from app.core.security import create_access_token
        token = create_access_token({"sub": str(test_user.id)})

        # Try to access other user's task
        response = client.get(
            f'/api/v2/tasks/{other_task.task_id}',
            headers={'Authorization': f'Bearer {token}'}
        )

        assert response.status_code == 404  # Task not found (user isolation)
@@ -1,100 +0,0 @@
#!/usr/bin/env python3
import zipfile
from pathlib import Path

# Create a minimal DOCX file
output_path = Path('/Users/egg/Projects/Tool_OCR/demo_docs/office_tests/test_document.docx')

# DOCX is a ZIP file containing XML files
with zipfile.ZipFile(output_path, 'w', zipfile.ZIP_DEFLATED) as docx:
    # [Content_Types].xml
    content_types = '''<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
<Default Extension="xml" ContentType="application/xml"/>
<Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>'''
    docx.writestr('[Content_Types].xml', content_types)

    # _rels/.rels
    rels = '''<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>
</Relationships>'''
    docx.writestr('_rels/.rels', rels)

    # word/document.xml with Chinese and English content
    document = '''<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
<w:body>
<w:p>
<w:pPr><w:pStyle w:val="Heading1"/></w:pPr>
<w:r><w:t>Office Document OCR Test</w:t></w:r>
</w:p>
<w:p>
<w:pPr><w:pStyle w:val="Heading2"/></w:pPr>
<w:r><w:t>測試文件說明</w:t></w:r>
</w:p>
<w:p>
<w:r><w:t>這是一個用於測試 Tool_OCR 系統 Office 文件支援功能的測試文件。</w:t></w:r>
</w:p>
<w:p>
<w:r><w:t>本系統現已支援以下 Office 格式:</w:t></w:r>
</w:p>
<w:p>
<w:r><w:t>• Microsoft Word: DOC, DOCX</w:t></w:r>
</w:p>
<w:p>
<w:r><w:t>• Microsoft PowerPoint: PPT, PPTX</w:t></w:r>
</w:p>
<w:p>
<w:pPr><w:pStyle w:val="Heading2"/></w:pPr>
<w:r><w:t>處理流程</w:t></w:r>
</w:p>
<w:p>
<w:r><w:t>Office 文件的處理流程如下:</w:t></w:r>
</w:p>
<w:p>
<w:r><w:t>1. 使用 LibreOffice 將 Office 文件轉換為 PDF</w:t></w:r>
</w:p>
<w:p>
<w:r><w:t>2. 將 PDF 轉換為圖片(每頁一張)</w:t></w:r>
</w:p>
<w:p>
<w:r><w:t>3. 使用 PaddleOCR 處理每張圖片</w:t></w:r>
</w:p>
<w:p>
<w:r><w:t>4. 合併所有頁面的 OCR 結果</w:t></w:r>
</w:p>
<w:p>
<w:pPr><w:pStyle w:val="Heading2"/></w:pPr>
<w:r><w:t>中英混合測試</w:t></w:r>
</w:p>
<w:p>
<w:r><w:t>This is a test for mixed Chinese and English OCR recognition.</w:t></w:r>
</w:p>
<w:p>
<w:r><w:t>測試中英文混合識別能力:1234567890</w:t></w:r>
</w:p>
<w:p>
<w:pPr><w:pStyle w:val="Heading2"/></w:pPr>
<w:r><w:t>Technical Information</w:t></w:r>
</w:p>
<w:p>
<w:r><w:t>System Version: Tool_OCR v1.0</w:t></w:r>
</w:p>
<w:p>
<w:r><w:t>Conversion Engine: LibreOffice Headless</w:t></w:r>
</w:p>
<w:p>
<w:r><w:t>OCR Engine: PaddleOCR</w:t></w:r>
</w:p>
<w:p>
<w:r><w:t>Token Validity: 24 hours (1440 minutes)</w:t></w:r>
</w:p>
</w:body>
</w:document>'''
    docx.writestr('word/document.xml', document)

print(f"Created DOCX file: {output_path}")
print(f"File size: {output_path.stat().st_size} bytes")
Binary file not shown.
@@ -1,64 +0,0 @@
<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
    <title>Office Document OCR Test</title>
</head>
<body>
    <h1>Office Document OCR Test</h1>

    <h2>測試文件說明</h2>
    <p>這是一個用於測試 Tool_OCR 系統 Office 文件支援功能的測試文件。</p>
    <p>本系統現已支援以下 Office 格式:</p>
    <ul>
        <li>Microsoft Word: DOC, DOCX</li>
        <li>Microsoft PowerPoint: PPT, PPTX</li>
    </ul>

    <h2>處理流程</h2>
    <p>Office 文件的處理流程如下:</p>
    <ol>
        <li>使用 LibreOffice 將 Office 文件轉換為 PDF</li>
        <li>將 PDF 轉換為圖片(每頁一張)</li>
        <li>使用 PaddleOCR 處理每張圖片</li>
        <li>合併所有頁面的 OCR 結果</li>
    </ol>

    <h2>測試數據表格</h2>
    <table border="1" cellpadding="5">
        <tr>
            <th>格式</th>
            <th>副檔名</th>
            <th>支援狀態</th>
        </tr>
        <tr>
            <td>Word 新版</td>
            <td>.docx</td>
            <td>✓ 支援</td>
        </tr>
        <tr>
            <td>Word 舊版</td>
            <td>.doc</td>
            <td>✓ 支援</td>
        </tr>
        <tr>
            <td>PowerPoint 新版</td>
            <td>.pptx</td>
            <td>✓ 支援</td>
        </tr>
        <tr>
            <td>PowerPoint 舊版</td>
            <td>.ppt</td>
            <td>✓ 支援</td>
        </tr>
    </table>

    <h2>中英混合測試</h2>
    <p>This is a test for mixed Chinese and English OCR recognition.</p>
    <p>測試中英文混合識別能力:1234567890</p>

    <h2>特殊字符測試</h2>
    <p>符號測試:!@#$%^&*()_+-=[]{}|;:',.<>?/</p>
    <p>數學符號:± × ÷ √ ∞ ≈ ≠ ≤ ≥</p>
</body>
</html>
@@ -1,178 +0,0 @@
#!/usr/bin/env python3
"""
Test script for Office document processing
"""
import json
import requests
from pathlib import Path
import time

API_BASE = "http://localhost:12010/api/v1"
USERNAME = "admin"
PASSWORD = "admin123"


def login():
    """Login and get JWT token"""
    print("Step 1: Logging in...")
    response = requests.post(
        f"{API_BASE}/auth/login",
        json={"username": USERNAME, "password": PASSWORD}
    )
    response.raise_for_status()

    data = response.json()
    token = data["access_token"]
    print(f"✓ Login successful. Token expires in: {data['expires_in']} seconds ({data['expires_in']//3600} hours)")
    return token


def upload_file(token, file_path):
    """Upload file and create batch"""
    print(f"\nStep 2: Uploading file: {file_path.name}...")
    with open(file_path, 'rb') as f:
        files = {'files': (file_path.name, f, 'application/vnd.openxmlformats-officedocument.wordprocessingml.document')}
        response = requests.post(
            f"{API_BASE}/upload",
            headers={"Authorization": f"Bearer {token}"},
            files=files,
            data={"batch_name": "Office Document Test"}
        )
    response.raise_for_status()
    result = response.json()
    print(f"✓ File uploaded and batch created:")
    print(f"  Batch ID: {result['id']}")
    print(f"  Total files: {result['total_files']}")
    print(f"  Status: {result['status']}")
    return result['id']


def trigger_ocr(token, batch_id):
    """Trigger OCR processing"""
    print(f"\nStep 3: Triggering OCR processing...")
    response = requests.post(
        f"{API_BASE}/ocr/process",
        headers={"Authorization": f"Bearer {token}"},
        json={
            "batch_id": batch_id,
            "lang": "ch",
            "detect_layout": True
        }
    )
    response.raise_for_status()
    result = response.json()
    print(f"✓ OCR processing started")
    print(f"  Message: {result['message']}")
    print(f"  Total files: {result['total_files']}")


def check_status(token, batch_id):
    """Check processing status"""
    print(f"\nStep 4: Checking processing status...")
    max_wait = 120  # 120 seconds max
    waited = 0

    while waited < max_wait:
        response = requests.get(
            f"{API_BASE}/batch/{batch_id}/status",
            headers={"Authorization": f"Bearer {token}"}
        )
        response.raise_for_status()
        data = response.json()

        batch_status = data['batch']['status']
        progress = data['batch']['progress_percentage']
        file_status = data['files'][0]['status']

        print(f"  Batch status: {batch_status}, Progress: {progress}%, File status: {file_status}")

        if batch_status == 'completed':
            print(f"\n✓ Processing completed!")
            file_data = data['files'][0]
            if 'processing_time' in file_data:
                print(f"  Processing time: {file_data['processing_time']:.2f} seconds")
            return data
        elif batch_status == 'failed':
            print(f"\n✗ Processing failed!")
            print(f"  Error: {data['files'][0].get('error_message', 'Unknown error')}")
            return data

        time.sleep(5)
        waited += 5

    print(f"\n⚠ Timeout waiting for processing (waited {waited}s)")
    return None


def get_result(token, file_id):
    """Get OCR result"""
    print(f"\nStep 5: Getting OCR result...")
    response = requests.get(
        f"{API_BASE}/ocr/result/{file_id}",
        headers={"Authorization": f"Bearer {token}"}
    )
    response.raise_for_status()
    data = response.json()

    file_info = data['file']
    result = data.get('result')

    print(f"✓ OCR Result retrieved:")
    print(f"  File: {file_info['original_filename']}")
    print(f"  Status: {file_info['status']}")

    if result:
        print(f"  Language: {result.get('detected_language', 'N/A')}")
        print(f"  Total text regions: {result.get('total_text_regions', 0)}")
        print(f"  Average confidence: {result.get('average_confidence', 0):.2%}")

        # Read markdown file if available
        if result.get('markdown_path'):
            try:
                with open(result['markdown_path'], 'r', encoding='utf-8') as f:
                    markdown_content = f.read()
                print(f"\n  Markdown preview (first 300 chars):")
                print(f"  {'-'*60}")
                print(f"  {markdown_content[:300]}...")
                print(f"  {'-'*60}")
            except Exception as e:
                print(f"  Could not read markdown file: {e}")
    else:
        print(f"  No OCR result available yet")

    return data


def main():
    try:
        # Test file
        test_file = Path('/Users/egg/Projects/Tool_OCR/demo_docs/office_tests/test_document.docx')

        if not test_file.exists():
            print(f"✗ Test file not found: {test_file}")
            return

        print("="*70)
        print("Office Document Processing Test")
        print("="*70)
        print(f"Test file: {test_file.name} ({test_file.stat().st_size} bytes)")
        print("="*70)

        # Run test
        token = login()
        batch_id = upload_file(token, test_file)
        trigger_ocr(token, batch_id)
        status_data = check_status(token, batch_id)

        if status_data and status_data['batch']['status'] == 'completed':
            file_id = status_data['files'][0]['id']
            result = get_result(token, file_id)
            print("\n" + "="*70)
            print("✓ TEST PASSED: Office document processing successful!")
            print("="*70)
        else:
            print("\n" + "="*70)
            print("✗ TEST FAILED: Processing did not complete successfully")
            print("="*70)

    except Exception as e:
        print(f"\n✗ TEST ERROR: {str(e)}")
        import traceback
        traceback.print_exc()


if __name__ == "__main__":
    main()
@@ -0,0 +1,817 @@
# Tool_OCR Architecture Overhaul Plan
## A Refactoring Plan Built on the Full Capabilities of PaddleOCR PP-StructureV3

**Planning Date**: 2025-01-18
**Hardware**: RTX 4060 8GB VRAM
**Priority**: P0 (highest)

---

## 📊 Current State Analysis

### Problems with the Current Architecture

#### 1. **PP-StructureV3 Capabilities Are Largely Wasted**
```python
# ❌ Current implementation (ocr_service.py:614-646)
markdown_dict = page_result.markdown  # Only the simplified Markdown view is used
markdown_texts = markdown_dict.get('markdown_texts', '')
'bbox': [],  # Coordinates are all empty!
```

**Problems**:
- Only ~20% of PP-StructureV3's functionality is used
- `parsing_res_list` (the core data structure) is unused
- `layout_bbox` (precise coordinates) is unused
- `reading_order` is unused
- The 23 layout element categories are unused
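
For orientation, a minimal sketch of what accessing that unused data looks like. The `json` view and the `parsing_res_list` key follow the PP-StructureV3 output structure that `AdvancedLayoutExtractor` (later in this plan) relies on; the snippet only inspects what each entry carries:

```python
# Minimal sketch: enumerate the per-element records the current code discards.
# Assumes `structure_engine` is an initialized PPStructureV3 instance.
for page_result in structure_engine.predict(str(image_path)):
    json_data = page_result.json                     # full structured output
    for item in json_data.get('parsing_res_list', []):
        # Each entry bundles the element class, its bbox and its reading order.
        print(sorted(item.keys()))
```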

#### 2. **GPU Configuration Is Not Optimized**
```python
# Current configuration (ocr_service.py:211-219)
self.structure_engine = PPStructureV3(
    use_doc_orientation_classify=False,  # ❌ Preprocessing not enabled
    use_doc_unwarping=False,             # ❌ Unwarping not enabled
    use_textline_orientation=False,      # ❌ Orientation correction not enabled
    # ... default configuration
)
```

**Problems**:
- An RTX 4060 8GB is enough to run the server models, yet the default configuration is used
- Important preprocessing features are switched off
- GPU compute is underutilized

#### 3. **Single PDF Generation Strategy**
```python
# Today only the coordinate-positioning mode exists,
# which causes a 21.6% text loss (overlap filtering)
filtered_text_regions = self._filter_text_in_regions(text_regions, regions_to_avoid)
```

**Problems**:
- Only coordinate positioning is supported; there is no flow layout
- Zero information loss is impossible
- Translation support is limited

---

## 🎯 Refactoring Goals

### Core Goals

1. **Fully exploit PP-StructureV3's capabilities**
   - Extract `parsing_res_list` (23 element categories + reading order)
   - Extract `layout_bbox` (precise coordinates)
   - Extract `layout_det_res` (layout detection details)
   - Extract `overall_ocr_res` (coordinates for all text)

2. **Dual-mode PDF generation** (see the sketch after this list)
   - Mode A: coordinate positioning (faithful layout reproduction)
   - Mode B: flow layout (zero information loss, supports translation)

3. **Optimized GPU configuration**
   - Best-fit configuration for an RTX 4060 8GB
   - Server models + all feature modules
   - Sensible memory management

4. **Backward compatibility**
   - Keep the existing API
   - Old JSON files remain usable
   - Incremental upgrade
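
To make the two modes concrete, here is a minimal sketch of the planned dispatch. It assumes the `layout_data` schema defined under `AdvancedLayoutExtractor` below and uses WeasyPrint, which the PDF pipeline already depends on; `render_pdf` itself is illustrative, not existing code:

```python
from weasyprint import HTML

def render_pdf(layout_data: dict, output_path: str, mode: str = "flow") -> None:
    """Illustrative dispatch between Mode A (coordinate) and Mode B (flow)."""
    if mode == "coordinate":
        # Mode A: pin every element to its detected bbox to reproduce the
        # original page geometry as closely as possible.
        body = "".join(
            f'<div style="position:absolute; left:{el["bbox"][0][0]}px; '
            f'top:{el["bbox"][0][1]}px;">{el["content"]}</div>'
            for el in layout_data["elements"]
        )
    else:
        # Mode B: emit elements in reading order as normal flow content;
        # nothing is filtered out, and translated text can reflow freely.
        ordered = sorted(layout_data["elements"], key=lambda el: el["reading_order"])
        body = "".join(f"<p>{el['content']}</p>" for el in ordered)
    HTML(string=f"<html><body>{body}</body></html>").write_pdf(output_path)
```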

---

## 🏗️ New Architecture Design

### Architecture Layers

```
┌──────────────────────────────────────────────────────┐
│                      API Layer                       │
│  /tasks, /results, /download (backward compatible)   │
└────────────────┬─────────────────────────────────────┘
                 │
┌────────────────▼─────────────────────────────────────┐
│                    Service Layer                     │
├──────────────────────────────────────────────────────┤
│  OCRService (existing, kept)                         │
│    └─ analyze_layout() [upgraded] ──┐                │
│                                     │                │
│  AdvancedLayoutExtractor (new) ◄─ shares same engine │
│    └─ extract_complete_layout() ────┘                │
│                                                      │
│  PDFGeneratorService (refactored)                    │
│    ├─ generate_coordinate_pdf()  [Mode A]            │
│    └─ generate_flow_pdf()        [Mode B]            │
└────────────────┬─────────────────────────────────────┘
                 │
┌────────────────▼─────────────────────────────────────┐
│                     Engine Layer                     │
├──────────────────────────────────────────────────────┤
│  PPStructureV3Engine (new, unified management)       │
│    ├─ GPU config (tuned for RTX 4060 8GB)            │
│    ├─ Model config (server models)                   │
│    └─ Feature switches (all enabled)                 │
└──────────────────────────────────────────────────────┘
```

### Core Class Design

#### 1. PPStructureV3Engine (new)
**Purpose**: centrally manage the PP-StructureV3 engine and avoid repeated initialization

```python
class PPStructureV3Engine:
    """
    PP-StructureV3 engine manager (singleton)
    Configuration tuned for an RTX 4060 8GB
    """
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._initialize()
        return cls._instance

    def _initialize(self):
        """Initialize the engine"""
        logger.info("Initializing PP-StructureV3 with RTX 4060 8GB optimized config")

        self.engine = PPStructureV3(
            # ===== GPU configuration =====
            use_gpu=True,
            gpu_mem=6144,  # Reserve 2GB for the system (8GB - 2GB)

            # ===== Preprocessing modules (all enabled) =====
            use_doc_orientation_classify=True,  # Document orientation correction
            use_doc_unwarping=True,             # Document image unwarping
            use_textline_orientation=True,      # Text-line orientation correction

            # ===== Feature modules (all enabled) =====
            use_table_recognition=True,    # Table recognition
            use_formula_recognition=True,  # Formula recognition
            use_chart_recognition=True,    # Chart recognition
            use_seal_recognition=True,     # Seal recognition

            # ===== OCR model configuration (server models) =====
            text_detection_model_name="ch_PP-OCRv4_server_det",
            text_recognition_model_name="ch_PP-OCRv4_server_rec",

            # ===== Layout detection parameters =====
            layout_threshold=0.5,     # Layout detection threshold
            layout_nms=0.5,           # NMS threshold
            layout_unclip_ratio=1.5,  # Bounding-box expansion ratio

            # ===== OCR parameters =====
            text_det_limit_side_len=1920,  # High-resolution detection
            text_det_thresh=0.3,           # Detection threshold
            text_det_box_thresh=0.5,       # Box threshold

            # ===== Misc =====
            show_log=True,
            use_angle_cls=False,  # Superseded by textline_orientation
        )

        logger.info("PP-StructureV3 engine initialized successfully")
        logger.info(f"  - GPU: Enabled (RTX 4060 8GB)")
        logger.info(f"  - Models: Server (High Accuracy)")
        logger.info(f"  - Features: All Enabled (Table/Formula/Chart/Seal)")

    def predict(self, image_path: str):
        """Run prediction"""
        return self.engine.predict(image_path)

    def get_engine(self):
        """Return the engine instance"""
        return self.engine
```
|
||||
|
||||
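As a usage note (the sample path is hypothetical), the singleton guarantees the heavy model load happens only once, no matter how many services construct it:

```python
# Usage sketch: both call sites share one engine instance
engine_a = PPStructureV3Engine()
engine_b = PPStructureV3Engine()
assert engine_a is engine_b  # singleton: the model loads only once

results = engine_a.predict("storage/uploads/sample_page.png")  # hypothetical path
```
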
#### 2. AdvancedLayoutExtractor (new)
**Purpose**: extract all of PP-StructureV3's layout information

```python
import logging
from pathlib import Path
from typing import Dict, List, Optional, Tuple

logger = logging.getLogger(__name__)


class AdvancedLayoutExtractor:
    """
    Advanced layout extractor.
    Fully exploits PP-StructureV3's parsing_res_list, layout_bbox and layout_det_res.
    """

    def __init__(self):
        self.engine = PPStructureV3Engine()

    def extract_complete_layout(
        self,
        image_path: Path,
        output_dir: Optional[Path] = None,
        current_page: int = 0
    ) -> Tuple[Optional[Dict], List[Dict]]:
        """
        Extract complete layout information (via page_result.json).

        Returns:
            (layout_data, images_metadata)

            layout_data = {
                "elements": [
                    {
                        "element_id": int,
                        "type": str,           # one of the 23 types
                        "bbox": [[x1,y1], [x2,y1], [x2,y2], [x1,y2]],  # ✅ no longer empty
                        "content": str,
                        "reading_order": int,  # ✅ reading order
                        "layout_type": str,    # ✅ single/double/multi-column
                        "confidence": float,   # ✅ confidence score
                        "page": int
                    },
                    ...
                ],
                "reading_order": [0, 1, 2, ...],
                "layout_types": ["single", "double"],
                "total_elements": int
            }
        """
        try:
            results = self.engine.predict(str(image_path))

            layout_elements = []
            images_metadata = []

            for page_idx, page_result in enumerate(results):
                # ✅ Key change: use page_result.json instead of page_result.markdown
                json_data = page_result.json

                # ===== Method 1: parsing_res_list (primary source) =====
                parsing_res_list = json_data.get('parsing_res_list', [])

                if parsing_res_list:
                    logger.info(f"Found {len(parsing_res_list)} elements in parsing_res_list")

                    for idx, item in enumerate(parsing_res_list):
                        element = self._create_element_from_parsing_res(
                            item, idx, current_page
                        )
                        if element:
                            layout_elements.append(element)

                # ===== Method 2: layout_det_res (supplementary info) =====
                layout_det_res = json_data.get('layout_det_res', {})
                layout_boxes = layout_det_res.get('boxes', [])

                # Enrich elements when parsing_res_list lacks certain fields
                self._enrich_elements_with_layout_det(layout_elements, layout_boxes)

                # ===== Method 3: images (from markdown_images) =====
                markdown_dict = page_result.markdown
                markdown_images = markdown_dict.get('markdown_images', {})

                for img_idx, (img_path, img_obj) in enumerate(markdown_images.items()):
                    # Persist the image to disk
                    self._save_image(img_obj, img_path, output_dir or image_path.parent)

                    # Look up the bbox in parsing_res_list or layout_det_res
                    bbox = self._find_image_bbox(
                        img_path, parsing_res_list, layout_boxes
                    )

                    images_metadata.append({
                        'element_id': len(layout_elements) + img_idx,
                        'image_path': img_path,
                        'type': 'image',
                        'page': current_page,
                        'bbox': bbox,
                    })

            if layout_elements:
                layout_data = {
                    'elements': layout_elements,
                    'total_elements': len(layout_elements),
                    'reading_order': [e['reading_order'] for e in layout_elements],
                    'layout_types': list(set(e.get('layout_type') for e in layout_elements)),
                }
                logger.info(f"✅ Extracted {len(layout_elements)} elements with complete info")
                return layout_data, images_metadata
            else:
                logger.warning("No layout elements found")
                return None, []

        except Exception as e:
            logger.error(f"Advanced layout extraction failed: {e}")
            import traceback
            traceback.print_exc()
            return None, []

    def _create_element_from_parsing_res(
        self, item: Dict, idx: int, current_page: int
    ) -> Optional[Dict]:
        """Build an element from one parsing_res_list item."""
        # Extract layout_bbox
        layout_bbox = item.get('layout_bbox')
        bbox = self._convert_bbox_to_4point(layout_bbox)

        # Extract the layout type
        layout_type = item.get('layout', 'single')

        # Base element
        element = {
            'element_id': idx,
            'page': current_page,
            'bbox': bbox,  # ✅ full coordinates
            'layout_type': layout_type,
            'reading_order': idx,
            'confidence': item.get('score', 0.0),
        }

        # Fill in type and content by content kind.
        # Order matters! Priority: table > formula > image > title > text
        if 'table' in item and item['table']:
            element['type'] = 'table'
            element['content'] = item['table']
            # Extract plain table text (for translation)
            element['extracted_text'] = self._extract_table_text(item['table'])

        elif 'formula' in item and item['formula']:
            element['type'] = 'formula'
            element['content'] = item['formula']  # LaTeX

        elif 'figure' in item or 'image' in item:
            element['type'] = 'image'
            element['content'] = item.get('figure') or item.get('image')

        elif 'title' in item and item['title']:
            element['type'] = 'title'
            element['content'] = item['title']

        elif 'text' in item and item['text']:
            element['type'] = 'text'
            element['content'] = item['text']

        else:
            # Unknown type: take the first non-system field with a value
            for key, value in item.items():
                if key not in ['layout_bbox', 'layout', 'score'] and value:
                    element['type'] = key
                    element['content'] = value
                    break
            else:
                return None  # no content, skip

        return element

    def _convert_bbox_to_4point(self, layout_bbox) -> List:
        """Convert layout_bbox to 4-point format."""
        if layout_bbox is None:
            return []

        # Handle numpy arrays
        if hasattr(layout_bbox, 'tolist'):
            bbox = layout_bbox.tolist()
        else:
            bbox = list(layout_bbox)

        if len(bbox) == 4:  # [x1, y1, x2, y2]
            x1, y1, x2, y2 = bbox
            return [[x1, y1], [x2, y1], [x2, y2], [x1, y2]]

        return []

    def _extract_table_text(self, html_content: str) -> str:
        """Extract plain text from an HTML table (for translation)."""
        try:
            from bs4 import BeautifulSoup
            soup = BeautifulSoup(html_content, 'html.parser')

            # Collect the text of every cell
            cells = []
            for cell in soup.find_all(['td', 'th']):
                text = cell.get_text(strip=True)
                if text:
                    cells.append(text)

            return ' | '.join(cells)
        except Exception as e:
            logger.warning(f"Failed to extract table text: {e}")
            # Fallback: crude HTML tag stripping
            import re
            text = re.sub(r'<[^>]+>', ' ', html_content)
            text = re.sub(r'\s+', ' ', text)
            return text.strip()
```

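`_enrich_elements_with_layout_det()` is referenced above but not spelled out. A minimal sketch, assuming a simple box-intersection match is good enough (an IoU threshold may be preferable in practice):

```python
def _enrich_elements_with_layout_det(self, elements: list, layout_boxes: list) -> None:
    """Fill missing type/confidence from layout_det_res by box overlap (sketch)."""
    for elem in elements:
        bbox = elem.get('bbox')
        if not bbox:
            continue
        ex1, ey1 = bbox[0]
        ex2, ey2 = bbox[2]
        for box in layout_boxes:
            x1, y1, x2, y2 = box.get('coordinate', (0, 0, 0, 0))
            # Overlap test: does the detection box intersect the element box?
            if x1 < ex2 and x2 > ex1 and y1 < ey2 and y2 > ey1:
                elem.setdefault('type', box.get('label'))
                if not elem.get('confidence'):
                    elem['confidence'] = box.get('score', 0.0)
                break
```
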
#### 3. PDFGeneratorService (refactored)
**Purpose**: support dual-mode PDF generation

```python
import logging
from pathlib import Path
from typing import Dict, Optional

from reportlab.pdfgen import canvas

logger = logging.getLogger(__name__)


class PDFGeneratorService:
    """
    PDF generation service (refactored).
    Supports two modes:
    - coordinate: coordinate-positioning mode (faithful layout reproduction)
    - flow: flow-layout mode (zero information loss, translation-ready)
    """

    def generate_pdf(
        self,
        json_path: Path,
        output_path: Path,
        mode: str = 'coordinate',  # 'coordinate' or 'flow'
        source_file_path: Optional[Path] = None
    ) -> bool:
        """
        Generate a PDF.

        Args:
            json_path: path to the OCR JSON file
            output_path: output PDF path
            mode: generation mode ('coordinate' or 'flow')
            source_file_path: original file path (used for page dimensions)

        Returns:
            True on success
        """
        try:
            # Load the OCR data
            ocr_data = self.load_ocr_json(json_path)
            if not ocr_data:
                return False

            # Pick a generation strategy by mode
            if mode == 'flow':
                return self._generate_flow_pdf(ocr_data, output_path)
            else:
                return self._generate_coordinate_pdf(ocr_data, output_path, source_file_path, json_path)

        except Exception as e:
            logger.error(f"PDF generation failed: {e}")
            import traceback
            traceback.print_exc()
            return False

    def _generate_coordinate_pdf(
        self,
        ocr_data: Dict,
        output_path: Path,
        source_file_path: Optional[Path],
        json_path: Path  # needed to resolve image paths relative to the JSON file
    ) -> bool:
        """
        Mode A: coordinate positioning.
        - Uses layout_bbox to place every element precisely
        - Preserves the visual appearance of the original document
        - Suited to scenarios that require faithful layout reproduction
        """
        logger.info("Generating PDF in COORDINATE mode (layout-preserving)")

        # Pull the data
        layout_data = ocr_data.get('layout_data', {})
        elements = layout_data.get('elements', [])

        if not elements:
            logger.warning("No layout elements found")
            return False

        # Sort by page, then reading_order
        sorted_elements = sorted(elements, key=lambda x: (
            x.get('page', 0),
            x.get('reading_order', 0)
        ))

        # Compute page dimensions
        ocr_width, ocr_height = self.calculate_page_dimensions(ocr_data, source_file_path)
        target_width, target_height = self._get_target_dimensions(source_file_path, ocr_width, ocr_height)

        scale_w = target_width / ocr_width
        scale_h = target_height / ocr_height

        # Create the PDF canvas
        pdf_canvas = canvas.Canvas(str(output_path), pagesize=(target_width, target_height))

        # Group elements by page number
        pages = {}
        for elem in sorted_elements:
            page = elem.get('page', 0)
            if page not in pages:
                pages[page] = []
            pages[page].append(elem)

        # Render each page
        for page_num, page_elements in sorted(pages.items()):
            if page_num > 0:
                pdf_canvas.showPage()

            logger.info(f"Rendering page {page_num + 1} with {len(page_elements)} elements")

            # Render each element in reading order
            for elem in page_elements:
                bbox = elem.get('bbox', [])
                elem_type = elem.get('type')
                content = elem.get('content', '')

                if not bbox:
                    logger.warning(f"Element {elem['element_id']} has no bbox, skipping")
                    continue

                # Render by type
                try:
                    if elem_type == 'table':
                        self._draw_table_at_bbox(pdf_canvas, content, bbox, target_height, scale_w, scale_h)
                    elif elem_type == 'text':
                        self._draw_text_at_bbox(pdf_canvas, content, bbox, target_height, scale_w, scale_h)
                    elif elem_type == 'title':
                        self._draw_title_at_bbox(pdf_canvas, content, bbox, target_height, scale_w, scale_h)
                    elif elem_type == 'image':
                        img_path = json_path.parent / content
                        if img_path.exists():
                            self._draw_image_at_bbox(pdf_canvas, str(img_path), bbox, target_height, scale_w, scale_h)
                    elif elem_type == 'formula':
                        self._draw_formula_at_bbox(pdf_canvas, content, bbox, target_height, scale_w, scale_h)
                    # ... other types

                except Exception as e:
                    logger.warning(f"Failed to draw {elem_type} element: {e}")

        pdf_canvas.save()
        logger.info(f"✅ Coordinate PDF generated: {output_path}")
        return True

    def _generate_flow_pdf(
        self,
        ocr_data: Dict,
        output_path: Path
    ) -> bool:
        """
        Mode B: flow layout.
        - Lays elements out in reading_order
        - Zero information loss (nothing is filtered out)
        - Uses ReportLab's high-level Platypus API
        - Suited to translation and content-processing scenarios
        """
        from reportlab.platypus import (
            SimpleDocTemplate, Paragraph, Spacer,
            Table, TableStyle, Image as RLImage, PageBreak
        )
        from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
        from reportlab.lib import colors
        from reportlab.lib.enums import TA_LEFT, TA_CENTER

        logger.info("Generating PDF in FLOW mode (content-preserving)")

        # Pull the data
        layout_data = ocr_data.get('layout_data', {})
        elements = layout_data.get('elements', [])

        if not elements:
            logger.warning("No layout elements found")
            return False

        # Sort by reading order
        sorted_elements = sorted(elements, key=lambda x: (
            x.get('page', 0),
            x.get('reading_order', 0)
        ))

        # Create the document
        doc = SimpleDocTemplate(str(output_path))
        story = []
        styles = getSampleStyleSheet()

        # Custom style
        styles.add(ParagraphStyle(
            name='CustomTitle',
            parent=styles['Heading1'],
            fontSize=18,
            alignment=TA_CENTER,
            spaceAfter=12
        ))

        current_page = -1

        # Append elements in order
        for elem in sorted_elements:
            elem_type = elem.get('type')
            content = elem.get('content', '')
            page = elem.get('page', 0)

            # Page breaks
            if page != current_page and current_page != -1:
                story.append(PageBreak())
            current_page = page

            try:
                if elem_type == 'title':
                    story.append(Paragraph(content, styles['CustomTitle']))
                    story.append(Spacer(1, 12))

                elif elem_type == 'text':
                    story.append(Paragraph(content, styles['Normal']))
                    story.append(Spacer(1, 8))

                elif elem_type == 'table':
                    # Parse the HTML table into a ReportLab Table
                    table_obj = self._html_to_reportlab_table(content)
                    if table_obj:
                        story.append(table_obj)
                        story.append(Spacer(1, 12))

                elif elem_type == 'image':
                    # Embed the image
                    img_path = output_path.parent.parent / content
                    if img_path.exists():
                        img = RLImage(str(img_path), width=400, height=300, kind='proportional')
                        story.append(img)
                        story.append(Spacer(1, 12))

                elif elem_type == 'formula':
                    # Render formulas in a monospace font
                    story.append(Paragraph(f"<font name='Courier'>{content}</font>", styles['Code']))
                    story.append(Spacer(1, 8))

            except Exception as e:
                logger.warning(f"Failed to add {elem_type} element to flow: {e}")

        # Build the PDF
        doc.build(story)
        logger.info(f"✅ Flow PDF generated: {output_path}")
        return True
```

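`_html_to_reportlab_table()` is called in both modes but never defined in this plan. A minimal sketch, assuming plain tables (rowspan/colspan handling is out of scope here):

```python
from bs4 import BeautifulSoup
from reportlab.lib import colors
from reportlab.platypus import Table, TableStyle

def _html_to_reportlab_table(self, html_content: str):
    """Convert a PP-StructureV3 HTML table into a ReportLab Table flowable."""
    soup = BeautifulSoup(html_content, 'html.parser')
    rows = []
    for tr in soup.find_all('tr'):
        rows.append([cell.get_text(strip=True) for cell in tr.find_all(['td', 'th'])])
    if not rows:
        return None
    # Pad ragged rows so ReportLab gets a rectangular grid
    width = max(len(r) for r in rows)
    rows = [r + [''] * (width - len(r)) for r in rows]
    table = Table(rows)
    table.setStyle(TableStyle([
        ('GRID', (0, 0), (-1, -1), 0.5, colors.grey),
        ('FONTSIZE', (0, 0), (-1, -1), 8),
        ('VALIGN', (0, 0), (-1, -1), 'MIDDLE'),
    ]))
    return table
```
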
---

## 🔧 Implementation Steps

### Phase 1: Engine-layer refactor (2-3 hours)

1. **Create the PPStructureV3Engine singleton**
   - File: `backend/app/engines/ppstructure_engine.py` (new)
   - Centralizes management of the PP-StructureV3 engine
   - Tuned configuration for an RTX 4060 8GB

2. **Create the AdvancedLayoutExtractor class**
   - File: `backend/app/services/advanced_layout_extractor.py` (new)
   - Implements `extract_complete_layout()`
   - Fully extracts parsing_res_list, layout_bbox and layout_det_res

3. **Update OCRService**
   - Change `analyze_layout()` to use `AdvancedLayoutExtractor`
   - Stay backward compatible (fall back to the old logic)

### Phase 2: PDF generator refactor (3-4 hours)

1. **Refactor PDFGeneratorService**
   - Add a `mode` parameter
   - Implement `_generate_coordinate_pdf()`
   - Implement `_generate_flow_pdf()`

2. **Add helper methods** (see the sketch after this list)
   - `_draw_table_at_bbox()`: draw a table at given coordinates
   - `_draw_text_at_bbox()`: draw text at given coordinates
   - `_draw_title_at_bbox()`: draw a title at given coordinates
   - `_draw_formula_at_bbox()`: draw a formula at given coordinates
   - `_html_to_reportlab_table()`: convert HTML to a ReportLab Table

3. **Update the API endpoints**
   - `/tasks/{id}/download/pdf?mode=coordinate` (default)
   - `/tasks/{id}/download/pdf?mode=flow`

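As an illustration of what one of these helpers could look like, here is a hedged sketch of `_draw_text_at_bbox()`, assuming the 4-point bbox format above and ReportLab's bottom-left origin; the font name and size heuristic are placeholders to tune:

```python
from reportlab.pdfgen.canvas import Canvas

def _draw_text_at_bbox(self, pdf_canvas: Canvas, content: str, bbox: list,
                       page_height: float, scale_w: float, scale_h: float) -> None:
    """Draw text inside a 4-point bbox, flipping the y-axis for ReportLab."""
    (x1, y1), _, (x2, y2), _ = bbox
    # Scale OCR coordinates into PDF space
    px, py = x1 * scale_w, y1 * scale_h
    box_h = (y2 - y1) * scale_h
    # ReportLab's origin is bottom-left, so flip the y-axis
    baseline_y = page_height - py - box_h * 0.8
    # Rough font size derived from the box height (placeholder heuristic)
    font_size = max(6, min(18, box_h * 0.7))
    pdf_canvas.setFont("Helvetica", font_size)
    pdf_canvas.drawString(px, baseline_y, content)
```
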
### Phase 3: Testing and tuning (2-3 hours)

1. **Unit tests**
   - Test AdvancedLayoutExtractor
   - Test both PDF modes
   - Test backward compatibility

2. **Performance tests**
   - Monitor GPU memory usage
   - Measure processing speed
   - Test concurrent requests

3. **Quality validation**
   - Coordinate accuracy
   - Reading-order correctness
   - Table recognition accuracy

---

## 📈 Expected Impact

### Functional Improvements

| Metric | Current | After refactor | Gain |
|------|-----|--------|------|
| bbox availability | 0% (all empty) | 100% | ✅ ∞ |
| Layout element classes | 2 | 23 | ✅ 11.5x |
| Reading order | none | fully preserved | ✅ 100% |
| Information loss | 21.6% | 0% (flow mode) | ✅ 100% |
| PDF modes | 1 | 2 | ✅ 2x |
| Translation support | difficult | full | ✅ 100% |

### GPU Utilization Gains

| Config item | Current | After refactor |
|----------------|--------|--------|
| GPU utilization | ~30% | ~70% |
| Throughput | 0.5 pages/s | 1.2 pages/s |
| Preprocessing | off | all on |
| Recognition accuracy | ~85% | ~95% |

---

## 🎯 Migration Strategy

### Backward-compatibility Guarantees

1. **API level**
   - Keep every existing API endpoint
   - Add an optional `mode` parameter
   - Default behavior unchanged

2. **Data level**
   - Old JSON files remain usable
   - New fields do not affect old logic
   - Incremental rollout of updates

3. **Deployment strategy**
   - Deploy the new engine and services first
   - Enable new features gradually
   - Monitor performance and error rates

---

## 📝 Configuration Files

### requirements.txt updates

```txt
# Existing dependencies
paddlepaddle-gpu>=3.0.0
paddleocr>=3.0.0

# New dependencies
python-docx>=0.8.11     # Word document generation (optional)
PyMuPDF>=1.23.0         # enhanced PDF handling
beautifulsoup4>=4.12.0  # HTML parsing
lxml>=4.9.0             # faster XML/HTML parsing
```

### Environment Variables

```bash
# Added to .env.local
PADDLE_GPU_MEMORY=6144           # RTX 4060 8GB, leave 2GB for the system
PADDLE_USE_SERVER_MODEL=true
PADDLE_ENABLE_ALL_FEATURES=true

# Default PDF generation mode
PDF_DEFAULT_MODE=coordinate      # or flow
```

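A hypothetical settings loader for the variables above (the names mirror the `.env.local` keys; the defaults are assumptions matching the plan):

```python
import os

# Hypothetical settings module reading the .env.local variables above
PADDLE_GPU_MEMORY = int(os.getenv("PADDLE_GPU_MEMORY", "6144"))
PADDLE_USE_SERVER_MODEL = os.getenv("PADDLE_USE_SERVER_MODEL", "true").lower() == "true"
PADDLE_ENABLE_ALL_FEATURES = os.getenv("PADDLE_ENABLE_ALL_FEATURES", "true").lower() == "true"
PDF_DEFAULT_MODE = os.getenv("PDF_DEFAULT_MODE", "coordinate")
```
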
---

## 🚀 Implementation Priorities

### P0 (implement now)
1. ✅ PPStructureV3Engine unified engine
2. ✅ AdvancedLayoutExtractor complete extraction
3. ✅ Coordinate-positioning PDF mode

### P1 (second phase)
4. ⭐ Flow-layout PDF mode
5. ⭐ API endpoint updates (`mode` parameter)

### P2 (optimization phase)
6. Performance monitoring and tuning
7. Batch processing support
8. Quality-check tooling

---

## ⚠️ Risks and Mitigations

### Risk 1: Insufficient GPU memory
**Mitigation**:
- Set `gpu_mem=6144` conservatively (reserve 2GB)
- Add memory monitoring
- Process large documents in batches

### Risk 2: Slower processing
**Mitigation**:
- Server models run faster on GPU than Mobile models
- Process multiple pages in parallel
- Cache results

### Risk 3: Backward-compatibility issues
**Mitigation**:
- Keep the old logic as a fallback
- Migrate incrementally
- Full test coverage

---

**Estimated total development time**: 7-10 hours
**Expected outcome**: 100% use of PP-StructureV3's capabilities + zero information loss + full translation support

Which phase would you like me to start implementing?

@@ -0,0 +1,691 @@
# Plan: Fully Using PP-StructureV3 Layout Information

## 📋 Executive Summary

### Problem Diagnosis
The current implementation **severely underestimates PP-StructureV3's capabilities**: it only reads the `page_result.markdown` attribute and entirely ignores the core layout information in `page_result.json`.

### Key Findings
1. **PP-StructureV3 provides complete layout-parsing information**, including:
   - `parsing_res_list`: layout elements ordered by reading order
   - `layout_bbox`: precise coordinates for every element
   - `layout_det_res`: layout detection results (region type, confidence)
   - `overall_ocr_res`: the full OCR result (bbox for all text)
   - `layout`: layout type (single/double/multi-column)

2. **Defects in the current implementation**:
```python
# ❌ Current approach (ocr_service.py:615-646)
markdown_dict = page_result.markdown  # only grabs markdown and images
markdown_texts = markdown_dict.get('markdown_texts', '')
# bbox is set to an empty list
'bbox': [],  # PP-StructureV3 doesn't provide individual bbox in this format
```

3. **What it should do instead**:
```python
# ✅ Correct approach
json_data = page_result.json  # fetch the full structured information
parsing_list = json_data.get('parsing_res_list', [])  # reading order + bbox
layout_det = json_data.get('layout_det_res', {})      # layout detection
overall_ocr = json_data.get('overall_ocr_res', {})    # coordinates for all text
```

---

## 🎯 Planning Goals

### Phase 1: Extract complete layout information (high priority)
**Goal**: modify `analyze_layout()` to use PP-StructureV3's full capabilities

**Expected outcome**:
- ✅ Every layout element carries a precise `layout_bbox`
- ✅ Original reading order preserved (the order of `parsing_res_list`)
- ✅ Layout-type information available (single/double column)
- ✅ Region classification extracted (text/table/figure/title/formula)
- ✅ Zero information loss (no overlap filtering needed)

### Phase 2: Implement dual-mode PDF generation (medium priority)
**Goal**: provide two PDF generation modes

**Mode A: precise coordinate positioning**
- Uses `layout_bbox` to place every element exactly
- Preserves the visual appearance of the original document
- Suited to scenarios that require faithful layout reproduction

**Mode B: flow layout**
- Lays content out in `parsing_res_list` order
- Uses ReportLab's high-level Platypus API
- Zero information loss; all content stays searchable
- Suited to translation and content-processing scenarios

### Phase 3: Multi-column layout handling (low priority)
**Goal**: exploit PP-StructureV3's multi-column recognition

---

## 📊 PP-StructureV3 Complete Data Structure

### 1. Full structure of `page_result.json`

```python
{
    # Basic info
    "input_path": str,  # source file path
    "page_index": int,  # page number (PDF only)

    # Layout detection result
    "layout_det_res": {
        "boxes": [
            {
                "cls_id": int,   # class ID
                "label": str,    # region type: text/table/figure/title/formula/seal
                "score": float,  # confidence 0-1
                "coordinate": [x1, y1, x2, y2]  # rectangle coordinates
            },
            ...
        ]
    },

    # Full OCR result
    "overall_ocr_res": {
        "dt_polys": np.ndarray,   # text detection polygons
        "rec_polys": np.ndarray,  # text recognition polygons
        "rec_boxes": np.ndarray,  # text recognition rectangles (n, 4, 2) int16
        "rec_texts": List[str],   # recognized text
        "rec_scores": np.ndarray  # recognition confidence
    },

    # **Core layout-parsing result (in reading order)**
    "parsing_res_list": [
        {
            "layout_bbox": np.ndarray,  # region bounding box [x1, y1, x2, y2]
            "layout": str,              # layout type: single/double/multi-column
            "text": str,                # text content (for text regions)
            "table": str,               # table HTML (for table regions)
            "image": str,               # image path (for image regions)
            "formula": str,             # formula LaTeX (for formula regions)
            # ... other region types
        },
        ...  # list order = reading order
    ],

    # Text-paragraph OCR (in reading order)
    "text_paragraphs_ocr_res": {
        "rec_polys": np.ndarray,
        "rec_texts": List[str],
        "rec_scores": np.ndarray
    },

    # Optional module results
    "formula_res_region1": {...},  # formula recognition result
    "table_cell_img": {...},       # table cell images
    "seal_res_region1": {...}      # seal recognition result
}
```

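Since several of these fields are `np.ndarray`, persisting the structure to JSON requires converting them first; a minimal sketch:

```python
import numpy as np

def to_serializable(value):
    """Recursively convert numpy values in page_result.json to plain Python."""
    if isinstance(value, np.ndarray):
        return value.tolist()
    if isinstance(value, dict):
        return {k: to_serializable(v) for k, v in value.items()}
    if isinstance(value, list):
        return [to_serializable(v) for v in value]
    return value
```
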
### 2. Key Fields

| Field | Purpose | Format | Importance |
|------|------|---------|--------|
| `parsing_res_list` | **Core data**: all layout elements in reading order | List[Dict] | ⭐⭐⭐⭐⭐ |
| `layout_bbox` | Precise coordinates for each element | np.ndarray [x1,y1,x2,y2] | ⭐⭐⭐⭐⭐ |
| `layout` | Layout type (single/double/multi-column) | str: single/double/multi | ⭐⭐⭐⭐ |
| `layout_det_res` | Detailed layout detection (incl. region classes) | Dict with boxes list | ⭐⭐⭐⭐ |
| `overall_ocr_res` | OCR results and coordinates for all text | Dict with np.ndarray | ⭐⭐⭐⭐ |
| `markdown` | Simplified Markdown output | Dict with texts/images | ⭐⭐ |

---

## 🔧 Implementation Plan

### Task 1: Refactor the `analyze_layout()` function

**File**: `/backend/app/services/ocr_service.py`

**Scope**: lines 590-710

**Core changes**:

```python
def analyze_layout(self, image_path: Path, output_dir: Optional[Path] = None, current_page: int = 0) -> Tuple[Optional[Dict], List[Dict]]:
    """
    Analyze document layout using PP-StructureV3 (using the full JSON information).
    """
    try:
        structure_engine = self.get_structure_engine()
        results = structure_engine.predict(str(image_path))

        layout_elements = []
        images_metadata = []

        for page_idx, page_result in enumerate(results):
            # ✅ Change 1: use the full JSON data instead of only markdown
            json_data = page_result.json

            # ✅ Change 2: extract the layout detection result
            layout_det_res = json_data.get('layout_det_res', {})
            layout_boxes = layout_det_res.get('boxes', [])

            # ✅ Change 3: extract the core parsing_res_list (reading order + bbox)
            parsing_res_list = json_data.get('parsing_res_list', [])

            if parsing_res_list:
                # *** Core logic: consume parsing_res_list ***
                for idx, item in enumerate(parsing_res_list):
                    # Extract the bbox (no longer an empty list!)
                    layout_bbox = item.get('layout_bbox')
                    bbox = []  # default when layout_bbox is missing
                    if layout_bbox is not None:
                        # Convert numpy arrays to a plain list
                        if hasattr(layout_bbox, 'tolist'):
                            bbox = layout_bbox.tolist()
                        else:
                            bbox = list(layout_bbox)

                        # Convert to 4-point format: [[x1,y1], [x2,y1], [x2,y2], [x1,y2]]
                        if len(bbox) == 4:  # [x1, y1, x2, y2]
                            x1, y1, x2, y2 = bbox
                            bbox = [[x1, y1], [x2, y1], [x2, y2], [x1, y2]]
                        else:
                            bbox = []

                    # Extract the layout type
                    layout_type = item.get('layout', 'single')

                    # Create the element (carrying all information)
                    element = {
                        'element_id': idx,
                        'page': current_page,
                        'bbox': bbox,                # ✅ no longer empty!
                        'layout_type': layout_type,  # ✅ new: layout type
                        'reading_order': idx,        # ✅ new: reading order
                    }

                    # Extract content by region type
                    if 'table' in item:
                        element['type'] = 'table'
                        element['content'] = item['table']
                        # Extract plain table text (for translation)
                        element['extracted_text'] = self._extract_table_text(item['table'])

                    elif 'text' in item:
                        element['type'] = 'text'
                        element['content'] = item['text']

                    elif 'figure' in item or 'image' in item:
                        element['type'] = 'image'
                        element['content'] = item.get('figure') or item.get('image')

                    elif 'formula' in item:
                        element['type'] = 'formula'
                        element['content'] = item['formula']

                    elif 'title' in item:
                        element['type'] = 'title'
                        element['content'] = item['title']

                    else:
                        # Unknown type: record the first non-system field
                        for key, value in item.items():
                            if key not in ['layout_bbox', 'layout']:
                                element['type'] = key
                                element['content'] = value
                                break

                    layout_elements.append(element)

            else:
                # Fall back to the markdown approach (backward compatibility)
                logger.warning("No parsing_res_list found, falling back to markdown parsing")
                markdown_dict = page_result.markdown
                # ... existing markdown parsing logic ...

            # ✅ Change 4: still handle extracted images (must be saved to disk)
            markdown_dict = page_result.markdown
            markdown_images = markdown_dict.get('markdown_images', {})

            for img_idx, (img_path, img_obj) in enumerate(markdown_images.items()):
                # Persist the image to disk
                try:
                    base_dir = output_dir if output_dir else image_path.parent
                    full_img_path = base_dir / img_path
                    full_img_path.parent.mkdir(parents=True, exist_ok=True)

                    if hasattr(img_obj, 'save'):
                        img_obj.save(str(full_img_path))
                        logger.info(f"Saved extracted image to {full_img_path}")
                except Exception as e:
                    logger.warning(f"Failed to save image {img_path}: {e}")

                # Extract the bbox (from the filename or by matching parsing_res_list)
                bbox = self._find_image_bbox(img_path, parsing_res_list, layout_boxes)

                images_metadata.append({
                    'element_id': len(layout_elements) + img_idx,
                    'image_path': img_path,
                    'type': 'image',
                    'page': current_page,
                    'bbox': bbox,
                })

        if layout_elements:
            layout_data = {
                'elements': layout_elements,
                'total_elements': len(layout_elements),
                'reading_order': [e['reading_order'] for e in layout_elements],  # ✅ keep reading order
                'layout_types': list(set(e.get('layout_type') for e in layout_elements)),  # ✅ layout-type stats
            }
            logger.info(f"Detected {len(layout_elements)} layout elements (with bbox and reading order)")
            return layout_data, images_metadata
        else:
            logger.warning("No layout elements detected")
            return None, []

    except Exception as e:
        import traceback
        logger.error(f"Layout analysis error: {str(e)}\n{traceback.format_exc()}")
        return None, []


def _find_image_bbox(self, img_path: str, parsing_res_list: List[Dict], layout_boxes: List[Dict]) -> List:
    """
    Look up an image's bbox in parsing_res_list or layout_det_res.
    """
    # Method 1: parse it from the filename (existing approach)
    import re
    match = re.search(r'box_(\d+)_(\d+)_(\d+)_(\d+)', img_path)
    if match:
        x1, y1, x2, y2 = map(int, match.groups())
        return [[x1, y1], [x2, y1], [x2, y2], [x1, y2]]

    # Method 2: match against parsing_res_list (if it carries image paths)
    for item in parsing_res_list:
        if 'image' in item or 'figure' in item:
            content = item.get('image') or item.get('figure')
            if img_path in str(content):
                bbox = item.get('layout_bbox')
                if bbox is not None:
                    if hasattr(bbox, 'tolist'):
                        bbox_list = bbox.tolist()
                    else:
                        bbox_list = list(bbox)
                    if len(bbox_list) == 4:
                        x1, y1, x2, y2 = bbox_list
                        return [[x1, y1], [x2, y1], [x2, y2], [x1, y2]]

    # Method 3: match against layout_det_res (by label)
    for box in layout_boxes:
        if box.get('label') in ['figure', 'image']:
            coord = box.get('coordinate', [])
            if len(coord) == 4:
                x1, y1, x2, y2 = coord
                return [[x1, y1], [x2, y1], [x2, y2], [x1, y2]]

    logger.warning(f"Could not find bbox for image {img_path}")
    return []
```

---

### Task 2: Update the PDF generator to use the new information

**File**: `/backend/app/services/pdf_generator_service.py`

**Core changes**:

1. **Remove the text-filtering logic** (no longer needed!)
   - `parsing_res_list` is already ordered by reading order
   - Tables/images own their regions, text owns its regions
   - No overlap problems remain

2. **Render elements by `reading_order`**

```python
def generate_layout_pdf(self, json_path: Path, output_path: Path, mode: str = 'coordinate') -> bool:
    """
    mode: 'coordinate' or 'flow'
    """
    # Load the data
    ocr_data = self.load_ocr_json(json_path)
    layout_data = ocr_data.get('layout_data', {})
    elements = layout_data.get('elements', [])

    if mode == 'coordinate':
        # Mode A: coordinate positioning
        return self._generate_coordinate_pdf(elements, output_path, ocr_data)
    else:
        # Mode B: flow layout
        return self._generate_flow_pdf(elements, output_path, ocr_data, json_path)

def _generate_coordinate_pdf(self, elements: List[Dict], output_path: Path, ocr_data: Dict) -> bool:
    """Coordinate-positioning mode: reproduce the layout precisely."""
    # (canvas creation, page sizing and scale factors elided in this sketch)
    # Sort elements by reading_order
    sorted_elements = sorted(elements, key=lambda x: x.get('reading_order', 0))

    # Group by page number
    pages = {}
    for elem in sorted_elements:
        page = elem.get('page', 0)
        if page not in pages:
            pages[page] = []
        pages[page].append(elem)

    # Render each page
    for page_num, page_elements in sorted(pages.items()):
        for elem in page_elements:
            bbox = elem.get('bbox', [])
            elem_type = elem.get('type')
            content = elem.get('content', '')

            if not bbox:
                logger.warning(f"Element {elem['element_id']} has no bbox, skipping")
                continue

            # Render at the exact coordinates
            if elem_type == 'table':
                self.draw_table_at_bbox(pdf_canvas, content, bbox, page_height, scale_w, scale_h)
            elif elem_type == 'text':
                self.draw_text_at_bbox(pdf_canvas, content, bbox, page_height, scale_w, scale_h)
            elif elem_type == 'image':
                self.draw_image_at_bbox(pdf_canvas, content, bbox, page_height, scale_w, scale_h)
            # ... other types

def _generate_flow_pdf(self, elements: List[Dict], output_path: Path, ocr_data: Dict, json_path: Path) -> bool:
    """Flow-layout mode: zero information loss."""
    from reportlab.platypus import SimpleDocTemplate, Paragraph, Table, Image, Spacer
    from reportlab.lib.styles import getSampleStyleSheet

    # Sort elements by reading_order
    sorted_elements = sorted(elements, key=lambda x: x.get('reading_order', 0))

    # Build the story (flowing content)
    story = []
    styles = getSampleStyleSheet()

    for elem in sorted_elements:
        elem_type = elem.get('type')
        content = elem.get('content', '')

        if elem_type == 'title':
            story.append(Paragraph(content, styles['Title']))
        elif elem_type == 'text':
            story.append(Paragraph(content, styles['Normal']))
        elif elem_type == 'table':
            # Parse the HTML table into a ReportLab Table
            table_obj = self._html_to_reportlab_table(content)
            story.append(table_obj)
        elif elem_type == 'image':
            # Embed the image (resolved relative to the JSON file)
            img_path = json_path.parent / content
            if img_path.exists():
                story.append(Image(str(img_path), width=400, height=300))

        story.append(Spacer(1, 12))  # spacing

    # Build the PDF
    doc = SimpleDocTemplate(str(output_path))
    doc.build(story)
    return True
```

---

## 📈 Expected Impact Comparison

### Current vs. New Implementation

| Metric | Current ❌ | New ✅ | Improvement |
|------|-----------|----------|------|
| **bbox info** | empty list `[]` | precise coordinates `[x1,y1,x2,y2]` | ✅ 100% |
| **Reading order** | none (mixed HTML) | `reading_order` field | ✅ 100% |
| **Layout type** | none | `layout_type` (single/double column) | ✅ 100% |
| **Element classification** | naive `<table` check | precise classes (9+ types) | ✅ 100% |
| **Information loss** | 21.6% of text filtered out | 0% loss (flow mode) | ✅ 100% |
| **Coordinate precision** | bbox for some images only | bbox for every element | ✅ 100% |
| **PDF modes** | coordinate positioning only | dual mode (coordinate + flow) | ✅ new |
| **Translation support** | difficult (information loss) | full (zero loss) | ✅ 100% |

### Concrete Improvements

#### 1. Zero information loss
```python
# ❌ Current: 342 text regions → 268 after filtering = 74 lost (21.6%)
filtered_text_regions = self._filter_text_in_regions(text_regions, regions_to_avoid)

# ✅ New: no filtering needed; consume parsing_res_list directly.
# Every element (text, table, image) sits in its own region, with no overlap.
for elem in sorted(elements, key=lambda x: x['reading_order']):
    render_element(elem)  # render everything, zero loss
```

#### 2. Precise bbox
```python
# ❌ Current: bbox is an empty list
{
    'element_id': 0,
    'type': 'table',
    'bbox': [],  # ← cannot be positioned!
}

# ✅ New: precise coordinates taken from layout_bbox
{
    'element_id': 0,
    'type': 'table',
    'bbox': [[770, 776], [1122, 776], [1122, 1058], [770, 1058]],  # ← exact position!
    'reading_order': 3,
    'layout_type': 'single'
}
```

#### 3. Reading order
```python
# ❌ Current: correct reading order cannot be guaranteed;
# tables, images and text are mixed together in arbitrary order.

# ✅ New: the order of parsing_res_list IS the reading order
elements = sorted(elements, key=lambda x: x['reading_order'])
# Elements render in reading_order 0, 1, 2, 3, ...,
# perfectly preserving the document's logical order.
```

---

## 🚀 Implementation Steps

### Phase 1: Core refactor (2-3 hours)

1. **Modify the `analyze_layout()` function**
   - Extract `parsing_res_list` from `page_result.json`
   - Use `layout_bbox` as each element's bbox
   - Preserve `reading_order`
   - Extract `layout_type`
   - Verify the output JSON structure

2. **Add helper functions**
   - `_find_image_bbox()`: look up image bboxes across multiple sources
   - `_convert_bbox_format()`: normalize bbox formats
   - `_extract_element_content()`: extract content by type

3. **Test and validate** (see the check sketch after these steps)
   - Re-run OCR on the existing test files
   - Check that the generated JSON contains bboxes
   - Verify that reading_order is correct

### Phase 2: PDF generation optimization (2-3 hours)

1. **Implement the coordinate-positioning mode**
   - Remove the text-filtering logic
   - Render each element precisely by bbox
   - Order same-page elements by reading_order

2. **Implement the flow-layout mode**
   - Use ReportLab Platypus
   - Build the story in reading_order
   - Implement flow rendering for each element type

3. **Add the API parameter**
   - `/tasks/{id}/download/pdf?mode=coordinate` (default)
   - `/tasks/{id}/download/pdf?mode=flow`

### Phase 3: Testing and tuning (1-2 hours)

1. **Full test pass**
   - Single-page documents
   - Multi-page PDFs
   - Multi-column layouts
   - Complex tables

2. **Performance tuning**
   - Avoid repeated computation
   - Optimize bbox conversion
   - Cache results

3. **Documentation updates**
   - Update the API docs
   - Add usage examples
   - Update the architecture diagram

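For the Phase 1 validation step above, a quick sanity check could look like the following sketch (it assumes the `layout_data` → `elements` JSON shape defined earlier in this plan):

```python
import json
from pathlib import Path

def check_layout_json(json_path: str) -> None:
    """Quick sanity check: every element should carry a bbox and reading_order."""
    data = json.loads(Path(json_path).read_text(encoding='utf-8'))
    elements = data.get('layout_data', {}).get('elements', [])
    missing = [e['element_id'] for e in elements if not e.get('bbox')]
    print(f"{len(elements)} elements, {len(missing)} without bbox: {missing[:10]}")
    orders = [e.get('reading_order') for e in elements]
    assert orders == sorted(orders), "reading_order is not monotonically increasing"
```
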
---

## 💡 Key Technical Details

### 1. Numpy array handling
```python
# layout_bbox is a numpy.ndarray and must be converted to a plain format
layout_bbox = item.get('layout_bbox')
if hasattr(layout_bbox, 'tolist'):
    bbox = layout_bbox.tolist()  # [x1, y1, x2, y2]
else:
    bbox = list(layout_bbox)

# Convert to 4-point format
x1, y1, x2, y2 = bbox
bbox_4point = [[x1, y1], [x2, y1], [x2, y2], [x1, y2]]
```

### 2. Layout-type handling
```python
# Adjust the rendering strategy according to layout_type
layout_type = elem.get('layout_type', 'single')

if layout_type == 'double':
    # Double-column layout: may need special handling
    pass
elif layout_type == 'multi':
    # Multi-column layout: more complex handling
    pass
```

### 3. Guaranteeing reading order
```python
# Make sure elements render in the right order
elements = layout_data.get('elements', [])
sorted_elements = sorted(elements, key=lambda x: (
    x.get('page', 0),          # page number first
    x.get('reading_order', 0)  # then reading order
))
```

---

## ⚠️ Risks and Mitigations

### Risk 1: Backward compatibility
**Problem**: old JSON files lack the new fields

**Mitigation**:
```python
# Add fallback logic in analyze_layout()
parsing_res_list = json_data.get('parsing_res_list', [])
if not parsing_res_list:
    logger.warning("No parsing_res_list, using markdown fallback")
    # use the old markdown parsing logic
```

### Risk 2: PaddleOCR version differences
**Problem**: different PaddleOCR versions may emit different output formats

**Mitigation**:
- Record the PaddleOCR version in the JSON
- Add version-detection logic
- Support multiple versions

### Risk 3: Performance impact
**Problem**: extracting more information may increase processing time

**Mitigation**:
- Extract detailed information only when needed
- Use caching
- Process multiple pages in parallel

---

## 📝 TODO Checklist

### Phase 1: Core refactor
- [ ] Change `analyze_layout()` to use `page_result.json`
- [ ] Extract `parsing_res_list`
- [ ] Extract `layout_bbox` and convert the format
- [ ] Preserve `reading_order`
- [ ] Extract `layout_type`
- [ ] Implement `_find_image_bbox()`
- [ ] Add fallback logic (backward compatibility)
- [ ] Test the new JSON output structure

### Phase 2: PDF generation optimization
- [ ] Implement `_generate_coordinate_pdf()`
- [ ] Implement `_generate_flow_pdf()`
- [ ] Remove the old text-filtering logic
- [ ] Add the `mode` parameter to the API
- [ ] Implement an HTML table parser (for flow mode)
- [ ] Test PDF output for both modes

### Phase 3: Testing and documentation
- [ ] Single-page document tests
- [ ] Multi-page PDF tests
- [ ] Complex layout tests (multi-column, table-dense)
- [ ] Performance tests
- [ ] Update API docs
- [ ] Update usage instructions
- [ ] Write a migration guide

---

## 🎓 Learning Resources

1. **PaddleOCR official docs**
   - [PP-StructureV3 Usage Tutorial](http://www.paddleocr.ai/main/en/version3.x/pipeline_usage/PP-StructureV3.html)
   - [PaddleX PP-StructureV3](https://paddlepaddle.github.io/PaddleX/3.0/en/pipeline_usage/tutorials/ocr_pipelines/PP-StructureV3.html)

2. **ReportLab docs**
   - [Platypus User Guide](https://www.reportlab.com/docs/reportlab-userguide.pdf)
   - [Table Styling](https://www.reportlab.com/docs/reportlab-userguide.pdf#page=80)

3. **Reference implementation**
   - PaddleOCR GitHub: `/paddlex/inference/pipelines/layout_parsing/pipeline_v2.py`

---

## 🏁 Success Criteria

### Must achieve
✅ Every layout element has a precise bbox
✅ Reading order correctly preserved
✅ Zero information loss (flow mode)
✅ Backward compatible (old JSON still works)

### Should achieve
✅ Dual-mode PDF generation (coordinate + flow)
✅ Multi-column layouts handled correctly
✅ Translation support (table text extractable)
✅ No noticeable performance regression

### Stretch goals
✅ Support more element types (formulas, seals)
✅ Layout-type statistics and analysis
✅ Layout structure visualization

---

**Plan completed**: 2025-01-18
**Estimated development time**: 5-8 hours
**Priority**: P0 (highest)

File diff suppressed because it is too large
276
openspec/changes/dual-track-document-processing/design.md
Normal file
@@ -0,0 +1,276 @@
# Technical Design: Dual-track Document Processing

## Context

### Background
The current OCR tool processes all documents through PaddleOCR, even when dealing with editable PDFs that contain extractable text. This causes:
- Unnecessary processing overhead
- Potential quality degradation from re-OCRing already digital text
- Loss of precise formatting information
- Inefficient GPU usage on documents that don't need OCR

### Constraints
- RTX 4060 8GB GPU memory limitation
- Need to maintain backward compatibility with existing API
- Must support future translation features
- Should handle mixed documents (partially scanned, partially digital)

### Stakeholders
- API consumers expecting consistent JSON/PDF output
- Translation system requiring structure preservation
- Performance-sensitive deployments

## Goals / Non-Goals

### Goals
- Intelligently route documents to appropriate processing track
- Preserve document structure for translation
- Optimize GPU usage by avoiding unnecessary OCR
- Maintain unified output format across tracks
- Reduce processing time for editable PDFs by 70%+

### Non-Goals
- Implementing the actual translation engine (future phase)
- Supporting video or audio transcription
- Real-time collaborative editing
- OCR model training or fine-tuning

## Decisions

### Decision 1: Dual-track Architecture
**What**: Implement two separate processing pipelines - OCR track and Direct extraction track

**Why**:
- Editable PDFs don't need OCR, can be processed 10-100x faster
- Direct extraction preserves exact formatting and fonts
- OCR track remains optimal for scanned documents

**Alternatives considered**:
1. **Single enhanced OCR pipeline**: Would still waste resources on editable PDFs
2. **Hybrid approach per page**: Too complex, most documents are uniformly editable or scanned
3. **Multiple specialized pipelines**: Over-engineering for current requirements

### Decision 2: UnifiedDocument Model
**What**: Create a standardized intermediate representation for both tracks

**Why**:
- Provides consistent API interface regardless of processing track
- Simplifies downstream processing (PDF generation, translation)
- Enables track switching without breaking changes

**Structure**:
```python
from dataclasses import dataclass
from typing import Dict, List, Literal, Optional, Union

# DocumentMetadata, Dimensions, ElementType, BoundingBox and StyleInfo
# are companion types defined alongside this model.

@dataclass
class UnifiedDocument:
    document_id: str
    metadata: DocumentMetadata
    pages: List[Page]
    processing_track: Literal["ocr", "direct"]

@dataclass
class Page:
    page_number: int
    elements: List[DocumentElement]
    dimensions: Dimensions

@dataclass
class DocumentElement:
    element_id: str
    type: ElementType  # text, table, image, header, etc.
    content: Union[str, Dict, bytes]
    bbox: BoundingBox
    style: Optional[StyleInfo]
    confidence: Optional[float]  # Only for OCR track
```

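A minimal serialization sketch for API responses, using `dataclasses.asdict` (this assumes JSON-safe content; enum or bytes fields would need custom encoding):

```python
import json
from dataclasses import asdict

def unified_document_to_json(doc: UnifiedDocument) -> str:
    """Serialize the unified model for API responses (assumes JSON-safe content)."""
    return json.dumps(asdict(doc), ensure_ascii=False, indent=2)
```
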
### Decision 3: PyMuPDF for Direct Extraction
**What**: Use PyMuPDF (fitz) library for editable PDF processing

**Why**:
- Mature, well-maintained library
- Excellent coordinate preservation
- Fast C++ backend
- Supports text, tables, and image extraction with positions

**Alternatives considered**:
1. **pdfplumber**: Good but slower, less precise coordinates
2. **PyPDF2**: Limited layout information
3. **PDFMiner**: Complex API, slower performance

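As a sketch of what the direct track's per-page extraction could look like (the mapping into element dicts is illustrative; `get_text("dict")` and its block/line/span structure are PyMuPDF's actual API):

```python
import fitz  # PyMuPDF

def extract_page_elements(pdf_path: str, page_number: int) -> list:
    """Extract text spans with coordinates from one page of an editable PDF."""
    doc = fitz.open(pdf_path)
    page = doc[page_number]
    elements = []
    # "dict" mode returns blocks -> lines -> spans, each with a bbox
    for block in page.get_text("dict")["blocks"]:
        if block["type"] != 0:  # 0 = text block, 1 = image block
            continue
        for line in block["lines"]:
            for span in line["spans"]:
                elements.append({
                    "type": "text",
                    "content": span["text"],
                    "bbox": span["bbox"],  # (x0, y0, x1, y1) in PDF points
                    "style": {"font": span["font"], "size": span["size"]},
                })
    doc.close()
    return elements
```
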
### Decision 4: Processing Track Auto-detection
**What**: Automatically determine optimal track based on document analysis

**Detection logic**:

```python
from pathlib import Path

import fitz   # PyMuPDF
import magic  # python-magic

# MIME types treated as Office documents (docx/xlsx/pptx)
OFFICE_MIMES = {
    'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
    'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
    'application/vnd.openxmlformats-officedocument.presentationml.presentation',
}

def detect_track(file_path: Path) -> str:
    file_type = magic.from_file(str(file_path), mime=True)

    if file_type.startswith('image/'):
        return "ocr"

    if file_type == 'application/pdf':
        # Check if PDF has extractable text
        doc = fitz.open(file_path)
        for i in range(min(3, doc.page_count)):  # Sample first 3 pages
            text = doc[i].get_text()
            if len(text.strip()) < 100:  # Minimal text
                return "ocr"
        return "direct"

    if file_type in OFFICE_MIMES:
        return "ocr"  # For now, may add direct Office support later

    return "ocr"  # Default fallback
```

### Decision 5: GPU Memory Management
**What**: Implement dynamic batch sizing and model caching for RTX 4060 8GB

**Why**:
- Prevents OOM errors
- Maximizes throughput
- Enables concurrent request handling

**Strategy**:

```python
from functools import lru_cache

# calculate_batch_size, get_gpu_memory, MODEL_MEMORY_REQUIREMENTS and
# load_model are project-level helpers.

# Adaptive batch sizing based on available memory
batch_size = calculate_batch_size(
    available_memory=get_gpu_memory(),
    image_size=image.shape,
    model_size=MODEL_MEMORY_REQUIREMENTS
)

# Model caching to avoid reload overhead
@lru_cache(maxsize=2)
def get_model(model_type: str):
    return load_model(model_type)
```

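`calculate_batch_size` is one of those project-level helpers; a hedged sketch of how it could be implemented (the per-image activation multiplier is a rough assumption to tune empirically):

```python
def calculate_batch_size(available_memory: int, image_size: tuple, model_size: int,
                         safety_margin: float = 0.2) -> int:
    """Estimate how many images fit in GPU memory alongside the model.

    All sizes are in bytes; the activation blow-up factor is an assumption.
    """
    h, w = image_size[:2]
    per_image = h * w * 3 * 4 * 8  # float32 activations, ~8x blow-up (assumption)
    usable = available_memory * (1 - safety_margin) - model_size
    return max(1, int(usable // per_image))
```
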
### Decision 6: Backward Compatibility
**What**: Maintain existing API while adding new capabilities

**How**:
- Existing endpoints continue working unchanged
- New `processing_track` parameter is optional
- Output format compatible with current consumers
- Gradual migration path for clients

## Risks / Trade-offs

### Risk 1: Mixed Content Documents
**Risk**: Documents with both scanned and digital pages
**Mitigation**:
- Page-level track detection as fallback
- Confidence scoring to identify uncertain pages
- Manual override option via API

### Risk 2: Direct Extraction Quality
**Risk**: Some PDFs have poor internal structure
**Mitigation**:
- Fallback to OCR track if extraction quality is low
- Quality metrics: text density, structure coherence
- User-reportable quality issues

### Risk 3: Memory Pressure
**Risk**: RTX 4060 8GB limitation with concurrent requests
**Mitigation**:
- Request queuing system
- Dynamic batch adjustment
- CPU fallback for overflow

### Trade-off 1: Processing Time vs Accuracy
- Direct extraction: Fast but depends on PDF quality
- OCR: Slower but consistent quality
- **Decision**: Prioritize speed for editable PDFs, accuracy for scanned

### Trade-off 2: Complexity vs Flexibility
- Two tracks increase system complexity
- But enable optimal processing per document type
- **Decision**: Accept complexity for 10x+ performance gains

## Migration Plan

### Phase 1: Infrastructure (Week 1-2)
1. Deploy UnifiedDocument model
2. Implement DocumentTypeDetector
3. Add DirectExtractionEngine
4. Update logging and monitoring

### Phase 2: Integration (Week 3)
1. Update OCR service with routing logic
2. Modify PDF generator for unified model
3. Add new API endpoints
4. Deploy to staging

### Phase 3: Validation (Week 4)
1. A/B testing with subset of traffic
2. Performance benchmarking
3. Quality validation
4. Client integration testing

### Rollback Plan
1. Feature flag to disable dual-track
2. Fallback all requests to OCR track
3. Maintain old code paths during transition
4. Database migration reversible

## Open Questions

### Resolved
- Q: Should we support page-level track mixing?
  - A: No, adds complexity with minimal benefit. Document-level is sufficient.

- Q: How to handle Office documents?
  - A: OCR track initially, consider python-docx/openpyxl later if needed.

### Pending
- Q: What translation services to integrate with?
  - Needs stakeholder input on cost/quality trade-offs

- Q: Should we cache extracted text for repeated processing?
  - Depends on storage costs vs reprocessing frequency

- Q: How to handle password-protected PDFs?
  - May need API parameter for passwords

## Performance Targets

### Direct Extraction Track
- Latency: <500ms per page
- Throughput: 100+ pages/minute
- Memory: <500MB per document

### OCR Track (Optimized)
- Latency: 2-5s per page (GPU)
- Throughput: 20-30 pages/minute
- Memory: <2GB per batch

### API Response Times
- Document type detection: <100ms
- Processing initiation: <200ms
- Result retrieval: <100ms

## Technical Dependencies

### Python Packages
```txt
# Direct extraction
PyMuPDF==1.23.x
pdfplumber==0.10.x  # Fallback/validation
python-magic-bin==0.4.x

# OCR enhancement
paddlepaddle-gpu==2.5.2
paddleocr==2.7.3

# Infrastructure
pydantic==2.x
fastapi==0.100+
redis==5.x  # For caching
```

### System Requirements
- CUDA 11.8+ for PaddlePaddle
- libmagic for file detection
- 16GB RAM minimum
- 50GB disk for models and cache
35
openspec/changes/dual-track-document-processing/proposal.md
Normal file
@@ -0,0 +1,35 @@
# Change: Dual-track Document Processing with Structure-Preserving Translation

## Why

The current system processes all documents through PaddleOCR, causing unnecessary overhead for editable PDFs that already contain extractable text. Additionally, we're only using ~20% of PP-StructureV3's capabilities, missing out on comprehensive document structure extraction. The system needs to support structure-preserving document translation as a future goal.

## What Changes

- **ADDED** Dual-track processing architecture with intelligent routing
  - OCR track for scanned documents, images, and Office files using PaddleOCR
  - Direct extraction track for editable PDFs using PyMuPDF
- **ADDED** UnifiedDocument model as common output format for both tracks
- **ADDED** DocumentTypeDetector service for automatic track selection
- **MODIFIED** OCR service to use PP-StructureV3's parsing_res_list instead of markdown
  - Now extracts all 23 element types with bbox coordinates
  - Preserves reading order and hierarchical structure
- **MODIFIED** PDF generator to handle UnifiedDocument format
  - Enhanced overlap detection to prevent text/image/table collisions
  - Improved coordinate transformation for accurate layout
- **ADDED** Foundation for structure-preserving translation system
- **BREAKING** JSON output structure will include new fields (backward compatible with defaults)

## Impact

- **Affected specs**:
  - `document-processing` (new capability)
  - `result-export` (enhanced with track metadata and structure data)
  - `task-management` (tracks processing route and history)
- **Affected code**:
  - `backend/app/services/ocr_service.py` - Major refactoring for dual-track
  - `backend/app/services/pdf_generator_service.py` - UnifiedDocument support
  - `backend/app/api/v2/tasks.py` - New endpoints for track detection
  - `frontend/src/pages/TaskDetailPage.tsx` - Display processing track info
- **Performance**: 5-10x faster for editable PDFs, same speed for scanned documents
- **Dependencies**: Adds PyMuPDF, pdfplumber, python-magic-bin

@@ -0,0 +1,108 @@
|
||||
# Document Processing Spec Delta
|
||||
|
||||
## ADDED Requirements
|
||||
|
||||
### Requirement: Dual-track Processing
|
||||
The system SHALL support two distinct processing tracks for documents: OCR track for scanned/image documents and Direct extraction track for editable PDFs.
|
||||
|
||||
#### Scenario: Process scanned PDF through OCR track
|
||||
- **WHEN** a scanned PDF is uploaded
|
||||
- **THEN** the system SHALL detect it requires OCR
|
||||
- **AND** route it through PaddleOCR PP-StructureV3 pipeline
|
||||
- **AND** return results in UnifiedDocument format
|
||||
|
||||
#### Scenario: Process editable PDF through direct extraction
|
||||
- **WHEN** an editable PDF with extractable text is uploaded
|
||||
- **THEN** the system SHALL detect it can be directly extracted
|
||||
- **AND** route it through PyMuPDF extraction pipeline
|
||||
- **AND** return results in UnifiedDocument format without OCR
|
||||
|
||||
#### Scenario: Auto-detect processing track
|
||||
- **WHEN** a document is uploaded without explicit track specification
|
||||
- **THEN** the system SHALL analyze the document type and content
|
||||
- **AND** automatically select the optimal processing track
|
||||
- **AND** include the selected track in processing metadata
|
||||
|
||||
### Requirement: Document Type Detection
|
||||
The system SHALL provide intelligent document type detection to determine the optimal processing track.
|
||||
|
||||
#### Scenario: Detect editable PDF
|
||||
- **WHEN** analyzing a PDF document
|
||||
- **THEN** the system SHALL check for extractable text content
|
||||
- **AND** return confidence score for editability
|
||||
- **AND** recommend "direct" track if text coverage > 90%
|
||||
|
||||
#### Scenario: Detect scanned document
|
||||
- **WHEN** analyzing an image or scanned PDF
|
||||
- **THEN** the system SHALL identify lack of extractable text
|
||||
- **AND** recommend "ocr" track for processing
|
||||
- **AND** configure appropriate OCR models
|
||||
|
||||
#### Scenario: Detect Office documents
|
||||
- **WHEN** analyzing .docx, .xlsx, .pptx files
|
||||
- **THEN** the system SHALL identify Office format
|
||||
- **AND** route to OCR track for initial implementation
|
||||
- **AND** preserve option for future direct Office extraction
|
||||
|
||||
### Requirement: Unified Document Model
|
||||
The system SHALL use a standardized UnifiedDocument model as the common output format for both processing tracks.
|
||||
|
||||
#### Scenario: Generate UnifiedDocument from OCR
|
||||
- **WHEN** OCR processing completes
|
||||
- **THEN** the system SHALL convert PP-StructureV3 results to UnifiedDocument
|
||||
- **AND** preserve all element types, coordinates, and confidence scores
|
||||
- **AND** maintain reading order and hierarchical structure
|
||||
|
||||
#### Scenario: Generate UnifiedDocument from direct extraction
|
||||
- **WHEN** direct extraction completes
|
||||
- **THEN** the system SHALL convert PyMuPDF results to UnifiedDocument
|
||||
- **AND** preserve text styling, fonts, and exact positioning
|
||||
- **AND** extract tables with cell boundaries and content
|
||||
|
||||
#### Scenario: Consistent output regardless of track
|
||||
- **WHEN** processing completes through either track
|
||||
- **THEN** the output SHALL conform to UnifiedDocument schema
|
||||
- **AND** include processing_track metadata field
|
||||
- **AND** support identical downstream operations (PDF generation, translation)
|
||||
|
||||
### Requirement: Enhanced OCR with Full PP-StructureV3
|
||||
The system SHALL utilize the full capabilities of PP-StructureV3, extracting all 23 element types from parsing_res_list.
|
||||
|
||||
#### Scenario: Extract comprehensive document structure
|
||||
- **WHEN** processing through OCR track
|
||||
- **THEN** the system SHALL use page_result.json['parsing_res_list']
|
||||
- **AND** extract all element types including headers, lists, tables, figures
|
||||
- **AND** preserve layout_bbox coordinates for each element
|
||||
|
||||
#### Scenario: Maintain reading order
|
||||
- **WHEN** extracting elements from PP-StructureV3
|
||||
- **THEN** the system SHALL preserve the reading order from parsing_res_list
|
||||
- **AND** assign sequential indices to elements
|
||||
- **AND** support reordering for complex layouts
|
||||
|
||||
#### Scenario: Extract table structure
|
||||
- **WHEN** PP-StructureV3 identifies a table
|
||||
- **THEN** the system SHALL extract cell content and boundaries
|
||||
- **AND** preserve table HTML for structure
|
||||
- **AND** extract plain text for translation
|
||||
|
||||
### Requirement: Structure-Preserving Translation Foundation

The system SHALL maintain document structure and layout information to support future translation features.

#### Scenario: Preserve coordinates for translation

- **WHEN** processing any document
- **THEN** the system SHALL retain bbox coordinates for all text elements
- **AND** calculate space requirements for text expansion/contraction
- **AND** maintain element relationships and groupings

#### Scenario: Extract translatable content

- **WHEN** processing tables and lists
- **THEN** the system SHALL extract plain text content
- **AND** maintain mapping to original structure
- **AND** preserve formatting markers for reconstruction

#### Scenario: Support layout adjustment

- **WHEN** preparing for translation
- **THEN** the system SHALL identify flexible vs fixed layout regions
- **AND** calculate maximum text expansion ratios
- **AND** preserve non-translatable elements (logos, signatures)
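For the expansion-ratio calculation, one simple heuristic is to compare the bbox width against the space the current text occupies; the formula below is an assumed placeholder, not a finalized metric:

```python
# Sketch: estimate how much a translated string may grow before overflowing its bbox.
def max_expansion_ratio(text: str, bbox_width: float, avg_char_width: float) -> float:
    used = len(text) * avg_char_width   # rough width of the source text
    return bbox_width / used if used else 1.0
```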
@@ -0,0 +1,74 @@

# Result Export Spec Delta

## MODIFIED Requirements

### Requirement: Export Interface

The Export page SHALL support downloading OCR results in multiple formats using V2 task APIs, with processing track information and enhanced structure data.

#### Scenario: Export page uses V2 download endpoints

- **WHEN** user selects a format and clicks export button
- **THEN** frontend SHALL call V2 endpoint `/api/v2/tasks/{task_id}/download/{format}`
- **AND** frontend SHALL NOT call the legacy V1 `/api/v1/export` endpoint (which returns 404)
- **AND** file SHALL download successfully
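For reference, the V2 download path can be exercised directly; a minimal sketch using httpx (already used elsewhere in this repo), with the base URL, token, and task id as placeholders:

```python
# Sketch: fetch an export via the V2 task endpoint rather than the removed V1 route.
import httpx

def download_result(base_url: str, token: str, task_id: str, fmt: str = "json") -> bytes:
    resp = httpx.get(
        f"{base_url}/api/v2/tasks/{task_id}/download/{fmt}",
        headers={"Authorization": f"Bearer {token}"},
    )
    resp.raise_for_status()  # surfaces a 404 if a stale V1-style path is used
    return resp.content
```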
#### Scenario: Export supports multiple formats

- **WHEN** user exports a completed task
- **THEN** system SHALL support downloading as TXT, JSON, Excel, Markdown, and PDF
- **AND** each format SHALL use correct V2 download endpoint
- **AND** downloaded files SHALL contain task OCR results

#### Scenario: Export includes processing track metadata

- **WHEN** user exports a task processed through dual-track system
- **THEN** exported JSON SHALL include "processing_track" field indicating "ocr" or "direct"
- **AND** SHALL include "processing_metadata" with track-specific information
- **AND** SHALL maintain backward compatibility for clients not expecting these fields

#### Scenario: Export UnifiedDocument format

- **WHEN** user requests JSON export with unified=true parameter
- **THEN** system SHALL return UnifiedDocument structure
- **AND** include complete element hierarchy with coordinates
- **AND** preserve all PP-StructureV3 element types for OCR track
## ADDED Requirements

### Requirement: Enhanced PDF Export with Layout Preservation

The PDF export SHALL accurately preserve document layout from both OCR and direct extraction tracks.

#### Scenario: Export PDF from direct extraction track

- **WHEN** exporting PDF from a direct-extraction processed document
- **THEN** the PDF SHALL maintain exact text positioning from source
- **AND** preserve original fonts and styles where possible
- **AND** include extracted images at correct positions

#### Scenario: Export PDF from OCR track with full structure

- **WHEN** exporting PDF from OCR-processed document
- **THEN** the PDF SHALL use all 23 PP-StructureV3 element types
- **AND** render tables with proper cell boundaries
- **AND** maintain reading order from parsing_res_list

#### Scenario: Handle coordinate transformations

- **WHEN** generating PDF from UnifiedDocument
- **THEN** system SHALL correctly transform bbox coordinates to PDF space
- **AND** handle page size variations
- **AND** prevent text overlap using enhanced overlap detection
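The coordinate transformation is mostly a y-axis flip plus scaling; a sketch assuming image-space bboxes with a top-left origin and ReportLab-style PDF space with a bottom-left origin:

```python
# Sketch: map a top-left-origin bbox (image space) into bottom-left-origin PDF points.
def bbox_to_pdf(bbox, page_height_px: float, scale: float):
    x0, y0, x1, y1 = bbox
    return (
        x0 * scale,
        (page_height_px - y1) * scale,  # flip the y axis: image top becomes PDF bottom
        x1 * scale,
        (page_height_px - y0) * scale,
    )
```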
### Requirement: Structure Data Export

The system SHALL provide export formats that preserve document structure for downstream processing.

#### Scenario: Export structured JSON with hierarchy

- **WHEN** user selects structured JSON format
- **THEN** export SHALL include element hierarchy and relationships
- **AND** preserve parent-child relationships (sections, lists)
- **AND** include style and formatting information

#### Scenario: Export for translation preparation

- **WHEN** user exports with translation_ready=true parameter
- **THEN** export SHALL include translatable text segments
- **AND** maintain coordinate mappings for each segment
- **AND** mark non-translatable regions

#### Scenario: Export with layout analysis

- **WHEN** user requests layout analysis export
- **THEN** system SHALL include reading order indices
- **AND** identify layout regions (header, body, footer, sidebar)
- **AND** provide confidence scores for layout detection
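A sketch of what a translation_ready payload could contain, derived from the scenarios above; the JSON keys and the set of non-translatable element types are assumptions:

```python
# Sketch: build translation-ready segments from a UnifiedDocument.
NON_TRANSLATABLE = {"figure", "image", "seal", "signature", "formula"}  # assumed set

def translation_segments(doc: UnifiedDocument) -> list[dict]:
    return [
        {
            "segment_id": el.reading_order,
            "text": el.text,
            "bbox": el.bbox,  # coordinate mapping for re-layout after translation
            "translatable": el.element_type not in NON_TRANSLATABLE,
        }
        for el in doc.elements
    ]
```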
@@ -0,0 +1,105 @@

# Task Management Spec Delta

## MODIFIED Requirements

### Requirement: Task Result Generation

The OCR service SHALL generate both JSON and Markdown result files for completed tasks with actual content, including processing track information and enhanced structure data.

#### Scenario: Markdown file contains OCR results

- **WHEN** a task completes OCR processing successfully
- **THEN** the generated `.md` file SHALL contain the extracted text in markdown format
- **AND** the file size SHALL be greater than 0 bytes
- **AND** the markdown SHALL include headings, paragraphs, and formatting based on OCR layout detection

#### Scenario: Result files stored in task directory

- **WHEN** OCR processing completes for task ID `88c6c2d2-37e1-48fd-a50f-406142987bdf`
- **THEN** result files SHALL be stored in `storage/results/88c6c2d2-37e1-48fd-a50f-406142987bdf/`
- **AND** both `<filename>_result.json` and `<filename>_result.md` SHALL exist
- **AND** both files SHALL contain valid OCR output data

#### Scenario: Include processing track in results

- **WHEN** a task completes through dual-track processing
- **THEN** the JSON result SHALL include "processing_track" field
- **AND** SHALL indicate whether "ocr" or "direct" track was used
- **AND** SHALL include track-specific metadata (confidence for OCR, extraction quality for direct)

#### Scenario: Store UnifiedDocument format

- **WHEN** processing completes through either track
- **THEN** system SHALL save results in UnifiedDocument format
- **AND** maintain backward-compatible JSON structure
- **AND** include enhanced structure from PP-StructureV3 or PyMuPDF
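A sketch of the result-writing step these scenarios imply; paths follow the `storage/results/{task_id}/` convention above, while `unified.to_dict()` is an assumed serializer on the UnifiedDocument sketch:

```python
# Sketch: persist both result files with processing-track metadata.
import json
from pathlib import Path

def save_results(task_id: str, stem: str, unified, markdown: str) -> None:
    out_dir = Path("storage/results") / task_id
    out_dir.mkdir(parents=True, exist_ok=True)
    payload = unified.to_dict()                        # assumed serializer
    payload["processing_track"] = unified.processing_track
    (out_dir / f"{stem}_result.json").write_text(
        json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8")
    (out_dir / f"{stem}_result.md").write_text(markdown, encoding="utf-8")
```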
### Requirement: Task Detail View

The frontend SHALL provide a dedicated page for viewing individual task details with processing track information and enhanced preview capabilities.

#### Scenario: Navigate to task detail page

- **WHEN** user clicks "View Details" button on task in Task History page
- **THEN** browser SHALL navigate to `/tasks/{task_id}`
- **AND** TaskDetailPage component SHALL render

#### Scenario: Display task information

- **WHEN** TaskDetailPage loads for a valid task ID
- **THEN** page SHALL display task metadata (filename, status, processing time, confidence)
- **AND** page SHALL show markdown preview of OCR results
- **AND** page SHALL provide download buttons for JSON, Markdown, and PDF formats

#### Scenario: Download from task detail page

- **WHEN** user clicks download button for a specific format
- **THEN** browser SHALL download the file using `/api/v2/tasks/{task_id}/download/{format}` endpoint
- **AND** downloaded file SHALL contain the task's OCR results in requested format

#### Scenario: Display processing track information

- **WHEN** viewing task processed through dual-track system
- **THEN** page SHALL display processing track used (OCR or Direct)
- **AND** show track-specific metrics (OCR confidence or extraction quality)
- **AND** provide option to reprocess with alternate track if applicable

#### Scenario: Preview document structure

- **WHEN** user enables structure view
- **THEN** page SHALL display document element hierarchy
- **AND** show bounding boxes overlay on preview
- **AND** highlight different element types (headers, tables, lists) with distinct colors
## ADDED Requirements

### Requirement: Processing Track Management

The task management system SHALL track and display processing track information for all tasks.

#### Scenario: Track processing route selection

- **WHEN** a task begins processing
- **THEN** system SHALL record the selected processing track
- **AND** log the reason for track selection
- **AND** store auto-detection confidence score

#### Scenario: Allow track override

- **WHEN** user views a completed task
- **THEN** system SHALL offer option to reprocess with different track
- **AND** maintain both results for comparison
- **AND** track which result user prefers

#### Scenario: Display processing metrics

- **WHEN** task completes processing
- **THEN** system SHALL record track-specific metrics
- **AND** OCR track SHALL show confidence scores and character count
- **AND** Direct track SHALL show extraction coverage and structure quality
### Requirement: Task Processing History

The system SHALL maintain detailed processing history for tasks including track changes and reprocessing.

#### Scenario: Record reprocessing attempts

- **WHEN** a task is reprocessed with different track
- **THEN** system SHALL maintain processing history
- **AND** store results from each attempt
- **AND** allow comparison between different processing attempts

#### Scenario: Track quality improvements

- **WHEN** viewing task history
- **THEN** system SHALL show quality metrics over time
- **AND** indicate if reprocessing improved results
- **AND** suggest optimal track based on document characteristics

#### Scenario: Export processing analytics

- **WHEN** exporting task data
- **THEN** system SHALL include processing history
- **AND** provide track selection statistics
- **AND** include performance metrics for each processing attempt
170
openspec/changes/dual-track-document-processing/tasks.md
Normal file
@@ -0,0 +1,170 @@

# Implementation Tasks: Dual-track Document Processing

## 1. Core Infrastructure

- [ ] 1.1 Add PyMuPDF and other dependencies to requirements.txt
  - [ ] 1.1.1 Add PyMuPDF==1.23.x
  - [ ] 1.1.2 Add pdfplumber==0.10.x
  - [ ] 1.1.3 Add python-magic-bin==0.4.x
  - [ ] 1.1.4 Test dependency installation
- [ ] 1.2 Create UnifiedDocument model in backend/app/models/
  - [ ] 1.2.1 Define UnifiedDocument dataclass
  - [ ] 1.2.2 Add DocumentElement model
  - [ ] 1.2.3 Add DocumentMetadata model
  - [ ] 1.2.4 Create converters for both OCR and direct extraction outputs
- [ ] 1.3 Create DocumentTypeDetector service
  - [ ] 1.3.1 Implement file type detection using python-magic
  - [ ] 1.3.2 Add PDF editability checking logic
  - [ ] 1.3.3 Add Office document detection
  - [ ] 1.3.4 Create routing logic to determine processing track
  - [ ] 1.3.5 Add unit tests for detector
## 2. Direct Extraction Track

- [ ] 2.1 Create DirectExtractionEngine service
  - [ ] 2.1.1 Implement PyMuPDF-based text extraction
  - [ ] 2.1.2 Add structure preservation logic
  - [ ] 2.1.3 Extract tables with coordinates
  - [ ] 2.1.4 Extract images and their positions
  - [ ] 2.1.5 Maintain reading order
  - [ ] 2.1.6 Handle multi-column layouts
- [ ] 2.2 Implement layout analysis for editable PDFs
  - [ ] 2.2.1 Detect headers and footers
  - [ ] 2.2.2 Identify sections and subsections
  - [ ] 2.2.3 Parse lists and nested structures
  - [ ] 2.2.4 Extract font and style information
- [ ] 2.3 Create direct extraction to UnifiedDocument converter
  - [ ] 2.3.1 Map PyMuPDF structures to UnifiedDocument
  - [ ] 2.3.2 Preserve coordinate information
  - [ ] 2.3.3 Maintain element relationships
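Task 2.1 centers on PyMuPDF's structured text API; a sketch of the extraction loop, with the span-to-element mapping simplified to a flat list:

```python
# Sketch: pull positioned text spans out of an editable PDF with PyMuPDF.
import fitz  # PyMuPDF

def extract_spans(pdf_path: str) -> list[dict]:
    spans = []
    with fitz.open(pdf_path) as doc:
        for page_no, page in enumerate(doc):
            for block in page.get_text("dict")["blocks"]:
                for line in block.get("lines", []):  # image blocks carry no lines
                    for span in line["spans"]:
                        spans.append({
                            "page": page_no,
                            "text": span["text"],
                            "bbox": span["bbox"],
                            "font": span["font"],    # style info for preservation
                            "size": span["size"],
                        })
    return spans
```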
## 3. OCR Track Enhancement

- [ ] 3.1 Upgrade PP-StructureV3 configuration
  - [ ] 3.1.1 Update config for RTX 4060 8GB optimization
  - [ ] 3.1.2 Enable batch processing for GPU efficiency
  - [ ] 3.1.3 Configure memory management settings
  - [ ] 3.1.4 Set up model caching
- [ ] 3.2 Enhance OCR service to use parsing_res_list
  - [ ] 3.2.1 Replace markdown extraction with parsing_res_list
  - [ ] 3.2.2 Extract all 23 element types
  - [ ] 3.2.3 Preserve bbox coordinates from PP-StructureV3
  - [ ] 3.2.4 Maintain reading order information
- [ ] 3.3 Create OCR to UnifiedDocument converter
  - [ ] 3.3.1 Map PP-StructureV3 elements to UnifiedDocument
  - [ ] 3.3.2 Handle complex nested structures
  - [ ] 3.3.3 Preserve all metadata
## 4. Unified Processing Pipeline

- [ ] 4.1 Update main OCR service for dual-track processing
  - [ ] 4.1.1 Integrate DocumentTypeDetector
  - [ ] 4.1.2 Route to appropriate processing engine
  - [ ] 4.1.3 Return UnifiedDocument from both tracks
  - [ ] 4.1.4 Maintain backward compatibility
- [ ] 4.2 Create unified JSON export
  - [ ] 4.2.1 Define standardized JSON schema
  - [ ] 4.2.2 Include processing metadata
  - [ ] 4.2.3 Support both track outputs
- [ ] 4.3 Update PDF generator for UnifiedDocument
  - [ ] 4.3.1 Adapt PDF generation to use UnifiedDocument
  - [ ] 4.3.2 Preserve layout from both tracks
  - [ ] 4.3.3 Handle coordinate transformations
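The routing in 4.1 reduces to a small dispatch; a sketch reusing the hypothetical `recommend_track` helper from earlier, with the engine entry points (`run_direct_extraction`, `run_ocr_pipeline`) as assumed names:

```python
# Sketch: dual-track dispatch returning a UnifiedDocument either way.
def process_document(path: str) -> UnifiedDocument:
    decision = recommend_track(path)       # hypothetical detector from above
    if decision["track"] == "direct":
        doc = run_direct_extraction(path)  # assumed engine entry points
    else:
        doc = run_ocr_pipeline(path)
    doc.metadata["track_confidence"] = decision["confidence"]
    return doc
```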
## 5. Translation System Foundation

- [ ] 5.1 Create TranslationEngine interface
  - [ ] 5.1.1 Define translation API contract
  - [ ] 5.1.2 Support element-level translation
  - [ ] 5.1.3 Preserve formatting markers
- [ ] 5.2 Implement structure-preserving translation
  - [ ] 5.2.1 Translate text while maintaining coordinates
  - [ ] 5.2.2 Handle table cell translations
  - [ ] 5.2.3 Preserve list structures
  - [ ] 5.2.4 Maintain header hierarchies
- [ ] 5.3 Create translated document renderer
  - [ ] 5.3.1 Generate PDF with translated text
  - [ ] 5.3.2 Adjust layouts for text expansion/contraction
  - [ ] 5.3.3 Handle font substitution for target languages
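The contract in 5.1 could be as small as one abstract method operating on elements, so both tracks feed it identically; a sketch, with the method name as an assumption:

```python
# Sketch: element-level translation contract (task 5.1).
from abc import ABC, abstractmethod

class TranslationEngine(ABC):
    @abstractmethod
    def translate_elements(
        self, elements: list[DocumentElement], target_lang: str
    ) -> list[DocumentElement]:
        """Return elements with translated text but unchanged bbox and order."""
```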
## 6. API Updates

- [ ] 6.1 Update OCR endpoints
  - [ ] 6.1.1 Add processing_track parameter
  - [ ] 6.1.2 Support track auto-detection
  - [ ] 6.1.3 Return processing metadata
- [ ] 6.2 Add document type detection endpoint
  - [ ] 6.2.1 Create /analyze endpoint
  - [ ] 6.2.2 Return recommended processing track
  - [ ] 6.2.3 Provide confidence scores
- [ ] 6.3 Update result export endpoints
  - [ ] 6.3.1 Support UnifiedDocument format
  - [ ] 6.3.2 Add format conversion options
  - [ ] 6.3.3 Include processing track information
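The /analyze endpoint in 6.2 could simply surface the detector result; a FastAPI sketch (the backend already uses FastAPI routers), with the route prefix and temp-file handling as assumptions:

```python
# Sketch: document-analysis endpoint returning the recommended track (task 6.2).
from fastapi import APIRouter, UploadFile

router = APIRouter(prefix="/api/v2/documents")

@router.post("/analyze")
async def analyze_document(file: UploadFile):
    tmp_path = f"/tmp/{file.filename}"     # naive temp handling, sketch only
    with open(tmp_path, "wb") as fh:
        fh.write(await file.read())
    decision = recommend_track(tmp_path)   # hypothetical helper from above
    return {
        "recommended_track": decision["track"],
        "confidence": decision["confidence"],
    }
```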
## 7. Frontend Updates

- [ ] 7.1 Update task detail view
  - [ ] 7.1.1 Display processing track information
  - [ ] 7.1.2 Show track-specific metadata
  - [ ] 7.1.3 Add track selection UI (if manual override needed)
- [ ] 7.2 Update results preview
  - [ ] 7.2.1 Handle UnifiedDocument format
  - [ ] 7.2.2 Display enhanced structure information
  - [ ] 7.2.3 Show coordinate overlays (debug mode)
- [ ] 7.3 Add translation UI preparation
  - [ ] 7.3.1 Add translation toggle/button
  - [ ] 7.3.2 Language selection dropdown
  - [ ] 7.3.3 Translation progress indicator
## 8. Testing

- [ ] 8.1 Unit tests for DocumentTypeDetector
  - [ ] 8.1.1 Test various file types
  - [ ] 8.1.2 Test editability detection
  - [ ] 8.1.3 Test edge cases
- [ ] 8.2 Unit tests for DirectExtractionEngine
  - [ ] 8.2.1 Test text extraction accuracy
  - [ ] 8.2.2 Test structure preservation
  - [ ] 8.2.3 Test coordinate extraction
- [ ] 8.3 Integration tests for dual-track processing
  - [ ] 8.3.1 Test routing logic
  - [ ] 8.3.2 Test UnifiedDocument generation
  - [ ] 8.3.3 Test backward compatibility
- [ ] 8.4 End-to-end tests
  - [ ] 8.4.1 Test scanned PDF processing (OCR track)
  - [ ] 8.4.2 Test editable PDF processing (direct track)
  - [ ] 8.4.3 Test Office document processing
  - [ ] 8.4.4 Test image file processing
- [ ] 8.5 Performance testing
  - [ ] 8.5.1 Benchmark both processing tracks
  - [ ] 8.5.2 Test GPU memory usage
  - [ ] 8.5.3 Compare processing times
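For 8.1, the detector tests can stay fixture-driven; a pytest sketch against the hypothetical `recommend_track` helper, with the fixture paths as placeholders:

```python
# Sketch: unit tests for the hypothetical recommend_track helper (task 8.1).
import pytest

@pytest.mark.parametrize("fixture, expected", [
    ("tests/fixtures/editable.pdf", "direct"),  # placeholder fixture paths
    ("tests/fixtures/scanned.pdf", "ocr"),
])
def test_track_recommendation(fixture, expected):
    assert recommend_track(fixture)["track"] == expected
```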
## 9. Documentation

- [ ] 9.1 Update API documentation
  - [ ] 9.1.1 Document new endpoints
  - [ ] 9.1.2 Update existing endpoint docs
  - [ ] 9.1.3 Add processing track information
- [ ] 9.2 Create architecture documentation
  - [ ] 9.2.1 Document dual-track flow
  - [ ] 9.2.2 Explain UnifiedDocument structure
  - [ ] 9.2.3 Add decision trees for track selection
- [ ] 9.3 Add deployment guide
  - [ ] 9.3.1 Document GPU requirements
  - [ ] 9.3.2 Add environment configuration
  - [ ] 9.3.3 Include troubleshooting guide
## 10. Deployment Preparation

- [ ] 10.1 Update Docker configuration
  - [ ] 10.1.1 Add new dependencies to Dockerfile
  - [ ] 10.1.2 Configure GPU support
  - [ ] 10.1.3 Update volume mappings
- [ ] 10.2 Update environment variables
  - [ ] 10.2.1 Add processing track settings
  - [ ] 10.2.2 Configure GPU memory limits
  - [ ] 10.2.3 Add feature flags
- [ ] 10.3 Create migration plan
  - [ ] 10.3.1 Plan for existing data migration
  - [ ] 10.3.2 Create rollback procedures
  - [ ] 10.3.3 Document breaking changes
## Completion Checklist

- [ ] All unit tests passing
- [ ] Integration tests passing
- [ ] Performance benchmarks acceptable
- [ ] Documentation complete
- [ ] Code reviewed
- [ ] Deployment tested in staging
@@ -1,226 +0,0 @@

#!/usr/bin/env python3
"""
Proof of Concept: External API Authentication Test
Tests the external authentication API at https://pj-auth-api.vercel.app
"""

import asyncio
import json
from datetime import datetime
from typing import Dict, Any, Optional

import httpx
from pydantic import BaseModel, Field


class UserInfo(BaseModel):
    """User information from external API"""
    id: str
    name: str
    email: str
    job_title: Optional[str] = Field(None, alias="jobTitle")
    office_location: Optional[str] = Field(None, alias="officeLocation")
    business_phones: list[str] = Field(default_factory=list, alias="businessPhones")


class AuthSuccessData(BaseModel):
    """Successful authentication response data"""
    access_token: str
    id_token: str
    expires_in: int
    token_type: str
    user_info: UserInfo = Field(alias="userInfo")
    issued_at: str = Field(alias="issuedAt")
    expires_at: str = Field(alias="expiresAt")


class AuthSuccessResponse(BaseModel):
    """Successful authentication response"""
    success: bool
    message: str
    data: AuthSuccessData
    timestamp: str


class AuthErrorResponse(BaseModel):
    """Failed authentication response"""
    success: bool
    error: str
    code: str
    timestamp: str


class ExternalAuthClient:
    """Client for external authentication API"""

    def __init__(self, base_url: str = "https://pj-auth-api.vercel.app", timeout: int = 30):
        self.base_url = base_url
        self.timeout = timeout
        self.endpoint = "/api/auth/login"

    async def authenticate(self, username: str, password: str) -> Dict[str, Any]:
        """
        Authenticate user with external API

        Args:
            username: User email/username
            password: User password

        Returns:
            Authentication result dictionary
        """
        url = f"{self.base_url}{self.endpoint}"

        print(f"ℹ Endpoint: POST {url}")
        print(f"ℹ Username: {username}")
        print(f"ℹ Timestamp: {datetime.now().isoformat()}")
        print()

        async with httpx.AsyncClient() as client:
            try:
                # Make authentication request
                start_time = datetime.now()
                response = await client.post(
                    url,
                    json={"username": username, "password": password},
                    timeout=self.timeout
                )
                elapsed = (datetime.now() - start_time).total_seconds()

                # Print response details
                print("Response Details:")
                print(f"  Status Code: {response.status_code}")
                print(f"  Response Time: {elapsed:.3f}s")
                print(f"  Content-Type: {response.headers.get('content-type', 'N/A')}")
                print()

                # Parse response
                response_data = response.json()
                print("Response Body:")
                print(json.dumps(response_data, indent=2, ensure_ascii=False))
                print()

                # Handle success/failure
                if response.status_code == 200:
                    auth_response = AuthSuccessResponse(**response_data)
                    return {
                        "success": True,
                        "status_code": response.status_code,
                        "data": auth_response.dict(),
                        "user_display_name": auth_response.data.user_info.name,
                        "user_email": auth_response.data.user_info.email,
                        "token": auth_response.data.access_token,
                        "expires_in": auth_response.data.expires_in,
                        "expires_at": auth_response.data.expires_at
                    }
                elif response.status_code == 401:
                    error_response = AuthErrorResponse(**response_data)
                    return {
                        "success": False,
                        "status_code": response.status_code,
                        "error": error_response.error,
                        "code": error_response.code
                    }
                else:
                    return {
                        "success": False,
                        "status_code": response.status_code,
                        "error": f"Unexpected status code: {response.status_code}",
                        "response": response_data
                    }

            except httpx.TimeoutException:
                print(f"❌ Request timeout after {self.timeout} seconds")
                return {
                    "success": False,
                    "error": "Request timeout",
                    "code": "TIMEOUT"
                }
            except httpx.RequestError as e:
                print(f"❌ Request error: {e}")
                return {
                    "success": False,
                    "error": str(e),
                    "code": "REQUEST_ERROR"
                }
            except Exception as e:
                print(f"❌ Unexpected error: {e}")
                return {
                    "success": False,
                    "error": str(e),
                    "code": "UNKNOWN_ERROR"
                }


async def test_authentication():
    """Test authentication with different scenarios"""
    client = ExternalAuthClient()

    # Test scenarios
    test_cases = [
        {
            "name": "Valid Credentials (Example)",
            "username": "ymirliu@panjit.com.tw",
            "password": "correct_password",  # Replace with actual password for testing
            "expected": "success"
        },
        {
            "name": "Invalid Credentials",
            "username": "test@example.com",
            "password": "wrong_password",
            "expected": "failure"
        }
    ]

    for i, test_case in enumerate(test_cases, 1):
        print(f"{'='*60}")
        print(f"Test Case {i}: {test_case['name']}")
        print(f"{'='*60}")

        result = await client.authenticate(
            username=test_case["username"],
            password=test_case["password"]
        )

        # Analyze result
        print("\nAnalysis:")
        if result["success"]:
            print("✅ Authentication successful")
            print(f"  User: {result.get('user_display_name', 'N/A')}")
            print(f"  Email: {result.get('user_email', 'N/A')}")
            print(f"  Token expires in: {result.get('expires_in', 0)} seconds")
            print(f"  Expires at: {result.get('expires_at', 'N/A')}")
        else:
            print("❌ Authentication failed")
            print(f"  Error: {result.get('error', 'Unknown error')}")
            print(f"  Code: {result.get('code', 'N/A')}")

        print("\n")


async def test_token_validation():
    """Test token validation and refresh logic"""
    # This would be implemented when we have a valid token
    print("Token validation test - To be implemented with actual tokens")


def main():
    """Main entry point"""
    print("External Authentication API Test")
    print("================================\n")

    # Run tests
    asyncio.run(test_authentication())

    print("\nTest completed!")
    print("\nNotes for implementation:")
    print("1. Use httpx for async HTTP requests (already in requirements)")
    print("2. Store tokens securely (consider encryption)")
    print("3. Implement automatic token refresh before expiration")
    print("4. Handle network failures with retry logic")
    print("5. Map external user ID to local user records")
    print("6. Display user 'name' field in UI instead of username")


if __name__ == "__main__":
    main()