# Design: AI Report Generation Architecture

## Overview

This document describes the architecture for integrating the DIFY AI service to generate incident reports from chat room data.

## System Architecture

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                           Frontend (React)                                   │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────────────────┐  │
│  │ Generate Button │  │ Progress Modal  │  │ Download Button             │  │
│  └────────┬────────┘  └────────▲────────┘  └──────────────┬──────────────┘  │
└───────────┼────────────────────┼──────────────────────────┼─────────────────┘
            │ POST /generate     │ WebSocket: progress      │ GET /download
            ▼                    │                          ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                         FastAPI Backend                                      │
│                                                                              │
│  ┌────────────────────────────────────────────────────────────────────────┐ │
│  │                    Report Generation Router                             │ │
│  │  POST /api/rooms/{room_id}/reports/generate                            │ │
│  │  GET  /api/rooms/{room_id}/reports                                     │ │
│  │  GET  /api/rooms/{room_id}/reports/{report_id}                         │ │
│  │  GET  /api/rooms/{room_id}/reports/{report_id}/download                │ │
│  └────────────────────────────────────────────────────────────────────────┘ │
│                              │                                               │
│                              ▼                                               │
│  ┌────────────────────────────────────────────────────────────────────────┐ │
│  │                    Report Generation Service                            │ │
│  │                                                                         │ │
│  │  1. ReportDataService.collect_room_data()                              │ │
│  │     ├── Get room metadata (title, type, severity, status)              │ │
│  │     ├── Get all messages (sorted by time)                              │ │
│  │     ├── Get member list (with roles)                                   │ │
│  │     └── Get file list (with metadata, not content)                     │ │
│  │                                                                         │ │
│  │  2. DifyService.generate_report_content()                              │ │
│  │     ├── Build prompt with system instructions + room data              │ │
│  │     ├── Call DIFY Chat API (blocking mode)                             │ │
│  │     ├── Parse JSON response                                            │ │
│  │     └── Validate against expected schema                               │ │
│  │                                                                         │ │
│  │  3. DocxAssemblyService.create_document()                              │ │
│  │     ├── Create docx with python-docx                                   │ │
│  │     ├── Add title, metadata header                                     │ │
│  │     ├── Add AI-generated sections (summary, timeline, etc.)            │ │
│  │     ├── Download images from MinIO                                     │ │
│  │     ├── Embed images in document                                       │ │
│  │     └── Add file attachment list                                       │ │
│  │                                                                         │ │
│  │  4. Store report metadata in database                                  │ │
│  │  5. Upload .docx to MinIO or store locally                             │ │
│  └────────────────────────────────────────────────────────────────────────┘ │
│                              │                                               │
└──────────────────────────────┼───────────────────────────────────────────────┘
                               │
            ┌──────────────────┼──────────────────┐
            ▼                  ▼                  ▼
    ┌──────────────┐   ┌──────────────┐   ┌──────────────┐
    │   DIFY API   │   │    MinIO     │   │  PostgreSQL  │
    │ Chat Messages│   │ File Storage │   │   Database   │
    └──────────────┘   └──────────────┘   └──────────────┘
```
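
The five numbered steps in the Report Generation Service box run as a linear pipeline. The sketch below is one way to read that flow, not the actual service code: `ReportRecord` and `run_pipeline` are hypothetical names, the four callables stand in for the real services, and the status values are taken from the `generated_reports` table.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ReportRecord:
    """Hypothetical in-memory stand-in for a generated_reports row."""
    report_id: str
    room_id: str
    status: str = "pending"
    error_message: Optional[str] = None
    report_json: Optional[dict] = None
    docx_storage_path: Optional[str] = None


def run_pipeline(
    record: ReportRecord,
    collect: Callable[[str], dict],
    generate: Callable[[dict], dict],
    assemble: Callable[[dict, dict], bytes],
    store: Callable[[str, bytes], str],
) -> ReportRecord:
    """Run steps 1-5 in order; a failure in any step is recorded, not raised."""
    record.status = "generating"
    try:
        room_data: dict[str, Any] = collect(record.room_id)   # 1. collect room data
        report_json = generate(room_data)                     # 2. call DIFY, parse, validate
        docx_bytes = assemble(room_data, report_json)         # 3. build the .docx
        record.report_json = report_json                      # 4. keep the parsed AI output
        record.docx_storage_path = store(record.report_id, docx_bytes)  # 5. upload
        record.status = "completed"
    except Exception as exc:  # broad on purpose: any failing step marks the report failed
        record.status = "failed"
        record.error_message = str(exc)
    return record
```

Passing the steps in as callables keeps the sketch independent of DIFY, MinIO, and the database, so the status transitions can be exercised in isolation.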

## Data Flow

### 1. Data Collection Phase

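```python
# Sketch as plain dataclasses; the real module may use Pydantic models instead.
from dataclasses import dataclass
from datetime import datetime


@dataclass
class MessageData:
    sender_name: str
    content: str
    message_type: str
    created_at: datetime
    has_file_attachment: bool
    file_name: str | None


@dataclass
class MemberData:
    user_id: str
    display_name: str
    role: str


@dataclass
class FileData:
    file_id: str
    filename: str
    file_type: str
    mime_type: str
    uploaded_at: datetime
    uploader_name: str


@dataclass
class RoomReportData:
    room_id: str
    title: str
    incident_type: str
    severity: str
    status: str
    location: str
    description: str
    created_at: datetime
    resolved_at: datetime | None
    messages: list[MessageData]  # sorted by created_at
    members: list[MemberData]
    files: list[FileData]        # metadata only, no file content
```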

### 2. DIFY Prompt Construction

```
System Prompt (configured in the DIFY app settings):
  - Role definition (professional report-writing assistant)
  - Output format requirements (JSON only)
  - Report section definitions
  - JSON schema with examples

User Query (sent with each request):
  ## 事件資訊
  - 標題: {room.title}
  - 類型: {room.incident_type}
  - 嚴重程度: {room.severity}
  - 狀態: {room.status}
  - 地點: {room.location}
  - 建立時間: {room.created_at}

  ## 參與人員
  {formatted member list}

  ## 對話記錄
  {formatted message timeline}

  ## 附件清單
  {formatted file list - names only}

  請根據以上資料生成報告 JSON。
```
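
The user query above can be rendered from the collected room data with plain string formatting. The following is a minimal sketch: `build_user_query` is a hypothetical helper, the inputs are plain dicts keyed by the field names from the data collection phase, and the per-line formats for members, messages, and files are assumptions, since the template only says "formatted".

```python
def build_user_query(room: dict, members: list[dict], messages: list[dict], files: list[dict]) -> str:
    """Render the per-request user query; section labels follow the template above."""
    member_lines = [f"- {m['display_name']} ({m['role']})" for m in members]
    # Messages are emitted in chronological order regardless of input order.
    message_lines = [
        f"[{msg['created_at']:%Y-%m-%d %H:%M}] {msg['sender_name']}: {msg['content']}"
        for msg in sorted(messages, key=lambda item: item["created_at"])
    ]
    file_lines = [f"- {f['filename']}" for f in files]  # names only, never content

    parts = [
        "## 事件資訊",
        f"- 標題: {room['title']}",
        f"- 類型: {room['incident_type']}",
        f"- 嚴重程度: {room['severity']}",
        f"- 狀態: {room['status']}",
        f"- 地點: {room['location']}",
        f"- 建立時間: {room['created_at']:%Y-%m-%d %H:%M}",
        "",
        "## 參與人員",
        *member_lines,
        "",
        "## 對話記錄",
        *message_lines,
        "",
        "## 附件清單",
        *file_lines,
        "",
        "請根據以上資料生成報告 JSON。",
    ]
    return "\n".join(parts)
```

Sorting inside the helper means the timeline stays chronological even if the caller passes messages in arrival or database order.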

### 3. DIFY API Request/Response

```python
# Request
POST https://dify.theaken.com/v1/chat-messages
Headers:
  Authorization: Bearer {DIFY_API_KEY}
  Content-Type: application/json

Body:
{
  "inputs": {},
  "query": "{constructed_prompt}",
  "response_mode": "blocking",
  "conversation_id": "",  # New conversation each time
  "user": "{room_id}"     # Use room_id for tracking
}

# Response
{
  "event": "message",
  "message_id": "...",
  "answer": "{...JSON report content...}",
  "metadata": {
    "usage": {...}
  }
}
```
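
As a rough illustration of the exchange above, the sketch below builds the request with the standard library and pulls the answer and usage out of a response body. The function names are hypothetical, and the real `dify_service.py` may well use a different HTTP client; only the URL path, headers, and body fields come from this section.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, query: str, room_id: str) -> urllib.request.Request:
    """Build the POST /chat-messages request in blocking mode."""
    body = {
        "inputs": {},
        "query": query,
        "response_mode": "blocking",
        "conversation_id": "",  # new conversation each time
        "user": room_id,        # room_id doubles as the tracking identifier
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat-messages",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(request: urllib.request.Request, timeout_seconds: float = 120.0) -> dict:
    """Send the request and decode the JSON response body (performs network I/O)."""
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_answer(response_body: dict) -> tuple[str, dict]:
    """Return the raw answer string and the usage metadata (empty dict if absent)."""
    answer = response_body["answer"]
    usage = response_body.get("metadata", {}).get("usage", {})
    return answer, usage
```

Note that `answer` is still a string at this point; turning it into a report object is a separate parsing and validation step.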

### 4. AI Output JSON Schema

```json
{
  "summary": {
    "content": "string (incident summary, 50-100 characters)"
  },
  "timeline": {
    "events": [
      {
        "time": "string (HH:MM or YYYY-MM-DD HH:MM)",
        "description": "string"
      }
    ]
  },
  "participants": {
    "members": [
      {
        "name": "string",
        "role": "string (e.g. incident initiator, maintenance lead)"
      }
    ]
  },
  "resolution_process": {
    "content": "string (detailed resolution process)"
  },
  "current_status": {
    "status": "active|resolved|archived",
    "description": "string"
  },
  "final_resolution": {
    "has_resolution": "boolean",
    "content": "string (may be empty if has_resolution=false)"
  }
}
```
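
Validation against this schema can be done with a small hand-written checker. The sketch below is an assumption about how that might look using only the standard library (the actual module could equally use Pydantic models from `schemas.py`); `validate_report` is a hypothetical name, and it returns a list of error strings rather than raising so that all problems can be logged at once.

```python
from typing import Any, Optional

ALLOWED_STATUSES = {"active", "resolved", "archived"}


def validate_report(report: Any) -> list[str]:
    """Return a list of schema violations; an empty list means the report is valid."""
    if not isinstance(report, dict):
        return ["report: expected dict"]
    errors: list[str] = []

    def check(path: str, value: Any, expected: type) -> bool:
        ok = isinstance(value, expected)
        if not ok:
            errors.append(f"{path}: expected {expected.__name__}")
        return ok

    def section(name: str) -> Optional[dict]:
        value = report.get(name)
        return value if check(name, value, dict) else None

    def check_items(path: str, items: Any, keys: tuple[str, ...]) -> None:
        # Each list item must be an object whose listed keys are all strings.
        if check(path, items, list):
            for i, item in enumerate(items):
                if check(f"{path}[{i}]", item, dict):
                    for key in keys:
                        check(f"{path}[{i}].{key}", item.get(key), str)

    if (summary := section("summary")) is not None:
        check("summary.content", summary.get("content"), str)
    if (timeline := section("timeline")) is not None:
        check_items("timeline.events", timeline.get("events"), ("time", "description"))
    if (participants := section("participants")) is not None:
        check_items("participants.members", participants.get("members"), ("name", "role"))
    if (process := section("resolution_process")) is not None:
        check("resolution_process.content", process.get("content"), str)
    if (current := section("current_status")) is not None:
        if check("current_status.status", current.get("status"), str):
            if current["status"] not in ALLOWED_STATUSES:
                errors.append("current_status.status: expected one of active, resolved, archived")
        check("current_status.description", current.get("description"), str)
    if (final := section("final_resolution")) is not None:
        check("final_resolution.has_resolution", final.get("has_resolution"), bool)
        check("final_resolution.content", final.get("content"), str)
    return errors
```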

## Module Structure

```
app/modules/report_generation/
├── __init__.py
├── models.py              # GeneratedReport SQLAlchemy model
├── schemas.py             # Pydantic schemas for API
├── router.py              # FastAPI endpoints
├── dependencies.py        # Auth and permission checks
├── prompts.py             # System prompt and prompt templates
└── services/
    ├── __init__.py
    ├── dify_service.py    # DIFY API client
    ├── report_data_service.py   # Collect room data
    └── docx_service.py    # python-docx assembly
```

## Database Schema

### Users Table (New - for display name resolution)

```sql
CREATE TABLE users (
    user_id VARCHAR(255) PRIMARY KEY,  -- email address (e.g., ymirliu@panjit.com.tw)
    display_name VARCHAR(255) NOT NULL, -- from AD API userInfo.name (e.g., "ymirliu 劉念蓉")
    office_location VARCHAR(100),       -- from AD API userInfo.officeLocation
    job_title VARCHAR(100),             -- from AD API userInfo.jobTitle
    last_login_at TIMESTAMP,            -- updated on each login
    created_at TIMESTAMP DEFAULT NOW()
);

-- PostgreSQL does not accept inline INDEX clauses; create the index separately
CREATE INDEX ix_users_display_name ON users (display_name);
```

**Population Strategy:**
- On successful login, auth module calls `upsert_user()` with AD API response data
- Uses `INSERT ... ON CONFLICT DO UPDATE` for atomic upsert
- `last_login_at` updated on every login

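The upsert described above can be sketched as follows. This is an illustration only: it runs against an in-memory SQLite database (which accepts the same `ON CONFLICT ... DO UPDATE` form) so the example is self-contained, whereas production uses PostgreSQL with its own driver and parameter style. The body of `upsert_user()` and the `create_demo_schema()` helper are assumptions.

```python
import sqlite3
from typing import Optional

UPSERT_USER_SQL = """
INSERT INTO users (user_id, display_name, office_location, job_title, last_login_at)
VALUES (:user_id, :display_name, :office_location, :job_title, CURRENT_TIMESTAMP)
ON CONFLICT (user_id) DO UPDATE SET
    display_name = excluded.display_name,
    office_location = excluded.office_location,
    job_title = excluded.job_title,
    last_login_at = excluded.last_login_at
"""


def create_demo_schema(conn: sqlite3.Connection) -> None:
    """Create a SQLite approximation of the users table (demo only)."""
    conn.execute(
        """
        CREATE TABLE users (
            user_id TEXT PRIMARY KEY,
            display_name TEXT NOT NULL,
            office_location TEXT,
            job_title TEXT,
            last_login_at TIMESTAMP,
            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        )
        """
    )


def upsert_user(
    conn: sqlite3.Connection,
    user_id: str,
    display_name: str,
    office_location: Optional[str] = None,
    job_title: Optional[str] = None,
) -> None:
    """Insert the user on first login; refresh profile fields and last_login_at afterwards."""
    conn.execute(
        UPSERT_USER_SQL,
        {
            "user_id": user_id,
            "display_name": display_name,
            "office_location": office_location,
            "job_title": job_title,
        },
    )
    conn.commit()
```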
**Usage in Reports:**
```sql
SELECT m.content, m.created_at, u.display_name
FROM messages m
LEFT JOIN users u ON m.sender_id = u.user_id
WHERE m.room_id = ?
ORDER BY m.created_at;
```

### Generated Reports Table

```sql
CREATE TABLE generated_reports (
    report_id VARCHAR(36) PRIMARY KEY,
    room_id VARCHAR(36) NOT NULL REFERENCES incident_rooms(room_id),

    -- Generation metadata
    generated_by VARCHAR(255) NOT NULL,  -- User who triggered generation
    generated_at TIMESTAMP DEFAULT NOW(),

    -- Status tracking
    status VARCHAR(20) NOT NULL DEFAULT 'pending',  -- pending, generating, completed, failed
    error_message TEXT,

    -- AI metadata
    dify_message_id VARCHAR(100),
    dify_conversation_id VARCHAR(100),
    prompt_tokens INTEGER,
    completion_tokens INTEGER,

    -- Report storage
    report_title VARCHAR(255),
    report_json JSONB,        -- Parsed AI output
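    docx_storage_path VARCHAR(500)  -- MinIO path or local path
);

-- PostgreSQL does not accept inline INDEX clauses; create indexes separately
CREATE INDEX ix_generated_reports_room ON generated_reports (room_id, generated_at DESC);
CREATE INDEX ix_generated_reports_status ON generated_reports (status);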
```

## Configuration

```python
# app/core/config.py additions
class Settings(BaseSettings):
    # ... existing settings ...

    # DIFY AI Service
    DIFY_BASE_URL: str = "https://dify.theaken.com/v1"
    DIFY_API_KEY: str  # Required, from .env
    DIFY_TIMEOUT_SECONDS: int = 120  # AI generation can take time

    # Report Generation
    REPORT_MAX_MESSAGES: int = 200  # Summarize if exceeded
    REPORT_STORAGE_PATH: str = "reports"  # MinIO path prefix
```

## Error Handling Strategy

| Error Type | Handling |
|------------|----------|
| DIFY API timeout | Retry once, then fail with timeout error |
| DIFY returns non-JSON | Attempt to extract JSON from the response; retry the request if extraction fails |
| JSON schema validation fails | Log raw response, return error with details |
| MinIO image download fails | Skip image, add note in report |
| python-docx assembly fails | Return partial report or error |

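The first two rows of the table can be sketched with two small helpers. Both names are hypothetical, and the extraction heuristic (take the span from the first `{` to the last `}`) is an assumption about what "extract JSON" means here; which exception signals a timeout also depends on the HTTP client, so `TimeoutError` is a placeholder.

```python
import json
from typing import Callable, TypeVar

T = TypeVar("T")


def extract_json_object(text: str) -> dict:
    """Parse text as JSON; if that fails, try the span from the first '{' to the last '}'."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end <= start:
            raise ValueError("no JSON object found in response")
        # json.JSONDecodeError is a ValueError subclass, so callers can catch ValueError.
        parsed = json.loads(text[start:end + 1])
    if not isinstance(parsed, dict):
        raise ValueError("response JSON is not an object")
    return parsed


def call_with_one_retry(call: Callable[[], T]) -> T:
    """Run call(); on TimeoutError retry exactly once, then let the timeout propagate."""
    try:
        return call()
    except TimeoutError:
        return call()
```

A model that wraps its answer in a Markdown code fence or adds a sentence before the JSON is handled by the fallback path; anything still unparseable raises `ValueError`, which the caller can treat as the trigger for a retry.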
## Security Considerations

- DIFY API key stored in environment variable, never logged
- Room membership verified before report generation
- Generated reports inherit room access permissions
- Reports are downloaded directly through the authenticated API, so presigned URLs are not needed

## Performance Considerations

- The pipeline is async-friendly, but the DIFY call uses blocking response mode for simplicity
- Large rooms: messages older than 7 days are summarized by day
- Images are downloaded in parallel using `asyncio.gather`
- Reports are cached in the database to avoid regeneration
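
The parallel image download can be sketched as below. `download_images` and its `fetch` parameter are hypothetical; `fetch` stands in for whatever coroutine reads one object from MinIO (a synchronous storage client could be wrapped with `asyncio.to_thread`). Mapping failures to `None` mirrors the "skip image, add note in report" rule from the error handling table.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def download_images(
    object_keys: list[str],
    fetch: Callable[[str], Awaitable[bytes]],
) -> dict[str, Optional[bytes]]:
    """Fetch all images concurrently; a failed download maps to None instead of aborting."""
    results = await asyncio.gather(
        *(fetch(key) for key in object_keys),
        return_exceptions=True,  # collect exceptions as results rather than raising the first
    )
    return {
        key: (None if isinstance(result, BaseException) else result)
        for key, result in zip(object_keys, results)
    }
```

Because `asyncio.gather` returns results in the order of its arguments, zipping them back against `object_keys` is safe even when downloads finish out of order.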