Task_Reporter/openspec/specs/ai-report-generation/spec.md

# ai-report-generation Specification

## Purpose
TBD - created by archiving change add-ai-report-generation. Update Purpose after archive.

## Requirements
### Requirement: User Display Name Resolution

The system SHALL maintain a permanent `users` table to store user display names from AD authentication, enabling reports to show names instead of email addresses.

#### Scenario: Create user record on first login

- **GIVEN** user "ymirliu@panjit.com.tw" logs in for the first time
- **AND** the AD API returns userInfo with name "ymirliu 劉念蓉"
- **WHEN** authentication succeeds
- **THEN** the system SHALL create a new record in `users` table with:
  - user_id: "ymirliu@panjit.com.tw"
  - display_name: "ymirliu 劉念蓉"
  - office_location: "高雄" (from AD API)
  - job_title: null (from AD API)
  - last_login_at: current timestamp
  - created_at: current timestamp

#### Scenario: Update user record on subsequent login

- **GIVEN** user "ymirliu@panjit.com.tw" already exists in `users` table
- **AND** the user's display_name in AD has changed to "劉念蓉 Ymir"
- **WHEN** the user logs in again
- **THEN** the system SHALL update the existing record with:
  - display_name: "劉念蓉 Ymir"
  - last_login_at: current timestamp
- **AND** preserve the original created_at timestamp

#### Scenario: Resolve display name for report

- **GIVEN** a message was sent by "ymirliu@panjit.com.tw"
- **AND** the users table contains display_name "ymirliu 劉念蓉" for this user
- **WHEN** report data is collected
- **THEN** the system SHALL JOIN with users table
- **AND** return display_name "ymirliu 劉念蓉" instead of email address

#### Scenario: Handle unknown user gracefully

- **GIVEN** a message was sent by "olduser@panjit.com.tw"
- **AND** this user does not exist in the users table (never logged in to the new system)
- **WHEN** report data is collected
- **THEN** the system SHALL use the email address as fallback display name
- **AND** format it as "olduser@panjit.com.tw" in the report
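
The four scenarios above can be sketched with an upsert plus a `LEFT JOIN` that falls back to the email address. This is a minimal illustration using SQLite from the Python standard library. The `messages` table, its column names, and the function names are assumptions made for the example, not part of this spec.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: `users` follows the fields listed in the scenarios,
# `messages` is a simplified stand-in for the real message store.
SCHEMA = """
CREATE TABLE users (
    user_id         TEXT PRIMARY KEY,
    display_name    TEXT NOT NULL,
    office_location TEXT,
    job_title       TEXT,
    last_login_at   TEXT NOT NULL,
    created_at      TEXT NOT NULL
);
CREATE TABLE messages (
    message_id INTEGER PRIMARY KEY,
    sender_id  TEXT NOT NULL,
    content    TEXT NOT NULL
);
"""


def upsert_user(conn, user_id, display_name, office_location=None,
                job_title=None, now=None):
    """Create the record on first login, otherwise refresh it.

    On conflict only display_name and last_login_at change, so the
    original created_at is preserved.
    """
    now = now or datetime.now(timezone.utc).isoformat()
    conn.execute(
        """
        INSERT INTO users (user_id, display_name, office_location,
                           job_title, last_login_at, created_at)
        VALUES (?, ?, ?, ?, ?, ?)
        ON CONFLICT(user_id) DO UPDATE SET
            display_name  = excluded.display_name,
            last_login_at = excluded.last_login_at
        """,
        (user_id, display_name, office_location, job_title, now, now),
    )


def messages_with_display_names(conn):
    """Return (content, display name) pairs; unknown senders keep their email."""
    rows = conn.execute(
        """
        SELECT m.content, COALESCE(u.display_name, m.sender_id)
        FROM messages AS m
        LEFT JOIN users AS u ON u.user_id = m.sender_id
        ORDER BY m.message_id
        """
    )
    return rows.fetchall()
```

`COALESCE` keeps the fallback inside the query, so report collection does not need a second lookup for users who never logged in.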

---

### Requirement: Report Data Collection

The system SHALL collect all relevant room data for AI processing, including messages, members, files with their conversation context, and room metadata.

#### Scenario: Collect complete room data for report generation

- **GIVEN** an incident room with ID `room-123` exists
- **AND** the room has 50 messages from 5 members
- **AND** the room has 3 uploaded files (2 images, 1 PDF)
- **WHEN** the report data service collects room data
- **THEN** the system SHALL return a structured data object containing:
  - Room metadata (title, incident_type, severity, status, location, description, timestamps)
  - All 50 messages sorted by created_at ascending
  - All 5 members with their roles (owner, editor, viewer)
  - All 3 files with metadata (filename, type, uploader, upload time) AND their associated message context
- **AND** messages SHALL include sender display name (not just user_id)
- **AND** file references in messages SHALL be annotated with surrounding context

#### Scenario: Include file context in report data
- **GIVEN** a file "defect_photo.jpg" was uploaded with the message "發現產品表面瑕疵"
- **AND** the previous message was "Line 3 溫度異常升高中"
- **AND** the next message was "已通知維修人員處理"
- **WHEN** report data is collected
- **THEN** the file entry SHALL include:
  ```json
  {
    "file_id": "...",
    "filename": "defect_photo.jpg",
    "uploader_display_name": "陳工程師",
    "uploaded_at": "2025-12-08T14:30:00+08:00",
    "caption": "發現產品表面瑕疵",
    "context_before": "Line 3 溫度異常升高中",
    "context_after": "已通知維修人員處理"
  }
  ```
- **AND** the AI prompt SHALL format files as:
  `[附件: defect_photo.jpg] - 上傳者: 陳工程師 (14:30), 說明: "發現產品表面瑕疵" (前文: "Line 3 溫度異常升高中")`
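
A small sketch of how a file entry could be turned into the prompt line shown above. The function names are hypothetical. The entry keys follow the JSON example, and the output format is copied from the scenario (which includes only the preceding context, not `context_after`).

```python
from datetime import datetime


def surrounding_context(messages: list, index: int) -> tuple:
    """Return (previous, next) message text around messages[index], or None at the edges."""
    before = messages[index - 1] if index > 0 else None
    after = messages[index + 1] if index + 1 < len(messages) else None
    return before, after


def format_file_for_prompt(entry: dict) -> str:
    """Render one file entry as a single prompt line."""
    uploaded = datetime.fromisoformat(entry["uploaded_at"])
    line = (
        f'[附件: {entry["filename"]}] - 上傳者: {entry["uploader_display_name"]} '
        f'({uploaded:%H:%M}), 說明: "{entry["caption"]}"'
    )
    # The preceding-context suffix is only added when context is available.
    if entry.get("context_before"):
        line += f' (前文: "{entry["context_before"]}")'
    return line
```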

#### Scenario: Handle room with no messages

- **GIVEN** an incident room was just created with no messages
- **WHEN** report generation is requested
- **THEN** the system SHALL return an error indicating insufficient data for report generation
- **AND** the error message SHALL be "事件聊天室尚無訊息記錄，無法生成報告"

#### Scenario: Summarize large rooms exceeding message limit

- **GIVEN** an incident room has 500 messages spanning 5 days
- **AND** the REPORT_MAX_MESSAGES limit is 200
- **WHEN** report data is collected
- **THEN** the system SHALL keep the most recent 150 messages in full
- **AND** summarize older messages by day (e.g., "2025-12-01: 45 則訊息討論設備檢修")
- **AND** the total formatted content SHALL stay within token limits
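
One way to read the scenario above is as a split into "recent messages kept in full" and "older messages collapsed per day". The sketch below covers only the counting part. The topic text in the example summary (such as "討論設備檢修") would need a separate summarization step that is not shown. The function name and the `created_at` string format are assumptions.

```python
from collections import Counter


def split_for_report(messages, max_messages=200, keep_recent=150):
    """Split messages into per-day summaries of older ones and recent ones kept in full.

    `messages` is a list of dicts with an ISO 8601 `created_at` string,
    sorted ascending. Rooms within the limit are returned untouched.
    """
    if len(messages) <= max_messages:
        return [], messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    # The first 10 characters of an ISO 8601 timestamp are the calendar date.
    per_day = Counter(m["created_at"][:10] for m in older)
    summaries = [f"{day}: {count} 則訊息" for day, count in sorted(per_day.items())]
    return summaries, recent
```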

### Requirement: DIFY AI Integration

The system SHALL integrate with DIFY Chat API to generate structured report content from collected room data.

#### Scenario: Successful report generation via DIFY

- **GIVEN** room data has been collected successfully
- **WHEN** the DIFY service is called with the formatted prompt
- **THEN** the system SHALL send a POST request to `{DIFY_BASE_URL}/chat-messages`
- **AND** include an `Authorization` header with a Bearer token
- **AND** set response_mode to "blocking"
- **AND** set user to the room_id for tracking
- **AND** parse the JSON from the `answer` field in the response
- **AND** validate the JSON structure matches expected schema
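
The request described above could be built as follows. This sketch only constructs the request and does not send it. The `query` and `inputs` body fields are assumptions based on the public Dify chat-messages API and are not stated in this spec, so they should be checked against the deployed DIFY version. The function name is hypothetical.

```python
import json
import urllib.request


def build_dify_request(base_url: str, api_key: str, room_id: str,
                       prompt: str) -> urllib.request.Request:
    """Build the blocking POST request to {DIFY_BASE_URL}/chat-messages."""
    body = {
        "inputs": {},                 # assumed field, no app variables used here
        "query": prompt,              # assumed field carrying the formatted prompt
        "response_mode": "blocking",
        "user": room_id,              # room_id doubles as the tracking identifier
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat-messages",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```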

#### Scenario: DIFY returns invalid JSON

- **GIVEN** DIFY returns a response where `answer` is not valid JSON
- **WHEN** the system attempts to parse the response
- **THEN** the system SHALL attempt to extract JSON using regex patterns
- **AND** if extraction fails, retry the request once with a simplified prompt
- **AND** if the retry also fails, return an error with status "failed" and store the raw response for debugging
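
The regex-based extraction step might look like the sketch below. The specific patterns (a fenced `json` block, then the outermost pair of braces) are assumptions, since the spec does not name them. A `None` result is the signal for the caller to retry with the simplified prompt.

```python
import json
import re

# Matches a Markdown code fence, optionally tagged "json", wrapping an object.
_FENCED = re.compile(r"```(?:json)?\s*(\{.*?\})\s*```", re.DOTALL)


def extract_report_json(answer: str):
    """Return the first JSON object found in `answer`, or None."""
    candidates = [answer]
    fenced = _FENCED.search(answer)
    if fenced:
        candidates.append(fenced.group(1))
    # Last resort: everything between the first "{" and the last "}".
    start, end = answer.find("{"), answer.rfind("}")
    if start != -1 and end > start:
        candidates.append(answer[start:end + 1])
    for text in candidates:
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed
    return None
```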

#### Scenario: DIFY API timeout

- **GIVEN** the DIFY API does not respond within DIFY_TIMEOUT_SECONDS (120s)
- **WHEN** the timeout is reached
- **THEN** the system SHALL cancel the request
- **AND** return error with message "AI 服務回應超時，請稍後再試"
- **AND** log the timeout event with room_id and request duration

#### Scenario: DIFY API authentication failure

- **GIVEN** the DIFY_API_KEY is invalid or expired
- **WHEN** the DIFY API returns 401 Unauthorized
- **THEN** the system SHALL return error with message "AI 服務認證失敗，請聯繫系統管理員"
- **AND** log the authentication failure (without exposing the key)
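
The timeout and authentication scenarios both map a low-level failure to a fixed user-facing message. A minimal sketch of that mapping, assuming the client is built on `urllib` (the spec does not say which HTTP client is used) and using a hypothetical function name:

```python
import socket
import urllib.error

TIMEOUT_MESSAGE = "AI 服務回應超時，請稍後再試"
AUTH_MESSAGE = "AI 服務認證失敗，請聯繫系統管理員"


def user_message_for(exc: Exception):
    """Map an exception from the DIFY call to a user-facing message, or None."""
    # HTTPError is a subclass of URLError, so it must be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        return AUTH_MESSAGE if exc.code == 401 else None
    # urllib may wrap a socket timeout inside URLError.reason.
    if isinstance(exc, urllib.error.URLError) and isinstance(exc.reason, Exception):
        exc = exc.reason
    if isinstance(exc, (TimeoutError, socket.timeout)):
        return TIMEOUT_MESSAGE
    return None
```

Neither message contains the API key or the raw exception text, which keeps the logging requirement separate from what the user sees.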

---

### Requirement: Document Assembly

The system SHALL assemble professional .docx documents from AI-generated content with embedded images from MinIO and file context from conversations.

#### Scenario: Generate complete report document

- **GIVEN** DIFY has returned valid JSON report content
- **AND** the room has 2 image attachments in MinIO
- **WHEN** the docx assembly service creates the document
- **THEN** the system SHALL create a .docx file with:
  - Report title: "生產線異常處理報告 - {room.title}"
  - Generation metadata: 生成時間, 事件編號, 生成者
  - Section 1: 事件摘要 (from AI summary.content)
  - Section 2: 事件時間軸 (formatted table from AI timeline.events)
  - Section 3: 參與人員 (formatted list from AI participants.members)
  - Section 4: 處理過程 (from AI resolution_process.content)
  - Section 5: 目前狀態 (from AI current_status)
  - Section 6: 最終處置結果 (from AI final_resolution, if has_resolution=true)
  - Section 7: 附件 (embedded images with captions + file list with context)
- **AND** images SHALL be embedded at appropriate size (max width 15cm)
- **AND** each image SHALL include its caption from the upload message
- **AND** document SHALL use professional formatting (標楷體 or similar)
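
Before any .docx library is involved, the section order and the image size limit can be expressed as plain functions. The sketch below assumes `has_resolution` is a key inside `final_resolution` and that image sizes are converted from pixels at 96 DPI. Neither detail is fixed by this spec, and the function names are hypothetical.

```python
def plan_sections(report: dict, attachments: list) -> list:
    """Return (heading, content) pairs in the order the document lists them."""
    sections = [
        ("事件摘要", report["summary"]["content"]),
        ("事件時間軸", report["timeline"]["events"]),
        ("參與人員", report["participants"]["members"]),
        ("處理過程", report["resolution_process"]["content"]),
        ("目前狀態", report["current_status"]),
    ]
    final = report.get("final_resolution") or {}
    # Section 6 is optional: only present when a resolution exists.
    if final.get("has_resolution"):
        sections.append(("最終處置結果", final))
    # An empty attachment list falls back to the fixed notice text.
    sections.append(("附件", attachments if attachments else "本事件無附件檔案"))
    return sections


def fit_width_cm(width_px: int, height_px: int, dpi: float = 96.0,
                 max_width_cm: float = 15.0) -> tuple:
    """Scale an image down to at most max_width_cm, keeping its aspect ratio."""
    width_cm = width_px / dpi * 2.54
    height_cm = height_px / dpi * 2.54
    if width_cm <= max_width_cm:
        return width_cm, height_cm
    return max_width_cm, max_width_cm * height_px / width_px
```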

#### Scenario: Handle missing images during assembly

- **GIVEN** a file reference exists in the database
- **BUT** the actual file is missing from MinIO
- **WHEN** the docx service attempts to embed the image
- **THEN** the system SHALL skip the missing image
- **AND** add a placeholder text: "[圖片無法載入: {filename}]"
- **AND** continue with document assembly
- **AND** log a warning with file_id and room_id
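
The skip-and-continue behaviour above can be sketched independently of MinIO. Here `fetch_bytes` is a hypothetical callable that raises `FileNotFoundError` when the object is missing. A real MinIO client raises its own exception type, which would need to be caught instead.

```python
import logging

logger = logging.getLogger("report.docx")


def collect_image_blocks(files: list, fetch_bytes, room_id: str) -> list:
    """Return document blocks for each image, with a placeholder for missing ones."""
    blocks = []
    for item in files:
        try:
            data = fetch_bytes(item["file_id"])
        except FileNotFoundError:
            # Log and keep going so one missing object never aborts assembly.
            logger.warning(
                "image missing from storage: file_id=%s room_id=%s",
                item["file_id"], room_id,
            )
            blocks.append(("text", f"[圖片無法載入: {item['filename']}]"))
            continue
        blocks.append(("image", data, item.get("caption", "")))
    return blocks
```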

#### Scenario: Generate report for room without images

- **GIVEN** the room has no image attachments
- **WHEN** the docx assembly service creates the document
- **THEN** the system SHALL create a complete document without the embedded images section
- **AND** the attachments section SHALL show "本事件無附件檔案" if no files exist

### Requirement: Report Generation API

The system SHALL provide REST API endpoints for triggering report generation and downloading generated reports.

#### Scenario: Trigger report generation

- **GIVEN** user "supervisor@company.com" is a member of room "room-123"
- **AND** the room status is "resolved" or "archived"
- **WHEN** the user sends `POST /api/rooms/room-123/reports/generate`
- **THEN** the system SHALL create a new report record with status "pending"
- **AND** return immediately with report_id and status
- **AND** process the report generation asynchronously
- **AND** update status to "completed" when done

#### Scenario: Generate report for active room

- **GIVEN** user requests report for a room with status "active"
- **WHEN** the request is processed
- **THEN** the system SHALL allow generation and return a warning
- **AND** include a note in the report: "注意：本報告生成時事件尚未結案"

#### Scenario: Download generated report

- **GIVEN** a report with ID "report-456" has status "completed"
- **AND** the report belongs to room "room-123"
- **WHEN** user sends `GET /api/rooms/room-123/reports/report-456/download`
- **THEN** the system SHALL return the .docx file
- **AND** set Content-Type to "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
- **AND** set Content-Disposition to "attachment; filename={report_title}_{date}.docx"
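
Because report titles are likely to contain non-ASCII characters (for example "生產線異常處理報告"), a plain `filename=` parameter may not survive transport. The sketch below adds the `filename*` form from RFC 6266 next to an ASCII fallback. This encoding choice and the function name are assumptions, not something this spec requires.

```python
from urllib.parse import quote

DOCX_MIME = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"


def download_headers(report_title: str, date: str) -> dict:
    """Build response headers for the .docx download."""
    filename = f"{report_title}_{date}.docx"
    # ASCII fallback: non-ASCII characters, quotes and backslashes become "_".
    fallback = filename.encode("ascii", "replace").decode("ascii")
    for ch in '?"\\':
        fallback = fallback.replace(ch, "_")
    # Percent-encoded UTF-8 form for clients that understand filename*.
    encoded = quote(filename, safe="")
    return {
        "Content-Type": DOCX_MIME,
        "Content-Disposition": (
            f"attachment; filename=\"{fallback}\"; filename*=UTF-8''{encoded}"
        ),
    }
```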

#### Scenario: List room reports

- **GIVEN** room "room-123" has 3 previously generated reports
- **WHEN** user sends `GET /api/rooms/room-123/reports`
- **THEN** the system SHALL return a list of reports with:
  - report_id
  - generated_at
  - generated_by
  - status
  - report_title
- **AND** results SHALL be sorted by generated_at descending

#### Scenario: Unauthorized report access

- **GIVEN** user "outsider@company.com" is NOT a member of room "room-123"
- **WHEN** the user attempts to generate or download a report
- **THEN** the system SHALL return 403 Forbidden
- **AND** the error message SHALL be "您沒有此事件的存取權限"

---

### Requirement: Report Generation Status and Notifications

The system SHALL track report generation status and notify users of completion via WebSocket.

#### Scenario: Track report generation progress

- **GIVEN** a report generation has been triggered
- **WHEN** the generation process runs
- **THEN** the system SHALL update report status through these stages, in order:
  - "pending" → initial state
  - "collecting_data" → gathering room data
  - "generating_content" → calling DIFY API
  - "assembling_document" → creating .docx
  - "completed" → finished successfully
- **AND** the system SHALL set status "failed" if an error occurs at any stage
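
The status values can be modelled as an enum with a small transition check. The rule that "failed" is reachable from any non-terminal stage, and that stages cannot be skipped, is an interpretation of the list above and not an explicit requirement.

```python
from enum import Enum


class ReportStatus(str, Enum):
    PENDING = "pending"
    COLLECTING_DATA = "collecting_data"
    GENERATING_CONTENT = "generating_content"
    ASSEMBLING_DOCUMENT = "assembling_document"
    COMPLETED = "completed"
    FAILED = "failed"


# Happy-path order; FAILED sits outside it.
_ORDER = [
    ReportStatus.PENDING,
    ReportStatus.COLLECTING_DATA,
    ReportStatus.GENERATING_CONTENT,
    ReportStatus.ASSEMBLING_DOCUMENT,
    ReportStatus.COMPLETED,
]


def can_transition(current: ReportStatus, target: ReportStatus) -> bool:
    """Allow only the next stage in order, or FAILED from any non-terminal stage."""
    if current in (ReportStatus.COMPLETED, ReportStatus.FAILED):
        return False
    if target is ReportStatus.FAILED:
        return True
    return _ORDER.index(target) == _ORDER.index(current) + 1
```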

#### Scenario: Notify via WebSocket on completion

- **GIVEN** user is connected to room WebSocket
- **AND** report generation completes successfully
- **WHEN** the status changes to "completed"
- **THEN** the system SHALL broadcast to room members:
  ```json
  {
    "type": "report_generated",
    "report_id": "report-456",
    "report_title": "生產線異常處理報告",
    "generated_by": "supervisor@company.com",
    "generated_at": "2025-12-04T16:30:00+08:00"
  }
  ```

#### Scenario: Notify on generation failure

- **GIVEN** report generation fails
- **WHEN** the status changes to "failed"
- **THEN** the system SHALL send to the user who triggered generation:
  ```json
  {
    "type": "report_generation_failed",
    "report_id": "report-456",
    "error": "AI 服務回應超時，請稍後再試"
  }
  ```
- **AND** the error message SHALL be user-friendly (no technical details)

### Requirement: DIFY Service Health Check

The system SHALL provide a health check mechanism to verify DIFY AI service connectivity and configuration.

#### Scenario: Check DIFY configuration on startup
- **WHEN** the application starts
- **AND** `DIFY_API_KEY` is not configured
- **THEN** the system SHALL log a warning message: "DIFY_API_KEY not configured - AI report generation will be unavailable"

#### Scenario: DIFY health check endpoint
- **WHEN** a user sends `GET /api/reports/health`
- **AND** `DIFY_API_KEY` is not configured
- **THEN** the system SHALL return:
  ```json
  {
    "status": "error",
    "message": "DIFY_API_KEY 未設定，請聯繫系統管理員"
  }
  ```

#### Scenario: DIFY service unreachable
- **WHEN** a user sends `GET /api/reports/health`
- **AND** `DIFY_API_KEY` is configured
- **BUT** the DIFY service cannot be reached
- **THEN** the system SHALL return:
  ```json
  {
    "status": "error",
    "message": "無法連接 AI 服務，請稍後再試"
  }
  ```
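
The two error responses above can come from a single function that checks configuration first and connectivity second. In this sketch `probe` is a hypothetical callable that raises `OSError` (or a subclass such as `ConnectionError`) when DIFY cannot be reached. The shape of the success response is an assumption, since the spec only defines the error cases.

```python
def dify_health(api_key, probe) -> dict:
    """Return the health payload for GET /api/reports/health."""
    if not api_key:
        return {"status": "error", "message": "DIFY_API_KEY 未設定，請聯繫系統管理員"}
    try:
        probe()
    except OSError:
        return {"status": "error", "message": "無法連接 AI 服務，請稍後再試"}
    # Assumed success shape, not defined by the spec.
    return {"status": "ok", "message": "AI service reachable"}
```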

### Requirement: Report Generation Status Polling

The frontend SHALL implement a polling mechanism to ensure report status updates are received even if the WebSocket connection is unstable.

#### Scenario: Poll report status after generation trigger
- **WHEN** a user triggers report generation
- **AND** receives the initial `report_id`
- **THEN** the frontend SHALL poll `GET /api/rooms/{room_id}/reports/{report_id}` every 2 seconds
- **AND** continue polling until status is "completed" or "failed"
- **AND** time out after 120 seconds with a user-friendly error message
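
The polling loop is language-independent, so it is sketched here in Python even though the real frontend would implement it in its own language. `fetch_status` is a hypothetical callable wrapping the `GET` request. The `sleep` and `clock` parameters exist only so the loop can be exercised without waiting.

```python
import time

TERMINAL = {"completed", "failed"}


def poll_report(fetch_status, interval=2.0, timeout=120.0,
                sleep=time.sleep, clock=time.monotonic):
    """Poll until a terminal status is seen; raise TimeoutError after `timeout` seconds."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL:
            return status
        # Stop before sleeping past the deadline.
        if clock() + interval > deadline:
            raise TimeoutError("report generation did not finish in time")
        sleep(interval)
```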

#### Scenario: Display generation progress
- **WHEN** polling returns status "collecting_data"
- **THEN** the UI SHALL display "正在收集聊天室資料..."
- **WHEN** polling returns status "generating_content"
- **THEN** the UI SHALL display "AI 正在分析並生成報告內容..."
- **WHEN** polling returns status "assembling_document"
- **THEN** the UI SHALL display "正在組裝報告文件..."

#### Scenario: Display generation error
- **WHEN** polling returns status "failed"
- **THEN** the UI SHALL display the `error_message` from the response
- **AND** provide an option to retry generation

### Requirement: Markdown Report Output
The report generation system SHALL provide reports in Markdown format for in-page preview.

#### Scenario: Get report as Markdown
- **WHEN** user requests `GET /api/rooms/{room_id}/reports/{report_id}/markdown`
- **AND** the report status is `completed`
- **THEN** the system returns the report content in Markdown format
- **AND** the Markdown includes all report sections (summary, timeline, participants, etc.)

#### Scenario: Markdown includes metadata
- **WHEN** generating Markdown output
- **THEN** the output includes a metadata header with room info, LOT numbers, and dates
- **AND** the format is suitable for copy-paste to other platforms
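
A minimal sketch of the Markdown output, with a metadata header followed by one heading per report section. The `meta` keys (`title`, `room_id`, `lots`, `generated_at`) and the header labels are hypothetical, since the spec does not define the exact layout.

```python
def render_markdown(meta: dict, sections: list) -> str:
    """Render a metadata header plus (heading, body) sections as Markdown."""
    lines = [f"# {meta['title']}", ""]
    lines += [
        f"- Room: {meta['room_id']}",
        f"- LOT: {', '.join(meta['lots']) or '-'}",
        f"- Generated at: {meta['generated_at']}",
        "",
    ]
    for heading, body in sections:
        lines += [f"## {heading}", "", body, ""]
    return "\n".join(lines)
```

Plain headings and list items keep the output portable when pasted into other platforms.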

### Requirement: In-Page Report Preview
The frontend SHALL display a preview of the generated report within the chat room interface.

#### Scenario: Display report preview
- **WHEN** user clicks on a completed report
- **THEN** a modal or drawer opens showing the Markdown-rendered report
- **AND** the preview includes proper formatting (headers, tables, lists)

#### Scenario: Copy Markdown content
- **WHEN** user clicks "Copy Markdown" in the preview
- **THEN** the raw Markdown text is copied to clipboard
- **AND** a success toast notification is shown

#### Scenario: Download Word from preview
- **WHEN** user clicks "Download Word" in the preview
- **THEN** the .docx file is downloaded
- **AND** the filename uses the report title