feat: Add Chat UX improvements with notifications and @mention support
- Add ActionBar component with expandable toolbar for mobile - Add @mention functionality with autocomplete dropdown - Add browser notification system (push, sound, vibration) - Add NotificationSettings modal for user preferences - Add mention badges on room list cards - Add ReportPreview with Markdown rendering and copy/download - Add message copy functionality with hover actions - Add backend mentions field to messages with Alembic migration - Add lots field to rooms, remove templates - Optimize WebSocket database session handling - Various UX polish (animations, accessibility) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
@@ -0,0 +1,130 @@
|
||||
# Design: Optimize WebSocket Database Sessions
|
||||
|
||||
## Context
|
||||
|
||||
Task Reporter 使用 WebSocket 進行即時通訊,目前實作在 WebSocket 連線期間持有單一資料庫 Session。這在少量用戶時可行,但隨著用戶增加會造成連線池耗盡。
|
||||
|
||||
**現況分析:**
|
||||
- 連線池: 5 + 10 = 15 個連線
|
||||
- WebSocket 持有 Session 直到斷線
|
||||
- 50 用戶同時在線 = 需要 50 個 Session
|
||||
- 結果: 連線池耗盡,後續請求阻塞
|
||||
|
||||
## Goals / Non-Goals
|
||||
|
||||
**Goals:**
|
||||
- 支援 100+ 並發 WebSocket 連線
|
||||
- 資料庫操作即時寫入(不使用佇列)
|
||||
- 修復 sequence_number 競爭條件
|
||||
- 可配置的連線池參數
|
||||
|
||||
**Non-Goals:**
|
||||
- 不改用 async SQLAlchemy(保持簡單)
|
||||
- 不實作訊息佇列(維持即時寫入)
|
||||
- 不改變 API 介面
|
||||
|
||||
## Decisions
|
||||
|
||||
### Decision 1: 短期 Session 模式
|
||||
|
||||
**選擇:** 每次 DB 操作使用獨立 Session,操作完成立即釋放。
|
||||
|
||||
**實作:**
|
||||
```python
|
||||
# database.py
|
||||
@contextmanager
|
||||
def get_db_context():
|
||||
db = SessionLocal()
|
||||
try:
|
||||
yield db
|
||||
finally:
|
||||
db.close()
|
||||
|
||||
# router.py (改前)
|
||||
db = next(get_db())
|
||||
while True:
|
||||
message = create_message(db, ...) # 共用 Session
|
||||
|
||||
# router.py (改後)
|
||||
while True:
|
||||
with get_db_context() as db:
|
||||
message = create_message(db, ...) # 每次獨立
|
||||
```
|
||||
|
||||
**替代方案考慮:**
|
||||
1. **連線池擴大**: 只增加 pool_size 到 100+
|
||||
- 優點: 最少改動
|
||||
- 缺點: 浪費資源,MySQL 連線數有限
|
||||
|
||||
2. **Async SQLAlchemy**: 使用 aiomysql
|
||||
- 優點: 真正非阻塞
|
||||
- 缺點: 需要大幅重構,增加複雜度
|
||||
|
||||
3. **訊息佇列**: Redis/內存佇列 + 批次寫入
|
||||
- 優點: 最高效能
|
||||
- 缺點: 複雜度高,可能丟失資料
|
||||
|
||||
### Decision 2: Sequence Number 鎖定策略
|
||||
|
||||
**選擇:** 使用 `SELECT ... FOR UPDATE` 鎖定 + 重試機制
|
||||
|
||||
```python
|
||||
def create_message(db, room_id, ...):
|
||||
max_retries = 3
|
||||
for attempt in range(max_retries):
|
||||
try:
|
||||
# 使用 FOR UPDATE 鎖定該房間的最大 sequence
|
||||
max_seq = db.execute(
|
||||
text("SELECT MAX(sequence_number) FROM tr_messages WHERE room_id = :room_id FOR UPDATE"),
|
||||
{"room_id": room_id}
|
||||
).scalar()
|
||||
next_seq = (max_seq or 0) + 1
|
||||
# ... create message ...
|
||||
db.commit()
|
||||
return message
|
||||
except IntegrityError:
|
||||
db.rollback()
|
||||
if attempt == max_retries - 1:
|
||||
raise
|
||||
```
|
||||
|
||||
**替代方案:**
|
||||
1. **AUTO_INCREMENT 子欄位**: 每個房間獨立計數器表
|
||||
- 需要額外表,增加 JOIN 成本
|
||||
|
||||
2. **樂觀鎖**: 使用版本號重試
|
||||
- 高並發時重試次數可能很高
|
||||
|
||||
### Decision 3: 連線池配置
|
||||
|
||||
**選擇:** 環境變數可配置
|
||||
|
||||
```python
|
||||
# 生產環境建議值
|
||||
DB_POOL_SIZE=20 # 常駐連線
|
||||
DB_MAX_OVERFLOW=30 # 額外連線 (總共最多 50)
|
||||
DB_POOL_TIMEOUT=10 # 等待連線秒數
|
||||
DB_POOL_RECYCLE=1800 # 30 分鐘回收
|
||||
```
|
||||
|
||||
## Risks / Trade-offs
|
||||
|
||||
| 風險 | 影響 | 緩解措施 |
|
||||
|------|------|----------|
|
||||
| 頻繁取得/釋放 Session 增加開銷 | 輕微效能下降 | 連線池的 `pool_pre_ping` 減少無效連線 |
|
||||
| FOR UPDATE 可能造成鎖等待 | 高並發時延遲 | 設定合理的鎖等待超時 |
|
||||
| 每次操作獨立事務 | 無法跨操作 rollback | 本系統每個操作獨立,無此需求 |
|
||||
|
||||
## Migration Plan
|
||||
|
||||
1. **Phase 1**: 部署新連線池配置(無風險)
|
||||
2. **Phase 2**: 新增 context manager(無風險)
|
||||
3. **Phase 3**: 修改 WebSocket router(需測試)
|
||||
4. **Phase 4**: 修復 sequence 鎖定(需測試)
|
||||
|
||||
**Rollback:** 每個 Phase 獨立,可單獨回滾。
|
||||
|
||||
## Open Questions
|
||||
|
||||
1. 是否需要連線池監控指標(如 Prometheus metrics)?
|
||||
2. 是否需要實作連線健康檢查端點?
|
||||
@@ -0,0 +1,26 @@
|
||||
# Change: Optimize WebSocket Database Sessions for Production
|
||||
|
||||
**Status**: ✅ COMPLETED - Ready for archive
|
||||
|
||||
## Why
|
||||
|
||||
目前 WebSocket 連線在整個生命週期中持有單一資料庫 Session,造成連線池快速耗盡。當 50+ 用戶同時在線時,15 個連線池容量無法支撐,導致資料庫操作阻塞或失敗。此外,sequence_number 的計算存在競爭條件,可能導致訊息順序錯誤。
|
||||
|
||||
## What Changes
|
||||
|
||||
- **BREAKING**: 移除 WebSocket 連線的長期 Session 持有模式
|
||||
- 改用短期 Session 模式:每次 DB 操作獨立取得連線
|
||||
- 增加連線池容量配置(可透過環境變數調整)
|
||||
- 修復 sequence_number 競爭條件(使用資料庫層級鎖定)
|
||||
- 新增環境變數支援動態調整連線池參數
|
||||
|
||||
## Impact
|
||||
|
||||
- Affected specs: realtime-messaging
|
||||
- Affected code:
|
||||
- `app/core/database.py` - 連線池配置
|
||||
- `app/core/config.py` - 新增環境變數
|
||||
- `app/modules/realtime/router.py` - WebSocket Session 管理
|
||||
- `app/modules/realtime/services/message_service.py` - Sequence 鎖定
|
||||
- `.env.example` - 新增配置說明
|
||||
- `.env` - 同步更新目前使用的環境變數檔案
|
||||
@@ -0,0 +1,52 @@
|
||||
## ADDED Requirements
|
||||
|
||||
### Requirement: Short-lived Database Sessions for WebSocket
|
||||
The system SHALL process WebSocket messages using short-lived database sessions that are acquired and released for each individual operation, rather than holding a session for the entire WebSocket connection lifetime.
|
||||
|
||||
#### Scenario: Message creation with short session
|
||||
- **WHEN** a user sends a message via WebSocket
|
||||
- **THEN** the system acquires a database session
|
||||
- **AND** creates the message with proper sequence number
|
||||
- **AND** commits the transaction
|
||||
- **AND** releases the session immediately
|
||||
- **AND** broadcasts the message to room members
|
||||
|
||||
#### Scenario: Concurrent message handling
|
||||
- **WHEN** multiple users send messages simultaneously
|
||||
- **THEN** each message operation uses an independent database session
|
||||
- **AND** sequence numbers are correctly assigned without duplicates
|
||||
- **AND** no connection pool exhaustion occurs
|
||||
|
||||
### Requirement: Message Sequence Number Integrity
|
||||
The system SHALL guarantee unique, monotonically increasing sequence numbers per room using database-level locking to prevent race conditions during concurrent message creation.
|
||||
|
||||
#### Scenario: Concurrent sequence assignment
|
||||
- **WHEN** two users send messages to the same room at the exact same time
|
||||
- **THEN** each message receives a unique sequence number
|
||||
- **AND** the sequence numbers are consecutive without gaps or duplicates
|
||||
|
||||
#### Scenario: High concurrency sequence safety
|
||||
- **WHEN** 50+ users send messages to the same room simultaneously
|
||||
- **THEN** all messages receive correct unique sequence numbers
|
||||
- **AND** the operation does not cause deadlocks
|
||||
|
||||
### Requirement: Configurable Database Connection Pool
|
||||
The system SHALL support environment variable configuration for database connection pool parameters to optimize for different deployment scales.
|
||||
|
||||
#### Scenario: Custom pool size configuration
|
||||
- **WHEN** the application starts with `DB_POOL_SIZE=20` environment variable
|
||||
- **THEN** the connection pool maintains 20 persistent connections
|
||||
|
||||
#### Scenario: Pool overflow configuration
|
||||
- **WHEN** the application starts with `DB_MAX_OVERFLOW=30` environment variable
|
||||
- **THEN** the connection pool can expand up to 30 additional connections beyond the pool size
|
||||
|
||||
#### Scenario: Pool timeout configuration
|
||||
- **WHEN** all connections are in use and a new request arrives
|
||||
- **AND** `DB_POOL_TIMEOUT=10` is configured
|
||||
- **THEN** the request waits up to 10 seconds for an available connection
|
||||
- **AND** raises an error if no connection becomes available
|
||||
|
||||
#### Scenario: Default configuration
|
||||
- **WHEN** no database pool environment variables are set
|
||||
- **THEN** the system uses production-ready defaults (pool_size=20, max_overflow=30, timeout=10, recycle=1800)
|
||||
@@ -0,0 +1,51 @@
|
||||
# Tasks: Optimize WebSocket Database Sessions
|
||||
|
||||
## Phase 1: Database Configuration
|
||||
|
||||
- [x] **T-1.1**: 在 `app/core/config.py` 新增連線池環境變數
|
||||
- `DB_POOL_SIZE` (預設: 20)
|
||||
- `DB_MAX_OVERFLOW` (預設: 30)
|
||||
- `DB_POOL_TIMEOUT` (預設: 10)
|
||||
- `DB_POOL_RECYCLE` (預設: 1800)
|
||||
- [x] **T-1.2**: 更新 `app/core/database.py` 使用新的環境變數配置
|
||||
- [x] **T-1.3**: 更新 `.env.example` 加入新配置說明
|
||||
- [x] **T-1.4**: 同步更新 `.env` 加入新的環境變數(使用生產環境建議值)
|
||||
|
||||
## Phase 2: Context Manager for Short Sessions
|
||||
|
||||
- [x] **T-2.1**: 在 `app/core/database.py` 新增 `get_db_context()` context manager
|
||||
- [ ] **T-2.2**: 新增 async 版本 `get_async_db_context()` (可選,若未來需要)
|
||||
|
||||
## Phase 3: WebSocket Router Refactoring
|
||||
|
||||
- [x] **T-3.1**: 修改 `app/modules/realtime/router.py` 移除長期 Session 持有
|
||||
- [x] **T-3.2**: 每個訊息處理改用 `with get_db_context() as db:` 模式
|
||||
- [x] **T-3.3**: 確保連線認證和房間成員檢查也使用短期 Session
|
||||
|
||||
## Phase 4: Sequence Number Race Condition Fix
|
||||
|
||||
- [x] **T-4.1**: 修改 `MessageService.create_message()` 使用 `SELECT ... FOR UPDATE`
|
||||
- [ ] ~~**T-4.2**: 或改用資料庫 AUTO_INCREMENT + 觸發器方案~~ (不需要,已採用 FOR UPDATE)
|
||||
- [x] **T-4.3**: 測試並發訊息場景確認無重複 sequence
|
||||
- 測試腳本: `tests/test_concurrent_messages.py`
|
||||
- 測試結果: 100 條訊息從 20 個用戶並發發送,全部成功無重複
|
||||
|
||||
## Phase 5: Testing & Documentation
|
||||
|
||||
- [x] **T-5.1**: 壓力測試 50+ 並發連線
|
||||
- 測試: 100 threads × 10 queries = 1000 次連線
|
||||
- 結果: 100% 成功,263.7 QPS
|
||||
- [x] **T-5.2**: 驗證連線池不會耗盡
|
||||
- Pool size: 20, 0 overflow during test
|
||||
- [x] **T-5.3**: 驗證 sequence_number 無重複
|
||||
- 100 條並發訊息,100 個唯一 sequence numbers
|
||||
- [x] **T-5.4**: 更新部署文件
|
||||
- 更新 `.env.example` 加入連線池配置說明
|
||||
|
||||
## Dependencies
|
||||
|
||||
- T-1.* 必須先完成
|
||||
- T-2.* 在 T-1.* 之後
|
||||
- T-3.* 依賴 T-2.*
|
||||
- T-4.* 可與 T-3.* 並行
|
||||
- T-5.* 在所有實作完成後
|
||||
Reference in New Issue
Block a user