feat: Meeting Assistant MVP - Complete implementation
Enterprise Meeting Knowledge Management System with:

Backend (FastAPI):
- Authentication proxy with JWT (pj-auth-api integration)
- MySQL database with 4 tables (users, meetings, conclusions, actions)
- Meeting CRUD with system code generation (C-YYYYMMDD-XX, A-YYYYMMDD-XX)
- Dify LLM integration for AI summarization
- Excel export with openpyxl
- 20 unit tests (all passing)

Client (Electron):
- Login page with company auth
- Meeting list with create/delete
- Meeting detail with real-time transcription
- Editable transcript textarea (single block, easy editing)
- AI summarization with conclusions/action items
- 5-second segment recording (efficient for long meetings)

Sidecar (Python):
- faster-whisper medium model with int8 quantization
- ONNX Runtime VAD (lightweight, ~20MB vs PyTorch ~2GB)
- Chinese punctuation processing
- OpenCC for Traditional Chinese conversion
- Anti-hallucination parameters
- Auto-cleanup of temp audio files

OpenSpec:
- add-meeting-assistant-mvp (47 tasks, archived)
- add-realtime-transcription (29 tasks, archived)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -0,0 +1,46 @@
## ADDED Requirements

### Requirement: Streaming Transcription Mode

The sidecar SHALL support a streaming mode where audio chunks are continuously received and transcribed in real-time with VAD-triggered segmentation.

#### Scenario: Start streaming session

- **WHEN** sidecar receives `{"action": "start_stream"}` command
- **THEN** it SHALL initialize audio buffer and VAD processor
- **AND** respond with `{"status": "streaming", "session_id": "<uuid>"}`

#### Scenario: Process audio chunk

- **WHEN** sidecar receives `{"action": "audio_chunk", "data": "<base64_pcm>"}` during active stream
- **THEN** it SHALL append audio to buffer and run VAD detection
- **AND** if speech boundary detected, transcribe accumulated audio
- **AND** emit `{"segment_id": <int>, "text": "<transcription>", "is_final": true}`

#### Scenario: Stop streaming session

- **WHEN** sidecar receives `{"action": "stop_stream"}` command
- **THEN** it SHALL transcribe any remaining buffered audio
- **AND** respond with `{"status": "stream_stopped", "total_segments": <int>}`
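The command handling above can be sketched as a small session object. This is illustrative only, not the sidecar's actual implementation: the `StreamSession` class name and the `transcribe` callback are assumptions, and VAD-triggered mid-stream segmentation is stubbed out (only the final flush on `stop_stream` is shown).

```python
import base64
import uuid

class StreamSession:
    """Sketch of the streaming command protocol (hypothetical interface).

    `transcribe` stands in for the faster-whisper call; a real sidecar
    would also run VAD on each chunk and emit segments mid-stream.
    """

    def __init__(self, transcribe):
        self.transcribe = transcribe
        self.buffer = bytearray()
        self.segments = 0
        self.active = False

    def handle(self, message: dict) -> dict:
        action = message["action"]
        if action == "start_stream":
            # Initialize audio buffer and segment counter for a new session.
            self.active = True
            self.buffer.clear()
            self.segments = 0
            return {"status": "streaming", "session_id": str(uuid.uuid4())}
        if action == "audio_chunk" and self.active:
            # Append decoded PCM; VAD boundary detection would run here.
            self.buffer.extend(base64.b64decode(message["data"]))
            return {"status": "buffered", "bytes": len(self.buffer)}
        if action == "stop_stream":
            # Flush any remaining buffered audio as a final segment.
            self.active = False
            if self.buffer:
                self.segments += 1
                self.transcribe(bytes(self.buffer))
                self.buffer.clear()
            return {"status": "stream_stopped", "total_segments": self.segments}
        return {"status": "error", "message": f"unknown action: {action}"}
```

In practice each `handle` call would correspond to one line-delimited JSON message on the sidecar's stdin, with the returned dict serialized back on stdout.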

### Requirement: VAD-based Speech Segmentation

The sidecar SHALL use Voice Activity Detection to identify natural speech boundaries for segmentation.

#### Scenario: Detect speech end

- **WHEN** VAD detects silence exceeding 500ms after speech
- **THEN** the accumulated speech audio SHALL be sent for transcription
- **AND** a new segment SHALL begin for subsequent speech

#### Scenario: Handle continuous speech

- **WHEN** speech continues for more than 15 seconds without pause
- **THEN** the sidecar SHALL force a segment boundary
- **AND** transcribe the 15-second chunk to prevent excessive latency
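The two boundary rules above (500ms trailing silence, 15-second forced cut) can be expressed as a small per-frame state machine. A minimal sketch, assuming a fixed VAD frame size and a boolean speech/non-speech decision per frame from the ONNX VAD model; the `Segmenter` name and `push` interface are illustrative:

```python
SILENCE_MS = 500         # trailing silence that ends a segment (per spec)
MAX_SEGMENT_MS = 15_000  # forced boundary for continuous speech (per spec)

class Segmenter:
    """Tracks speech/silence durations and signals segment boundaries."""

    def __init__(self, frame_ms: int = 30):
        self.frame_ms = frame_ms
        self.speech_ms = 0
        self.silence_ms = 0

    def push(self, is_speech: bool) -> bool:
        """Feed one VAD frame; return True when a boundary fires."""
        if is_speech:
            self.speech_ms += self.frame_ms
            self.silence_ms = 0
            if self.speech_ms >= MAX_SEGMENT_MS:
                # Continuous speech hit 15 s: force a cut to bound latency.
                self._reset()
                return True
        elif self.speech_ms > 0:
            self.silence_ms += self.frame_ms
            if self.silence_ms >= SILENCE_MS:
                # 500 ms of silence after speech: natural boundary.
                self._reset()
                return True
        return False

    def _reset(self) -> None:
        self.speech_ms = 0
        self.silence_ms = 0
```

When `push` returns True, the caller would transcribe the audio accumulated since the previous boundary and start a new segment.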

### Requirement: Punctuation in Transcription Output

The sidecar SHALL output transcribed text with appropriate Chinese punctuation marks.

#### Scenario: Add sentence-ending punctuation

- **WHEN** transcription completes for a segment
- **THEN** the output SHALL include period (。) at natural sentence boundaries
- **AND** question marks (?) for interrogative sentences
- **AND** commas (,) for clause breaks within sentences

#### Scenario: Detect question patterns

- **WHEN** transcribed text ends with question particles (嗎、呢、什麼、怎麼、為什麼)
- **THEN** the punctuation processor SHALL append question mark (?)
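The particle rule in this scenario can be sketched as a terminal-punctuation pass. This covers only the sentence-ending decision, not the clause-level comma insertion; the `punctuate` function name is illustrative:

```python
# Question particles taken directly from the scenario above.
QUESTION_PARTICLES = ("嗎", "呢", "什麼", "怎麼", "為什麼")

def punctuate(text: str) -> str:
    """Append ? when a segment ends in a question particle, else 。."""
    text = text.rstrip("。?,")  # drop any existing terminal mark first
    if text.endswith(QUESTION_PARTICLES):
        return text + "?"
    return text + "。"
```

For example, a segment ending in 嗎 gets a fullwidth question mark while a plain declarative segment gets a fullwidth period.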