Meeting_Assistant/openspec/changes/archive/2025-12-10-add-realtime-transcription/tasks.md
egg 8b6184ecc5 feat: Meeting Assistant MVP - Complete implementation
Enterprise Meeting Knowledge Management System with:

Backend (FastAPI):
- Authentication proxy with JWT (pj-auth-api integration)
- MySQL database with 4 tables (users, meetings, conclusions, actions)
- Meeting CRUD with system code generation (C-YYYYMMDD-XX, A-YYYYMMDD-XX)
- Dify LLM integration for AI summarization
- Excel export with openpyxl
- 20 unit tests (all passing)

Client (Electron):
- Login page with company auth
- Meeting list with create/delete
- Meeting detail with real-time transcription
- Editable transcript textarea (single block, easy editing)
- AI summarization with conclusions/action items
- 5-second segment recording (efficient for long meetings)

Sidecar (Python):
- faster-whisper medium model with int8 quantization
- ONNX Runtime VAD (lightweight, ~20MB vs PyTorch ~2GB)
- Chinese punctuation processing
- OpenCC for Traditional Chinese conversion
- Anti-hallucination parameters
- Auto-cleanup of temp audio files

OpenSpec:
- add-meeting-assistant-mvp (47 tasks, archived)
- add-realtime-transcription (29 tasks, archived)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 20:17:44 +08:00

## 1. Sidecar Streaming Infrastructure
- [x] 1.1 Add silero-vad dependency to requirements.txt
- [x] 1.2 Implement VADProcessor class with speech boundary detection
- [x] 1.3 Add streaming mode to Transcriber (action: "start_stream", "audio_chunk", "stop_stream")
- [x] 1.4 Implement audio buffer with VAD-triggered transcription
- [x] 1.5 Add segment_id tracking for each utterance
- [x] 1.6 Test VAD with sample Chinese speech audio
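
The streaming actions in 1.3 to 1.5 can be sketched as a small dispatcher. This is a minimal illustration, not the sidecar's actual code: the `StreamSession` class, the `is_speech` and `transcribe` hooks, the `data` field, and the reply shapes are all assumptions made for the example. Only the action names and `segment_id` come from the task list.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class StreamSession:
    """Buffers PCM chunks and emits one segment per detected utterance."""

    # Hypothetical hooks standing in for the real VAD and Whisper calls.
    is_speech: Callable[[bytes], bool]
    transcribe: Callable[[bytes], str]
    buffer: bytearray = field(default_factory=bytearray)
    next_segment_id: int = 0
    active: bool = False

    def handle(self, message: dict) -> Optional[dict]:
        action = message.get("action")
        if action == "start_stream":
            # Reset state so segment ids restart for each recording.
            self.buffer.clear()
            self.next_segment_id = 0
            self.active = True
            return {"status": "streaming"}
        if action == "audio_chunk" and self.active:
            chunk = message["data"]
            if self.is_speech(chunk):
                self.buffer.extend(chunk)
                return None
            # Silence marks a speech boundary: transcribe what was buffered.
            return self._flush()
        if action == "stop_stream":
            self.active = False
            return self._flush()
        return None

    def _flush(self) -> Optional[dict]:
        if not self.buffer:
            return None
        text = self.transcribe(bytes(self.buffer))
        self.buffer.clear()
        segment = {"segment_id": self.next_segment_id, "text": text}
        self.next_segment_id += 1
        return segment
```

Injecting the VAD and transcriber as callables keeps the buffering logic testable without loading a model.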
## 2. Punctuation Processing
- [x] 2.1 Enable word_timestamps in Whisper transcribe()
- [x] 2.2 Implement ChinesePunctuator class with rule-based punctuation
- [x] 2.3 Add pause-based sentence boundary detection (>500ms → period)
- [x] 2.4 Add question detection (嗎、呢、什麼 patterns → ?)
- [x] 2.5 Test punctuation output quality with sample transcripts
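
The rules in 2.3 and 2.4 can be sketched roughly as follows. This is an illustrative approximation, not the project's `ChinesePunctuator`: the `punctuate` function, the `(text, start, end)` word tuple, and the exact regex are assumptions. Only the 500 ms pause threshold and the 嗎/呢/什麼 patterns come from the task list.

```python
import re
from typing import List, Tuple

Word = Tuple[str, float, float]  # (text, start_sec, end_sec), assumed shape

# Assumed pattern: sentence-final 嗎 or 呢, or 什麼 anywhere in the sentence.
QUESTION_RE = re.compile(r"(嗎|呢)$|什麼")
PAUSE_THRESHOLD_SEC = 0.5


def punctuate(words: List[Word]) -> str:
    """Close a sentence at each pause longer than the threshold."""
    sentences: List[str] = []
    current = ""
    for i, (text, _start, end) in enumerate(words):
        current += text
        is_last = i == len(words) - 1
        # Gap between this word's end and the next word's start.
        gap = 0.0 if is_last else words[i + 1][1] - end
        if is_last or gap > PAUSE_THRESHOLD_SEC:
            mark = "?" if QUESTION_RE.search(current) else "。"
            sentences.append(current + mark)
            current = ""
    return "".join(sentences)


words = [("你好", 0.0, 0.4), ("嗎", 0.4, 0.6), ("我", 1.5, 1.7), ("很好", 1.7, 2.1)]
print(punctuate(words))  # 你好嗎?我很好。
```

The word timings would come from enabling `word_timestamps` as in 2.1. A pattern this simple will mislabel some statements containing 什麼, which is presumably why 2.5 calls for a quality check.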
## 3. IPC Audio Streaming
- [x] 3.1 Add "start-recording-stream" IPC handler in main.js
- [x] 3.2 Add "stream-audio-chunk" IPC handler to forward audio to sidecar
- [x] 3.3 Add "stop-recording-stream" IPC handler
- [x] 3.4 Implement WebM to PCM conversion using web-audio-api or ffmpeg.wasm
- [x] 3.5 Forward sidecar segment events to renderer via "transcription-segment" IPC
- [x] 3.6 Update preload.js with streaming API exposure
## 4. Frontend Editable Transcript
- [x] 4.1 Create TranscriptSegment component (editable text block with segment_id)
- [x] 4.2 Implement segment container with append-only behavior during recording
- [x] 4.3 Add edit handler that updates local segment data
- [x] 4.4 Style active segment (currently receiving text) differently
- [x] 4.5 Update Save button to merge all segments into transcript_blob
- [x] 4.6 Add visual indicator for streaming status
## 5. Integration & Testing
- [x] 5.1 End-to-end test: start recording → speak → see text appear
- [x] 5.2 Test editing segment while new segments arrive
- [x] 5.3 Test save with mixed edited/unedited segments
- [x] 5.4 Performance test on i5/8GB target hardware
- [x] 5.5 Test with 30+ minute continuous recording
- [x] 5.6 Update meeting-detail.html recording flow documentation
## Dependencies
- Task 3 depends on Task 1 (sidecar must support streaming first)
- Task 4 depends on Task 3 (frontend needs IPC to receive segments)
## Parallelizable Work
- Tasks 1 and 4 can start simultaneously (sidecar and frontend scaffolding)
- Task 2 can run in parallel with Task 3
## Implementation Notes
- VAD uses Silero VAD, falling back to 5-second time-based segmentation if torch is unavailable
- Audio is captured at 16 kHz mono, converted to int16 PCM, and sent as base64
- ChinesePunctuator uses regex patterns for question detection
- Segments are editable immediately, edited segments marked with orange border
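
The audio format and the time-based fallback described above might look like this on the sidecar side. It is a minimal sketch under stated assumptions: `decode_chunk` and `fixed_segments` are hypothetical names, and little-endian byte order is assumed because the notes do not specify it.

```python
import base64
import struct
from typing import Iterator, List

SAMPLE_RATE = 16_000       # 16 kHz mono, per the notes above
FALLBACK_SEGMENT_SEC = 5   # time-based fallback when VAD is unavailable


def decode_chunk(b64_payload: str) -> List[float]:
    """Decode base64 int16 PCM (assumed little-endian) into floats in [-1.0, 1.0)."""
    raw = base64.b64decode(b64_payload)
    count = len(raw) // 2  # two bytes per int16 sample; drop a trailing odd byte
    samples = struct.unpack(f"<{count}h", raw[: count * 2])
    return [s / 32768.0 for s in samples]


def fixed_segments(
    samples: List[float], seconds: int = FALLBACK_SEGMENT_SEC
) -> Iterator[List[float]]:
    """Split audio into fixed-length segments; the last one may be shorter."""
    step = SAMPLE_RATE * seconds
    for start in range(0, len(samples), step):
        yield samples[start : start + step]
```

For example, 12 seconds of audio yields segments of 5, 5, and 2 seconds. Fixed cuts can split a word across two segments, which is the trade-off VAD-based boundaries avoid.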