Enterprise Meeting Knowledge Management System with:

Backend (FastAPI):
- Authentication proxy with JWT (pj-auth-api integration)
- MySQL database with 4 tables (users, meetings, conclusions, actions)
- Meeting CRUD with system code generation (C-YYYYMMDD-XX, A-YYYYMMDD-XX)
- Dify LLM integration for AI summarization
- Excel export with openpyxl
- 20 unit tests (all passing)

Client (Electron):
- Login page with company auth
- Meeting list with create/delete
- Meeting detail with real-time transcription
- Editable transcript textarea (single block, easy editing)
- AI summarization with conclusions/action items
- 5-second segment recording (efficient for long meetings)

Sidecar (Python):
- faster-whisper medium model with int8 quantization
- ONNX Runtime VAD (lightweight, ~20MB vs PyTorch ~2GB)
- Chinese punctuation processing
- OpenCC for Traditional Chinese conversion
- Anti-hallucination parameters
- Auto-cleanup of temp audio files

OpenSpec:
- add-meeting-assistant-mvp (47 tasks, archived)
- add-realtime-transcription (29 tasks, archived)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Change: Add Real-time Streaming Transcription
Why
The current transcription workflow requires users to stop recording before they see any results. They cannot correct transcription errors, and the output lacks punctuation. In meetings, real-time feedback with editable text is essential for immediate correction and context awareness.
What Changes
- Sidecar: Implement streaming VAD-based transcription with sentence segmentation
- IPC: Add continuous audio streaming from the renderer, through the main process, to the sidecar
- Frontend: Make transcript editable with real-time segment updates
- Punctuation: Enable Whisper's word timestamps and add sentence boundary detection
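The sentence boundary detection named in the last item could be sketched as follows. This is a minimal illustration and not the proposal's implementation: the `Word` type, the `segment_sentences` function, and the `pause_threshold` and `max_chars` values are all hypothetical. In practice the words would likely come from faster-whisper, whose `transcribe(..., word_timestamps=True)` call yields segments carrying per-word start and end times.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Word:
    """One recognized word with timestamps in seconds (hypothetical type)."""
    text: str
    start: float
    end: float


def segment_sentences(
    words: Sequence[Word],
    pause_threshold: float = 0.6,  # assumed: silence that ends a sentence
    max_chars: int = 40,           # assumed: hard cap on sentence length
) -> List[str]:
    """Group words into sentences, closing one at each long pause.

    Words are joined without spaces, which suits Chinese text, and every
    closed sentence gets a full-width period appended.
    """
    sentences: List[str] = []
    current: List[str] = []
    for i, word in enumerate(words):
        current.append(word.text)
        is_last = i == len(words) - 1
        # Silence between this word and the next one; zero after the last word.
        gap = 0.0 if is_last else words[i + 1].start - word.end
        length = sum(len(t) for t in current)
        if is_last or gap >= pause_threshold or length >= max_chars:
            sentences.append("".join(current) + "。")
            current = []
    return sentences
```

A pause-based rule only ever emits a period. Choosing between 。, ? and ! would need an additional signal, such as sentence-final particles or a punctuation model, which this sketch leaves out.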
Impact
- Affected specs: transcription (new), frontend-transcript (new)
- Affected code:
  - sidecar/transcriber.py - Add streaming mode with VAD
  - client/src/main.js - Add audio streaming IPC handlers
  - client/src/preload.js - Expose streaming APIs
  - client/src/pages/meeting-detail.html - Editable transcript component
Success Criteria
- User sees text appearing within 2-3 seconds of speaking
- Each segment is individually editable
- Output includes punctuation (。,?!)
- Recording can continue while user edits previous segments
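One way to satisfy the second and fourth criteria together is to key every segment by an ID and never let a late sidecar update overwrite text the user has already corrected. The sketch below models that rule in Python for brevity. The real component would live in the Electron renderer, and the `Segment` and `Transcript` names and their methods are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    """One transcript segment (hypothetical model)."""
    segment_id: int
    text: str
    edited: bool = False  # set once the user has changed the text


class Transcript:
    """Ordered segments where user edits win over later sidecar updates."""

    def __init__(self) -> None:
        self._segments: Dict[int, Segment] = {}
        self._order: List[int] = []

    def upsert_from_sidecar(self, segment_id: int, text: str) -> None:
        """Append a new segment, or refresh one the user has not edited."""
        seg = self._segments.get(segment_id)
        if seg is None:
            self._segments[segment_id] = Segment(segment_id, text)
            self._order.append(segment_id)
        elif not seg.edited:
            seg.text = text
        # Edited segments are left untouched so corrections are never lost.

    def edit(self, segment_id: int, text: str) -> None:
        """Apply a user correction and lock the segment against overwrites."""
        seg = self._segments[segment_id]
        seg.text = text
        seg.edited = True

    def full_text(self) -> str:
        """Join all segments in arrival order."""
        return "".join(self._segments[i].text for i in self._order)
```

With this rule, recording can keep appending new segments while the user fixes earlier ones, because an edit touches only its own segment and arrival order is tracked separately.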