egg 8b6184ecc5 feat: Meeting Assistant MVP - Complete implementation
Enterprise Meeting Knowledge Management System with:

Backend (FastAPI):
- Authentication proxy with JWT (pj-auth-api integration)
- MySQL database with 4 tables (users, meetings, conclusions, actions)
- Meeting CRUD with system code generation (C-YYYYMMDD-XX, A-YYYYMMDD-XX)
- Dify LLM integration for AI summarization
- Excel export with openpyxl
- 20 unit tests (all passing)

Client (Electron):
- Login page with company auth
- Meeting list with create/delete
- Meeting detail with real-time transcription
- Editable transcript textarea (single block, easy editing)
- AI summarization with conclusions/action items
- 5-second segment recording (efficient for long meetings)

Sidecar (Python):
- faster-whisper medium model with int8 quantization
- ONNX Runtime VAD (lightweight, ~20MB vs PyTorch ~2GB)
- Chinese punctuation processing
- OpenCC for Traditional Chinese conversion
- Anti-hallucination parameters
- Auto-cleanup of temp audio files

OpenSpec:
- add-meeting-assistant-mvp (47 tasks, archived)
- add-realtime-transcription (29 tasks, archived)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 20:17:44 +08:00


1. Sidecar Streaming Infrastructure

  • 1.1 Add silero-vad dependency to requirements.txt
  • 1.2 Implement VADProcessor class with speech boundary detection
  • 1.3 Add streaming mode to Transcriber (action: "start_stream", "audio_chunk", "stop_stream")
  • 1.4 Implement audio buffer with VAD-triggered transcription
  • 1.5 Add segment_id tracking for each utterance
  • 1.6 Test VAD with sample Chinese speech audio
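
Tasks 1.3 to 1.5 could be sketched roughly as below. This is a minimal illustration under stated assumptions, not the project's implementation: `StreamSession`, the `pcm` message key, the `transcribe` hook, and the reply shapes are hypothetical, and the energy-based `is_speech` check is only a stand-in for Silero VAD so the example runs with the standard library alone.

```python
import array
import math
from dataclasses import dataclass, field
from typing import Callable, Optional


def is_speech(chunk: bytes, threshold: float = 500.0) -> bool:
    """Stand-in VAD (assumption): RMS energy of int16 PCM above a threshold."""
    samples = array.array("h")
    samples.frombytes(chunk)
    if not samples:
        return False
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms > threshold


@dataclass
class StreamSession:
    """Buffers speech chunks and emits one segment per utterance."""

    transcribe: Callable[[bytes], str]  # hypothetical hook wrapping the Whisper call
    max_silence_chunks: int = 3         # silent chunks that close an utterance
    active: bool = False
    _buffer: bytearray = field(default_factory=bytearray)
    _silence: int = 0
    _next_id: int = 0

    def handle(self, message: dict) -> Optional[dict]:
        """Dispatch on the three streaming actions named in task 1.3."""
        action = message.get("action")
        if action == "start_stream":
            self._buffer.clear()
            self._silence = 0
            self.active = True
            return {"status": "streaming"}
        if action == "audio_chunk" and self.active:
            return self._on_chunk(message["pcm"])
        if action == "stop_stream":
            self.active = False
            return self._flush()  # emit whatever speech is still buffered
        return None

    def _on_chunk(self, pcm: bytes) -> Optional[dict]:
        if is_speech(pcm):
            self._buffer.extend(pcm)
            self._silence = 0
            return None
        if not self._buffer:
            return None  # silence before any speech: nothing to transcribe
        self._silence += 1
        if self._silence >= self.max_silence_chunks:
            return self._flush()
        return None

    def _flush(self) -> Optional[dict]:
        if not self._buffer:
            return None
        audio = bytes(self._buffer)
        self._buffer.clear()
        self._silence = 0
        segment = {"segment_id": self._next_id, "text": self.transcribe(audio)}
        self._next_id += 1
        return segment
```

Keeping `segment_id` as a counter inside the session means each utterance gets a stable identifier that the frontend can use to address an individual segment when it is edited.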

2. Punctuation Processing

  • 2.1 Enable word_timestamps in Whisper transcribe()
  • 2.2 Implement ChinesePunctuator class with rule-based punctuation
  • 2.3 Add pause-based sentence boundary detection (>500ms → period)
  • 2.4 Add question detection (嗎、呢、什麼 patterns → ？)
  • 2.5 Test punctuation output quality with sample transcripts
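
The rule-based approach in tasks 2.2 to 2.4 might look something like the sketch below. It assumes word timings arrive as `(text, start, end)` tuples in seconds; that shape, the `punctuate` function name, and the exact regex are illustrative assumptions rather than the actual `ChinesePunctuator` code.

```python
import re
from typing import List, Tuple

Word = Tuple[str, float, float]  # (text, start_seconds, end_seconds), assumed shape

PAUSE_SECONDS = 0.5  # a gap longer than 500 ms closes the sentence
# Assumed patterns: sentence-final 嗎 / 呢, or 什麼 anywhere in the sentence.
QUESTION_RE = re.compile(r"(嗎|呢)$|什麼")


def _close(sentence: str) -> str:
    """Append a full-width question mark or period to a finished sentence."""
    mark = "？" if QUESTION_RE.search(sentence) else "。"
    return sentence + mark


def punctuate(words: List[Word]) -> str:
    """Join words into sentences, splitting wherever the pause rule fires."""
    sentences = []
    current = ""
    prev_end = None
    for text, start, end in words:
        if current and prev_end is not None and start - prev_end > PAUSE_SECONDS:
            sentences.append(_close(current))
            current = ""
        current += text
        prev_end = end
    if current:
        sentences.append(_close(current))
    return "".join(sentences)
```

For example, `punctuate([("你好", 0.0, 0.4), ("嗎", 0.4, 0.6), ("我", 1.5, 1.6), ("很好", 1.6, 2.0)])` splits at the 0.9-second gap and returns `你好嗎？我很好。`.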

3. IPC Audio Streaming

  • 3.1 Add "start-recording-stream" IPC handler in main.js
  • 3.2 Add "stream-audio-chunk" IPC handler to forward audio to sidecar
  • 3.3 Add "stop-recording-stream" IPC handler
  • 3.4 Implement WebM to PCM conversion using web-audio-api or ffmpeg.wasm
  • 3.5 Forward sidecar segment events to renderer via "transcription-segment" IPC
  • 3.6 Update preload.js with streaming API exposure

4. Frontend Editable Transcript

  • 4.1 Create TranscriptSegment component (editable text block with segment_id)
  • 4.2 Implement segment container with append-only behavior during recording
  • 4.3 Add edit handler that updates local segment data
  • 4.4 Style active segment (currently receiving text) differently
  • 4.5 Update Save button to merge all segments into transcript_blob
  • 4.6 Add visual indicator for streaming status

5. Integration & Testing

  • 5.1 End-to-end test: start recording → speak → see text appear
  • 5.2 Test editing segment while new segments arrive
  • 5.3 Test save with mixed edited/unedited segments
  • 5.4 Performance test on i5/8GB target hardware
  • 5.5 Test with 30+ minute continuous recording
  • 5.6 Update meeting-detail.html recording flow documentation

Dependencies

  • Task 3 depends on Task 1 (sidecar must support streaming first)
  • Task 4 depends on Task 3 (frontend needs IPC to receive segments)

Parallelizable Work

  • Tasks 1 and 4 can start simultaneously (sidecar and frontend scaffolding)
  • Task 2 can run in parallel with Task 3

Implementation Notes

  • VAD uses Silero VAD, falling back to 5-second time-based segmentation if torch is unavailable
  • Audio captured at 16kHz mono, converted to int16 PCM, sent as base64
  • ChinesePunctuator uses regex patterns for question detection
  • Segments are editable immediately, edited segments marked with orange border
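
The audio format and the time-based fallback described in these notes can be illustrated with a short sketch. The function names are hypothetical, the float-to-int16 scaling is one common convention rather than a confirmed detail of this project, and in the real application the conversion may happen on the client side instead.

```python
import base64
import struct
from typing import Iterator, Sequence

SAMPLE_RATE = 16000    # 16 kHz mono, per the notes above
BYTES_PER_SAMPLE = 2   # int16
FALLBACK_SECONDS = 5   # fixed segment length when VAD is unavailable


def float_to_int16_pcm(samples: Sequence[float]) -> bytes:
    """Clamp floats to [-1.0, 1.0] and pack them as little-endian int16."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    return struct.pack(f"<{len(clipped)}h", *(int(s * 32767) for s in clipped))


def encode_chunk(pcm: bytes) -> str:
    """Base64-encode raw PCM so it can travel inside a JSON message."""
    return base64.b64encode(pcm).decode("ascii")


def decode_chunk(payload: str) -> bytes:
    """Inverse of encode_chunk: recover the raw PCM bytes."""
    return base64.b64decode(payload)


def fixed_segments(pcm: bytes, seconds: int = FALLBACK_SECONDS) -> Iterator[bytes]:
    """Time-based fallback: cut the stream into equal-length slices."""
    step = SAMPLE_RATE * BYTES_PER_SAMPLE * seconds
    for offset in range(0, len(pcm), step):
        yield pcm[offset:offset + step]
```

At 16 kHz mono int16, five seconds of audio is 160,000 bytes, so a 12-second buffer yields two full slices and one 64,000-byte remainder.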