## 1. Sidecar Streaming Infrastructure

- [x] 1.1 Add silero-vad dependency to requirements.txt
- [x] 1.2 Implement VADProcessor class with speech boundary detection
- [x] 1.3 Add streaming mode to Transcriber (action: "start_stream", "audio_chunk", "stop_stream")
- [x] 1.4 Implement audio buffer with VAD-triggered transcription
- [x] 1.5 Add segment_id tracking for each utterance
- [x] 1.6 Test VAD with sample Chinese speech audio

## 2. Punctuation Processing

- [x] 2.1 Enable word_timestamps in Whisper transcribe()
- [x] 2.2 Implement ChinesePunctuator class with rule-based punctuation
- [x] 2.3 Add pause-based sentence boundary detection (>500ms → period)
- [x] 2.4 Add question detection (嗎、呢、什麼 patterns → ?)
- [x] 2.5 Test punctuation output quality with sample transcripts

## 3. IPC Audio Streaming

- [x] 3.1 Add "start-recording-stream" IPC handler in main.js
- [x] 3.2 Add "stream-audio-chunk" IPC handler to forward audio to sidecar
- [x] 3.3 Add "stop-recording-stream" IPC handler
- [x] 3.4 Implement WebM to PCM conversion using web-audio-api or ffmpeg.wasm
- [x] 3.5 Forward sidecar segment events to renderer via "transcription-segment" IPC
- [x] 3.6 Update preload.js with streaming API exposure

## 4. Frontend Editable Transcript

- [x] 4.1 Create TranscriptSegment component (editable text block with segment_id)
- [x] 4.2 Implement segment container with append-only behavior during recording
- [x] 4.3 Add edit handler that updates local segment data
- [x] 4.4 Style active segment (currently receiving text) differently
- [x] 4.5 Update Save button to merge all segments into transcript_blob
- [x] 4.6 Add visual indicator for streaming status

## 5. Integration & Testing

- [x] 5.1 End-to-end test: start recording → speak → see text appear
- [x] 5.2 Test editing segment while new segments arrive
- [x] 5.3 Test save with mixed edited/unedited segments
- [x] 5.4 Performance test on i5/8GB target hardware
- [x] 5.5 Test with 30+ minute continuous recording
- [x] 5.6 Update meeting-detail.html recording flow documentation

## Dependencies

- Task 3 depends on Task 1 (sidecar must support streaming first)
- Task 4 depends on Task 3 (frontend needs IPC to receive segments)
- Task 2 can run in parallel with Task 3

## Parallelizable Work

- Tasks 1 and 4 can start simultaneously (sidecar and frontend scaffolding)
- Task 2 can run in parallel with Task 3

## Implementation Notes

- VAD uses Silero VAD with fallback to 5-second time-based segmentation if torch unavailable
- Audio captured at 16kHz mono, converted to int16 PCM, sent as base64
- ChinesePunctuator uses regex patterns for question detection
- Segments are editable immediately, edited segments marked with orange border
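
As a rough illustration of the audio format and the time-based fallback described in the notes, the sketch below packs float samples into little-endian int16 PCM, base64-encodes them, and splits a PCM buffer into 5-second pieces. The function names, the little-endian byte order, and the clamping to [-1.0, 1.0] are assumptions made for this example and are not taken from the sidecar code.

```python
import base64
import struct

SAMPLE_RATE = 16_000           # 16 kHz mono, as stated in the notes
BYTES_PER_SAMPLE = 2           # int16
FALLBACK_SEGMENT_SECONDS = 5   # time-based fallback when VAD is unavailable


def float_to_int16_pcm(samples):
    """Clamp floats to [-1.0, 1.0] and pack them as little-endian int16 (assumed layout)."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def encode_chunk(samples):
    """Return the base64 text payload for one audio chunk."""
    return base64.b64encode(float_to_int16_pcm(samples)).decode("ascii")


def decode_chunk(payload):
    """Inverse of encode_chunk: base64 text back to a list of int16 values."""
    raw = base64.b64decode(payload)
    return list(struct.unpack(f"<{len(raw) // BYTES_PER_SAMPLE}h", raw))


def split_fixed_segments(pcm_bytes, seconds=FALLBACK_SEGMENT_SECONDS):
    """Cut a PCM buffer into fixed-length segments; the last one may be shorter."""
    step = SAMPLE_RATE * BYTES_PER_SAMPLE * seconds
    return [pcm_bytes[i:i + step] for i in range(0, len(pcm_bytes), step)]
```

At 16 kHz and 2 bytes per sample, each 5-second fallback segment is 160,000 bytes, so a 12-second buffer yields two full segments and one 2-second remainder.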
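
The punctuation rules from Task 2 (a pause longer than 500 ms ends a sentence, and 嗎、呢、什麼 patterns mark a question) could be sketched as follows. This is not the actual ChinesePunctuator: the `Word` dataclass, the `punctuate` function, the exact regex, and the choice of full-width `。` and `？` marks are all assumptions for illustration, with `Word` standing in for whatever word-level timestamps the transcriber returns.

```python
import re
from dataclasses import dataclass

PAUSE_THRESHOLD_S = 0.5  # >500 ms gap between words ends a sentence

# Assumed pattern: sentence-final 嗎 or 呢, or 什麼 anywhere in the sentence.
QUESTION_PATTERN = re.compile(r"(嗎|呢)$|什麼")


@dataclass
class Word:
    """Hypothetical word-level timestamp record (times in seconds)."""
    text: str
    start: float
    end: float


def punctuate(words):
    """Join words into sentences, closing each at a long pause or at the end."""
    sentences = []
    current = []
    for i, word in enumerate(words):
        current.append(word.text)
        is_last = i == len(words) - 1
        long_pause = (not is_last) and (words[i + 1].start - word.end > PAUSE_THRESHOLD_S)
        if is_last or long_pause:
            sentence = "".join(current)
            mark = "？" if QUESTION_PATTERN.search(sentence) else "。"
            sentences.append(sentence + mark)
            current = []
    return "".join(sentences)
```

For example, the words 你 / 好 / 嗎 followed by a 0.9-second gap and then 我 / 很 / 好 would come out as `你好嗎？我很好。` under these rules.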