Meeting_Assistant/openspec/changes/archive/2025-12-10-add-realtime-transcription/tasks.md

## 1. Sidecar Streaming Infrastructure
- [x] 1.1 Add silero-vad dependency to requirements.txt
- [x] 1.2 Implement VADProcessor class with speech boundary detection
- [x] 1.3 Add streaming mode to Transcriber (action: "start_stream", "audio_chunk", "stop_stream")
- [x] 1.4 Implement audio buffer with VAD-triggered transcription
- [x] 1.5 Add segment_id tracking for each utterance
- [x] 1.6 Test VAD with sample Chinese speech audio
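
A minimal sketch of how the three streaming actions from 1.3 could drive buffering and segment emission. It is not the actual Transcriber code: the `StreamSession` class, the `data` message field, and the injected `transcribe` callable are all hypothetical. To stay runnable without torch, it uses only the 5-second time-based segmentation described in the implementation notes in place of Silero VAD.

```python
import base64
import uuid
from typing import Callable

SAMPLE_RATE = 16_000      # 16 kHz mono, per the implementation notes
BYTES_PER_SAMPLE = 2      # int16 PCM
FALLBACK_SECONDS = 5      # time-based segmentation used when VAD is unavailable


class StreamSession:
    """Hypothetical handler for start_stream / audio_chunk / stop_stream."""

    def __init__(self, transcribe: Callable[[bytes], str]):
        self.transcribe = transcribe   # assumed: raw PCM bytes in, text out
        self.buffer = bytearray()
        self.active = False

    def handle(self, message: dict) -> list[dict]:
        """Process one message and return any segments that became ready."""
        action = message.get("action")
        if action == "start_stream":
            self.buffer.clear()
            self.active = True
            return []
        if action == "audio_chunk" and self.active:
            # "data" is an assumed field name carrying base64-encoded PCM.
            self.buffer.extend(base64.b64decode(message["data"]))
            return self._flush(final=False)
        if action == "stop_stream" and self.active:
            self.active = False
            return self._flush(final=True)
        return []  # unknown action, or audio received outside a stream

    def _flush(self, final: bool) -> list[dict]:
        limit = SAMPLE_RATE * BYTES_PER_SAMPLE * FALLBACK_SECONDS
        segments = []
        # Emit full 5-second windows; on stop, also emit the remainder.
        while len(self.buffer) >= limit or (final and self.buffer):
            chunk = bytes(self.buffer[:limit])
            del self.buffer[:limit]
            segments.append(
                {"segment_id": uuid.uuid4().hex, "text": self.transcribe(chunk)}
            )
        return segments
```

With a real VAD, `_flush` would cut at detected speech boundaries instead of at a fixed byte count; the segment_id bookkeeping would stay the same.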

## 2. Punctuation Processing
- [x] 2.1 Enable word_timestamps in Whisper transcribe()
- [x] 2.2 Implement ChinesePunctuator class with rule-based punctuation
- [x] 2.3 Add pause-based sentence boundary detection (>500ms → period)
- [x] 2.4 Add question detection (嗎、呢、什麼 patterns → ？)
- [x] 2.5 Test punctuation output quality with sample transcripts
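
The rules in 2.3 and 2.4 can be illustrated with a short sketch. It assumes word-level timestamps are already available as `(text, start, end)` records; the `Word` dataclass, the `punctuate` function, and the exact regex are illustrative assumptions, not the ChinesePunctuator implementation.

```python
import re
from dataclasses import dataclass

PAUSE_THRESHOLD = 0.5  # seconds; a gap longer than this ends the sentence
# Assumed pattern: sentence-final 嗎 or 呢, or 什麼 anywhere in the sentence.
QUESTION_PATTERN = re.compile(r"(嗎|呢)$|什麼")


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def punctuate(words: list[Word]) -> str:
    """Insert 。 or ？ at pauses longer than PAUSE_THRESHOLD and at the end."""
    sentences = []
    current = ""
    for i, word in enumerate(words):
        current += word.text
        is_last = i == len(words) - 1
        gap = 0.0 if is_last else words[i + 1].start - word.end
        if is_last or gap > PAUSE_THRESHOLD:
            mark = "？" if QUESTION_PATTERN.search(current) else "。"
            sentences.append(current + mark)
            current = ""
    return "".join(sentences)


words = [
    Word("你好", 0.0, 0.4),
    Word("嗎", 0.4, 0.6),
    Word("我", 1.5, 1.7),   # 0.9 s pause before this word
    Word("很好", 1.7, 2.1),
]
print(punctuate(words))  # 你好嗎？我很好。
```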

## 3. IPC Audio Streaming
- [x] 3.1 Add "start-recording-stream" IPC handler in main.js
- [x] 3.2 Add "stream-audio-chunk" IPC handler to forward audio to sidecar
- [x] 3.3 Add "stop-recording-stream" IPC handler
- [x] 3.4 Implement WebM to PCM conversion using web-audio-api or ffmpeg.wasm
- [x] 3.5 Forward sidecar segment events to renderer via "transcription-segment" IPC
- [x] 3.6 Update preload.js with streaming API exposure

## 4. Frontend Editable Transcript
- [x] 4.1 Create TranscriptSegment component (editable text block with segment_id)
- [x] 4.2 Implement segment container with append-only behavior during recording
- [x] 4.3 Add edit handler that updates local segment data
- [x] 4.4 Style active segment (currently receiving text) differently
- [x] 4.5 Update Save button to merge all segments into transcript_blob
- [x] 4.6 Add visual indicator for streaming status

## 5. Integration & Testing
- [x] 5.1 End-to-end test: start recording → speak → see text appear
- [x] 5.2 Test editing segment while new segments arrive
- [x] 5.3 Test save with mixed edited/unedited segments
- [x] 5.4 Performance test on i5/8GB target hardware
- [x] 5.5 Test with 30+ minute continuous recording
- [x] 5.6 Update meeting-detail.html recording flow documentation

## Dependencies
- Task 3 depends on Task 1 (sidecar must support streaming first)
- Task 4 depends on Task 3 (frontend needs IPC to receive segments)
- Task 2 has no dependency on Task 3

## Parallelizable Work
- Tasks 1 and 4 can start simultaneously (sidecar work and frontend scaffolding); Task 4 needs Task 3 only to receive live segments
- Task 2 can run in parallel with Task 3

## Implementation Notes
- VAD uses Silero VAD, with a fallback to 5-second time-based segmentation if torch is unavailable
- Audio is captured at 16 kHz mono, converted to int16 PCM, and sent as base64
- ChinesePunctuator uses regex patterns for question detection
- Segments are editable immediately; edited segments are marked with an orange border
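
The audio encoding noted above can be sketched as a round trip. This is written in Python for consistency with the sidecar, although the capture side of the app runs in JavaScript. The function names, the little-endian byte order, and the scaling by 32767 are assumptions for illustration.

```python
import base64
import struct


def encode_chunk(samples: list[float]) -> str:
    """Convert float samples in [-1.0, 1.0] to base64-encoded int16 PCM."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(s * 32767) for s in clipped]
    raw = struct.pack(f"<{len(ints)}h", *ints)  # little-endian int16 (assumed)
    return base64.b64encode(raw).decode("ascii")


def decode_chunk(payload: str) -> list[int]:
    """Decode a base64 payload back into int16 sample values."""
    raw = base64.b64decode(payload)
    return list(struct.unpack(f"<{len(raw) // 2}h", raw))


payload = encode_chunk([0.0, 1.0, -1.0, 2.0, 0.5])
print(decode_chunk(payload))  # [0, 32767, -32767, 32767, 16383]
```

At 16 kHz mono int16, one second of audio is 32,000 bytes before encoding, and base64 grows that by about a third.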