# transcription Specification

## Purpose

TBD - created by archiving change add-meeting-assistant-mvp. Update Purpose after archive.

## Requirements

### Requirement: Edge Speech-to-Text

The Electron client SHALL perform speech-to-text conversion locally using the faster-whisper int8 model.

#### Scenario: Successful transcription

- **WHEN** user records audio during a meeting
- **THEN** the audio SHALL be transcribed locally without network dependency

#### Scenario: Transcription on target hardware

- **WHEN** running on an i5 processor with 8GB RAM
- **THEN** transcription SHALL complete within acceptable latency for real-time display

### Requirement: Traditional Chinese Output

The transcription engine SHALL output Traditional Chinese (繁體中文) text.

#### Scenario: Simplified to Traditional conversion

- **WHEN** whisper outputs Simplified Chinese characters
- **THEN** OpenCC SHALL convert the output to Traditional Chinese

#### Scenario: Native Traditional Chinese

- **WHEN** whisper outputs Traditional Chinese directly
- **THEN** the text SHALL pass through unchanged

### Requirement: Real-time Display

The Electron client SHALL display transcription results in real-time.

#### Scenario: Streaming transcription

- **WHEN** user is recording
- **THEN** transcribed text SHALL appear in the left panel within seconds of speech

### Requirement: Python Sidecar

The transcription engine SHALL be packaged as a Python sidecar using PyInstaller.

#### Scenario: Sidecar startup

- **WHEN** the Electron app launches
- **THEN** the Python sidecar containing faster-whisper and OpenCC SHALL be available

#### Scenario: Sidecar communication

- **WHEN** Electron sends audio data to the sidecar
- **THEN** transcribed text SHALL be returned via IPC

### Requirement: Streaming Transcription Mode

The sidecar SHALL support a streaming mode where audio chunks are continuously received and transcribed in real-time with VAD-triggered segmentation.
#### Scenario: Start streaming session

- **WHEN** sidecar receives `{"action": "start_stream"}` command
- **THEN** it SHALL initialize audio buffer and VAD processor
- **AND** respond with `{"status": "streaming", "session_id": "<id>"}`

#### Scenario: Process audio chunk

- **WHEN** sidecar receives `{"action": "audio_chunk", "data": "<base64 audio>"}` during active stream
- **THEN** it SHALL append audio to buffer and run VAD detection
- **AND** if speech boundary detected, transcribe accumulated audio
- **AND** emit `{"segment_id": <n>, "text": "<transcribed text>", "is_final": true}`

#### Scenario: Stop streaming session

- **WHEN** sidecar receives `{"action": "stop_stream"}` command
- **THEN** it SHALL transcribe any remaining buffered audio
- **AND** respond with `{"status": "stream_stopped", "total_segments": <n>}`

### Requirement: VAD-based Speech Segmentation

The sidecar SHALL use Voice Activity Detection to identify natural speech boundaries for segmentation.

#### Scenario: Detect speech end

- **WHEN** VAD detects silence exceeding 500ms after speech
- **THEN** the accumulated speech audio SHALL be sent for transcription
- **AND** a new segment SHALL begin for subsequent speech

#### Scenario: Handle continuous speech

- **WHEN** speech continues for more than 15 seconds without pause
- **THEN** the sidecar SHALL force a segment boundary
- **AND** transcribe the 15-second chunk to prevent excessive latency

### Requirement: Punctuation in Transcription Output

The sidecar SHALL output transcribed text with appropriate Chinese punctuation marks.

#### Scenario: Add sentence-ending punctuation

- **WHEN** transcription completes for a segment
- **THEN** the output SHALL include period (。) at natural sentence boundaries
- **AND** question marks (?) for interrogative sentences
- **AND** commas (,) for clause breaks within sentences

#### Scenario: Detect question patterns

- **WHEN** transcribed text ends with question particles (嗎、呢、什麼、怎麼、為什麼)
- **THEN** the punctuation processor SHALL append question mark (?)
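The streaming command protocol above can be sketched as a line-delimited JSON loop over stdin/stdout. This is a minimal illustration under assumptions, not the actual sidecar: `StreamSession` and `handle_command` are hypothetical names, and `transcribe` is a stub standing in for the real faster-whisper + OpenCC pipeline.

```python
import base64
import json
import sys
import uuid


class StreamSession:
    """Hypothetical container for one streaming session's state."""

    def __init__(self):
        self.session_id = str(uuid.uuid4())
        self.buffer = bytearray()
        self.segments = 0

    def transcribe(self, audio: bytes) -> str:
        # Stub: the real sidecar would run faster-whisper + OpenCC here.
        return f"<{len(audio)} bytes transcribed>"


def handle_command(cmd: dict, state: dict) -> dict:
    """Dispatch one JSON command and return the JSON reply."""
    action = cmd.get("action")
    if action == "start_stream":
        state["session"] = StreamSession()
        return {"status": "streaming", "session_id": state["session"].session_id}
    if action == "audio_chunk":
        session = state["session"]
        session.buffer.extend(base64.b64decode(cmd["data"]))
        # A real implementation would run VAD here and, on a speech
        # boundary, emit {"segment_id": ..., "text": ..., "is_final": true}.
        return {"status": "buffering", "buffered_bytes": len(session.buffer)}
    if action == "stop_stream":
        session = state.pop("session")
        if session.buffer:  # flush remaining audio as a final segment
            session.transcribe(bytes(session.buffer))
            session.segments += 1
        return {"status": "stream_stopped", "total_segments": session.segments}
    return {"status": "error", "message": f"unknown action: {action}"}


def main() -> None:
    # One JSON command per stdin line; one JSON reply per stdout line.
    state: dict = {}
    for line in sys.stdin:
        reply = handle_command(json.loads(line), state)
        sys.stdout.write(json.dumps(reply) + "\n")
        sys.stdout.flush()  # Electron reads replies as they arrive
```

Flushing after every reply matters for IPC: without it, Electron would only see output when the pipe buffer fills.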
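The two segmentation rules (500ms trailing silence ends a segment; 15s of continuous speech forces a boundary) can be sketched as a small frame-level state machine. The per-frame speech decision is assumed to come from an external VAD such as webrtcvad; here it is just a boolean, and the 30ms frame size is an illustrative assumption.

```python
FRAME_MS = 30            # assumed VAD frame size
SILENCE_END_MS = 500     # trailing silence that closes a segment
MAX_SEGMENT_MS = 15_000  # forced boundary for continuous speech


class Segmenter:
    """Sketch of the boundary policy; audio buffering is omitted."""

    def __init__(self):
        self.speech_ms = 0   # speech accumulated in the current segment
        self.silence_ms = 0  # trailing silence since the last speech frame

    def push(self, is_speech: bool) -> bool:
        """Feed one frame's VAD decision; return True when a segment closes."""
        if is_speech:
            self.speech_ms += FRAME_MS
            self.silence_ms = 0
            if self.speech_ms >= MAX_SEGMENT_MS:
                # Forced boundary: cap latency on continuous speech.
                self.speech_ms = 0
                return True
            return False
        if self.speech_ms == 0:
            return False  # leading silence, nothing buffered yet
        self.silence_ms += FRAME_MS
        if self.silence_ms >= SILENCE_END_MS:
            # Natural boundary: the speaker paused long enough.
            self.speech_ms = 0
            self.silence_ms = 0
            return True
        return False
```

Each `True` return is the point where the sidecar would transcribe the accumulated buffer and emit a segment.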
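The question-particle scenario can be illustrated with a few lines of Python. This is a sketch of the stated rule only; `punctuate_segment` is a hypothetical name, and real punctuation restoration (including comma placement within sentences) would need a richer rule set or model.

```python
# Particles listed in the spec that mark an interrogative sentence.
QUESTION_PARTICLES = ("嗎", "呢", "什麼", "怎麼", "為什麼")


def punctuate_segment(text: str) -> str:
    """Append fullwidth punctuation to a transcribed segment."""
    text = text.strip()
    if not text or text[-1] in "。?!,":
        return text  # empty, or already punctuated upstream
    if text.endswith(QUESTION_PARTICLES):
        return text + "?"  # question particle -> fullwidth question mark
    return text + "。"      # default: fullwidth period
```

`str.endswith` accepts a tuple, which keeps the particle check to one line.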
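The Traditional Chinese output requirement can be sketched with OpenCC's Python bindings, using the `s2t` (Simplified-to-Traditional) configuration. The fallback branch below is a tiny illustrative character map, not a real conversion table; it exists only so the sketch runs without the `opencc` dependency installed. Note that the pass-through scenario falls out naturally: Traditional input is unchanged by the conversion.

```python
try:
    from opencc import OpenCC

    _cc = OpenCC("s2t")  # Simplified -> Traditional configuration

    def to_traditional(text: str) -> str:
        return _cc.convert(text)

except ImportError:
    # Fallback for environments without opencc: a deliberately tiny,
    # illustrative map (NOT a complete conversion table).
    _FALLBACK = {"汉": "漢", "语": "語", "简": "簡", "体": "體"}

    def to_traditional(text: str) -> str:
        return "".join(_FALLBACK.get(ch, ch) for ch in text)
```

In the sidecar, this step would run on each transcribed segment before the text is emitted over IPC.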