Meeting_Assistant/openspec/specs/transcription/spec.md

# transcription Specification

## Purpose
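Define how the Electron client transcribes meeting audio on-device: local speech-to-text through a Python sidecar, Traditional Chinese output, VAD-segmented streaming, and punctuated text for real-time display.
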
## Requirements
### Requirement: Edge Speech-to-Text
The Electron client SHALL perform speech-to-text conversion locally using a faster-whisper model with int8 quantization.

#### Scenario: Successful transcription
- **WHEN** user records audio during a meeting
- **THEN** the audio SHALL be transcribed locally without network dependency

#### Scenario: Transcription on target hardware
- **WHEN** the client runs on an Intel Core i5 processor with 8 GB of RAM
- **THEN** transcription SHALL complete within acceptable latency for real-time display
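
As a rough illustration of the requirement above, the following sketch loads a faster-whisper model on CPU with `compute_type="int8"`. The `model_path` parameter, the `language="zh"` hint, and both helper names are assumptions made for this example; the spec does not name a model size or a language setting. A local model directory is assumed because passing a size name such as `"small"` would download the model on first use, which conflicts with the no-network scenario.

```python
from typing import Iterable, Protocol


class Segment(Protocol):
    """Minimal shape of a transcription segment: only the text is used here."""
    text: str


def join_segments(segments: Iterable[Segment]) -> str:
    """Concatenate segment texts into one transcript string."""
    return "".join(seg.text.strip() for seg in segments)


def transcribe_file(audio_path: str, model_path: str) -> str:
    """Transcribe an audio file locally with an int8-quantized model.

    model_path is assumed to point at a model directory bundled with the app.
    """
    # Imported lazily so join_segments stays usable without the dependency.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_path, device="cpu", compute_type="int8")
    # transcribe() returns a lazy generator of segments plus run metadata;
    # decoding happens while join_segments iterates over the generator.
    segments, _info = model.transcribe(audio_path, language="zh")
    return join_segments(segments)
```

Whether this meets the latency target on the stated hardware depends on the chosen model size and would need to be measured.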

### Requirement: Traditional Chinese Output
The transcription engine SHALL output Traditional Chinese (繁體中文) text.

#### Scenario: Simplified to Traditional conversion
- **WHEN** whisper outputs Simplified Chinese characters
- **THEN** OpenCC SHALL convert output to Traditional Chinese

#### Scenario: Native Traditional Chinese
- **WHEN** whisper outputs Traditional Chinese directly
- **THEN** the text SHALL pass through unchanged
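
One possible shape for the conversion step is sketched below. It assumes an OpenCC Python binding that exposes `OpenCC(config).convert(text)`; the config name `"s2t"` and the helper names are illustrative, and the exact config string (for example `"s2t"` versus `"s2t.json"`) may differ between OpenCC packages.

```python
from typing import Protocol


class Converter(Protocol):
    """Anything with a convert(text) -> text method, such as an OpenCC instance."""

    def convert(self, text: str) -> str: ...


def make_opencc_converter(config: str = "s2t") -> Converter:
    """Build a Simplified-to-Traditional converter (needs an OpenCC package)."""
    import opencc  # third-party; imported lazily so the rest runs without it

    return opencc.OpenCC(config)


def to_traditional(text: str, converter: Converter) -> str:
    """Normalize whisper output to Traditional Chinese.

    The same call handles both scenarios: Simplified characters are mapped,
    and text that is already Traditional is expected to come back unchanged.
    """
    return converter.convert(text)
```

Running every segment through the converter avoids having to detect the script first. A few characters are valid in both scripts with different meanings, so the pass-through scenario is worth covering with test cases against the chosen OpenCC configuration.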

### Requirement: Real-time Display
The Electron client SHALL display transcription results in real time.

#### Scenario: Streaming transcription
- **WHEN** user is recording
- **THEN** transcribed text SHALL appear in the left panel within seconds of speech

### Requirement: Python Sidecar
The transcription engine SHALL be packaged as a Python sidecar using PyInstaller.

#### Scenario: Sidecar startup
- **WHEN** Electron app launches
- **THEN** the Python sidecar containing faster-whisper and OpenCC SHALL be available

#### Scenario: Sidecar communication
- **WHEN** Electron sends audio data to sidecar
- **THEN** transcribed text SHALL be returned via IPC
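
The spec does not fix the IPC transport. As one possibility, the sketch below assumes newline-delimited JSON over the sidecar's stdin and stdout; the `serve` function, the `handle` callback, and the error response shape are all hypothetical.

```python
import json
from typing import Callable, Iterable, Protocol


class Writer(Protocol):
    def write(self, s: str) -> object: ...
    def flush(self) -> object: ...


def serve(handle: Callable[[dict], dict], stdin: Iterable[str], stdout: Writer) -> None:
    """Read one JSON request per line and write one JSON response per line."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between messages
        try:
            response = handle(json.loads(line))
        except json.JSONDecodeError as exc:
            # Assumed error shape; the spec does not define error responses.
            response = {"error": f"invalid JSON: {exc.msg}"}
        # ensure_ascii=False keeps Chinese text readable on the wire.
        stdout.write(json.dumps(response, ensure_ascii=False) + "\n")
        stdout.flush()  # flush so the Electron side sees each reply promptly
```

In the sidecar entry point this would be called as `serve(handler, sys.stdin, sys.stdout)`, with Electron writing requests to the child process's stdin.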

### Requirement: Streaming Transcription Mode
The sidecar SHALL support a streaming mode in which audio chunks are continuously received and transcribed in real time, with VAD-triggered segmentation.

#### Scenario: Start streaming session
- **WHEN** sidecar receives `{"action": "start_stream"}` command
- **THEN** it SHALL initialize audio buffer and VAD processor
- **AND** respond with `{"status": "streaming", "session_id": "<uuid>"}`

#### Scenario: Process audio chunk
- **WHEN** sidecar receives `{"action": "audio_chunk", "data": "<base64_pcm>"}` during active stream
- **THEN** it SHALL append audio to buffer and run VAD detection
- **AND** if a speech boundary is detected, it SHALL transcribe the accumulated audio
- **AND** it SHALL emit `{"segment_id": <int>, "text": "<transcription>", "is_final": true}`

#### Scenario: Stop streaming session
- **WHEN** sidecar receives `{"action": "stop_stream"}` command
- **THEN** it SHALL transcribe any remaining buffered audio
- **AND** respond with `{"status": "stream_stopped", "total_segments": <int>}`
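
The three scenarios above can be sketched as a small message handler. The `transcribe` and `is_boundary` callbacks are hypothetical stand-ins for the whisper and VAD components, and two details are assumptions the spec leaves open: segment IDs start at 1, and the final segment is emitted before the `stream_stopped` response.

```python
import base64
import uuid
from typing import Callable, Dict, List


class StreamSession:
    """Handles start_stream, audio_chunk, and stop_stream for one stream."""

    def __init__(self, transcribe: Callable[[bytes], str],
                 is_boundary: Callable[[bytes], bool]) -> None:
        self._transcribe = transcribe    # hypothetical: PCM bytes -> text
        self._is_boundary = is_boundary  # hypothetical VAD hook on the buffer
        self._buffer = bytearray()
        self._segments = 0
        self._active = False

    def handle(self, msg: Dict) -> List[Dict]:
        """Return the messages to send back (possibly none) for one command."""
        action = msg.get("action")
        if action == "start_stream":
            self._buffer.clear()
            self._segments = 0
            self._active = True
            return [{"status": "streaming", "session_id": str(uuid.uuid4())}]
        if action == "audio_chunk" and self._active:
            self._buffer.extend(base64.b64decode(msg["data"]))
            if self._is_boundary(bytes(self._buffer)):
                return [self._flush()]
            return []  # no speech boundary yet, keep buffering
        if action == "stop_stream" and self._active:
            self._active = False
            # Transcribe whatever is still buffered before reporting the total.
            out = [self._flush()] if self._buffer else []
            out.append({"status": "stream_stopped",
                        "total_segments": self._segments})
            return out
        return [{"error": f"unexpected action: {action}"}]  # assumed shape

    def _flush(self) -> Dict:
        """Transcribe the buffer, reset it, and build the segment message."""
        text = self._transcribe(bytes(self._buffer))
        self._buffer.clear()
        self._segments += 1
        return {"segment_id": self._segments, "text": text, "is_final": True}
```

Returning a list lets one command produce zero, one, or two messages, which covers the case where `stop_stream` both flushes a last segment and reports the total.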

### Requirement: VAD-based Speech Segmentation
The sidecar SHALL use Voice Activity Detection to identify natural speech boundaries for segmentation.

#### Scenario: Detect speech end
- **WHEN** VAD detects silence exceeding 500ms after speech
- **THEN** the accumulated speech audio SHALL be sent for transcription
- **AND** a new segment SHALL begin for subsequent speech

#### Scenario: Handle continuous speech
- **WHEN** speech continues for more than 15 seconds without pause
- **THEN** the sidecar SHALL force a segment boundary
- **AND** transcribe the 15-second chunk to prevent excessive latency
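
The two boundary rules can be expressed as a small state machine over per-frame VAD decisions. This is a sketch under assumptions: the VAD itself is outside the example and supplies an `is_speech` flag per frame, the 20 ms frame duration is illustrative, and the 15-second limit is applied to the whole accumulated segment, including any short pauses inside it.

```python
from typing import List, Optional

SILENCE_MS = 500         # from the spec: silence that closes a segment
MAX_SEGMENT_MS = 15_000  # from the spec: forced boundary for long speech


class Segmenter:
    """Accumulates frames and cuts a segment on silence or at the length cap."""

    def __init__(self, frame_ms: int = 20) -> None:
        self.frame_ms = frame_ms  # assumed frame duration
        self._frames: List[bytes] = []
        self._silence_ms = 0
        self._in_speech = False

    def push(self, frame: bytes, is_speech: bool) -> Optional[bytes]:
        """Feed one frame; return a finished segment's audio, or None."""
        if is_speech:
            self._in_speech = True
            self._silence_ms = 0
        elif not self._in_speech:
            return None  # leading silence: nothing to accumulate yet
        else:
            self._silence_ms += self.frame_ms
        self._frames.append(frame)
        duration_ms = len(self._frames) * self.frame_ms
        if self._silence_ms > SILENCE_MS or duration_ms >= MAX_SEGMENT_MS:
            return self._cut()
        return None

    def _cut(self) -> bytes:
        """Return the accumulated audio and reset for the next segment."""
        audio = b"".join(self._frames)
        self._frames.clear()
        self._silence_ms = 0
        self._in_speech = False
        return audio
```

Each non-`None` return value is the audio for one segment, ready to hand to the transcriber.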

### Requirement: Punctuation in Transcription Output
The sidecar SHALL output transcribed text with appropriate Chinese punctuation marks.

#### Scenario: Add sentence-ending punctuation
- **WHEN** transcription completes for a segment
- **THEN** the output SHALL include a period (。) at natural sentence boundaries
- **AND** a question mark (？) at the end of interrogative sentences
- **AND** commas (，) at clause breaks within sentences

#### Scenario: Detect question patterns
- **WHEN** transcribed text ends with a question particle or interrogative word (嗎、呢、什麼、怎麼、為什麼)
- **THEN** the punctuation processor SHALL append a question mark (？)
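
A minimal sketch of the sentence-ending rule follows. The function name and the fallback to a period are assumptions for illustration; comma insertion within sentences is not covered here.

```python
# Endings listed in the spec that mark a segment as a question.
QUESTION_ENDINGS = ("為什麼", "什麼", "怎麼", "嗎", "呢")
TERMINAL_MARKS = "。？！"


def punctuate_ending(text: str) -> str:
    """Append ？ or 。 to a segment that lacks terminal punctuation."""
    text = text.strip()
    if not text or text[-1] in TERMINAL_MARKS:
        return text  # empty or already punctuated: leave as-is
    if text.endswith(QUESTION_ENDINGS):
        return text + "？"
    # Assumed default: treat any other segment end as a sentence end.
    return text + "。"
```

For example, `punctuate_ending("你吃飯了嗎")` yields `你吃飯了嗎？`, while `punctuate_ending("今天開會")` yields `今天開會。`. A forced 15-second boundary can fall mid-sentence, so a real processor may need to skip the default period in that case.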