## ADDED Requirements

### Requirement: Audio File Upload
The Electron client SHALL allow users to upload pre-recorded audio files for transcription.

#### Scenario: Upload audio file
- **WHEN** user clicks "Upload Audio" button in meeting detail page
- **THEN** file picker SHALL open with filter for supported audio formats (MP3, WAV, M4A, WebM, OGG)

#### Scenario: Show upload progress
- **WHEN** audio file is being uploaded
- **THEN** progress indicator SHALL be displayed showing upload percentage

#### Scenario: Show transcription progress
- **WHEN** audio file is being transcribed in chunks
- **THEN** progress indicator SHALL display "Processing chunk X of Y"

#### Scenario: Replace existing transcript
- **WHEN** user uploads audio file and transcript already has content
- **THEN** confirmation dialog SHALL appear before replacing existing transcript

#### Scenario: File size limit
- **WHEN** user selects audio file larger than 500MB
- **THEN** error message SHALL be displayed indicating file size limit

### Requirement: VAD-Based Audio Segmentation
The sidecar SHALL segment large audio files using Voice Activity Detection before cloud transcription.
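The chunk-boundary planning this requirement implies can be sketched independently of any particular VAD engine. The sketch below assumes a VAD library (e.g. silero-vad or webrtcvad — neither is mandated by this spec) has already classified fixed-size audio frames as speech or silence; `plan_segments` is a hypothetical helper name, not part of the sidecar command protocol.

```python
def plan_segments(speech_flags, frame_ms=30, min_silence_ms=500, max_chunk_s=300.0):
    """Plan (start_s, end_s) chunk boundaries from per-frame VAD speech flags.

    A chunk is cut at the first silence gap of at least `min_silence_ms`, and
    force-cut if it reaches `max_chunk_s` without such a gap (matching the
    split-at-silence and force-split scenarios).
    """
    min_silence_frames = max(1, round(min_silence_ms / frame_ms))
    max_chunk_frames = max(1, round(max_chunk_s * 1000 / frame_ms))
    segments = []
    start = 0          # first frame of the chunk currently being built
    silence_run = 0    # consecutive non-speech frames seen so far
    for i, is_speech in enumerate(speech_flags):
        silence_run = 0 if is_speech else silence_run + 1
        chunk_len = i - start + 1
        if silence_run >= min_silence_frames or chunk_len >= max_chunk_frames:
            # Cut at the start of the silence gap when one exists,
            # otherwise hard-cut at the max-duration boundary.
            cut = i + 1 - silence_run if silence_run >= min_silence_frames else i + 1
            if cut > start:  # skip zero-length chunks (leading silence)
                segments.append((start * frame_ms / 1000.0, cut * frame_ms / 1000.0))
            start = i + 1
            silence_run = 0
    if start < len(speech_flags):
        segments.append((start * frame_ms / 1000.0, len(speech_flags) * frame_ms / 1000.0))
    return segments
```

The actual sidecar would then export each planned `(start_s, end_s)` range as a WAV file in a temp directory and return the segment metadata, per the scenarios below.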
#### Scenario: Segment audio command
- **WHEN** sidecar receives `{"action": "segment_audio", "file_path": "...", "max_chunk_seconds": 300}`
- **THEN** it SHALL load audio file and run VAD to detect speech boundaries

#### Scenario: Split at silence boundaries
- **WHEN** VAD detects silence gap >= 500ms within max chunk duration
- **THEN** audio SHALL be split at the silence boundary
- **AND** each chunk exported as WAV file to temp directory

#### Scenario: Force split for continuous speech
- **WHEN** speech continues beyond max_chunk_seconds without silence gap
- **THEN** audio SHALL be force-split at max_chunk_seconds boundary

#### Scenario: Return segment metadata
- **WHEN** segmentation completes
- **THEN** sidecar SHALL return list of segments with file paths and timestamps

### Requirement: Dify Speech-to-Text Integration
The backend SHALL integrate with Dify STT service for audio file transcription.

#### Scenario: Transcribe uploaded audio with chunking
- **WHEN** backend receives POST /api/ai/transcribe-audio with audio file
- **THEN** backend SHALL call sidecar for VAD segmentation
- **AND** send each chunk to Dify STT API sequentially
- **AND** concatenate results into final transcript

#### Scenario: Supported audio formats
- **WHEN** audio file is in MP3, WAV, M4A, WebM, or OGG format
- **THEN** system SHALL accept and process the file

#### Scenario: Unsupported format handling
- **WHEN** audio file format is not supported
- **THEN** backend SHALL return HTTP 400 with error message listing supported formats

#### Scenario: Dify chunk transcription
- **WHEN** backend sends audio chunk to Dify STT API
- **THEN** chunk size SHALL be under 25MB to comply with API limits

#### Scenario: Transcription timeout per chunk
- **WHEN** Dify STT does not respond for a chunk within 2 minutes
- **THEN** backend SHALL retry up to 3 times with exponential backoff

#### Scenario: Dify STT error handling
- **WHEN** Dify STT API returns error after retries
- **THEN** backend SHALL return HTTP 502 with error details

### Requirement: Dual Transcription Mode
The system SHALL support both real-time local transcription and file-based cloud transcription.

#### Scenario: Real-time transcription unchanged
- **WHEN** user records audio in real-time
- **THEN** local sidecar SHALL process audio using faster-whisper (existing behavior)

#### Scenario: File upload uses cloud transcription
- **WHEN** user uploads audio file
- **THEN** Dify cloud service SHALL process audio via chunked upload

#### Scenario: Unified transcript output
- **WHEN** transcription completes from either source
- **THEN** result SHALL be displayed in the same transcript area in meeting detail page
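The sequential chunk loop with the retry policy from the Dify STT scenarios above (up to 3 attempts per chunk, exponential backoff, HTTP 502 on final failure) can be sketched as follows. The Dify STT HTTP call itself is not specified in this spec, so it is injected as a callable; `transcribe_chunks` and `TranscriptionError` are hypothetical names, and joining chunk texts with a space is one possible concatenation choice.

```python
import time


class TranscriptionError(Exception):
    """Raised when a chunk still fails after all retries (maps to HTTP 502)."""


def transcribe_chunks(chunks, send_chunk, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Send each audio chunk to an STT callable sequentially and join the text.

    `send_chunk(chunk)` stands in for the Dify STT request; it should raise on
    timeout or API error. Each chunk is retried up to `max_retries` times with
    exponential backoff (base_delay, 2*base_delay, 4*base_delay, ...).
    """
    parts = []
    for chunk in chunks:
        for attempt in range(max_retries):
            try:
                parts.append(send_chunk(chunk))
                break
            except Exception as exc:
                if attempt == max_retries - 1:
                    raise TranscriptionError(
                        f"chunk {chunk!r} failed after {max_retries} attempts"
                    ) from exc
                sleep(base_delay * (2 ** attempt))
    return " ".join(parts)
```

The API layer would catch `TranscriptionError` and return HTTP 502 with the error details; per-chunk timeouts (the 2-minute limit) would live inside the injected `send_chunk`.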