## ADDED Requirements

### Requirement: Audio File Upload
The Electron client SHALL allow users to upload pre-recorded audio files for transcription.

#### Scenario: Upload audio file
- **WHEN** user clicks "Upload Audio" button in meeting detail page
- **THEN** file picker SHALL open with filter for supported audio formats (MP3, WAV, M4A, WebM, OGG)

#### Scenario: Show upload progress
- **WHEN** audio file is being uploaded
- **THEN** progress indicator SHALL be displayed showing upload percentage

#### Scenario: Show transcription progress
- **WHEN** audio file is being transcribed in chunks
- **THEN** progress indicator SHALL display "Processing chunk X of Y"

#### Scenario: Replace existing transcript
- **WHEN** user uploads audio file and transcript already has content
- **THEN** confirmation dialog SHALL appear before replacing existing transcript

#### Scenario: File size limit
- **WHEN** user selects audio file larger than 500MB
- **THEN** error message SHALL be displayed indicating file size limit

### Requirement: VAD-Based Audio Segmentation
The sidecar SHALL segment large audio files using Voice Activity Detection before cloud transcription.
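The chunk-boundary planning this requirement implies can be sketched independently of any particular VAD engine. The sketch below assumes a VAD library (e.g. silero-vad or webrtcvad — neither is mandated by this spec) has already classified fixed-size audio frames as speech or silence; `plan_segments` is a hypothetical helper name, not part of the sidecar command protocol.

```python
def plan_segments(speech_flags, frame_ms=30, min_silence_ms=500, max_chunk_s=300.0):
    """Plan (start_s, end_s) chunk boundaries from per-frame VAD speech flags.

    A chunk is cut at the first silence gap of at least `min_silence_ms`, and
    force-cut if it reaches `max_chunk_s` without such a gap (matching the
    split-at-silence and force-split scenarios).
    """
    min_silence_frames = max(1, round(min_silence_ms / frame_ms))
    max_chunk_frames = max(1, round(max_chunk_s * 1000 / frame_ms))
    segments = []
    start = 0          # first frame of the chunk currently being built
    silence_run = 0    # consecutive non-speech frames seen so far
    for i, is_speech in enumerate(speech_flags):
        silence_run = 0 if is_speech else silence_run + 1
        chunk_len = i - start + 1
        if silence_run >= min_silence_frames or chunk_len >= max_chunk_frames:
            # Cut at the start of the silence gap when one exists,
            # otherwise hard-cut at the max-duration boundary.
            cut = i + 1 - silence_run if silence_run >= min_silence_frames else i + 1
            if cut > start:  # skip zero-length chunks (leading silence)
                segments.append((start * frame_ms / 1000.0, cut * frame_ms / 1000.0))
            start = i + 1
            silence_run = 0
    if start < len(speech_flags):
        segments.append((start * frame_ms / 1000.0, len(speech_flags) * frame_ms / 1000.0))
    return segments
```

The actual sidecar would then export each planned `(start_s, end_s)` range as a WAV file in a temp directory and return the segment metadata, per the scenarios below.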
#### Scenario: Segment audio command
- **WHEN** sidecar receives `{"action": "segment_audio", "file_path": "...", "max_chunk_seconds": 300}`
- **THEN** it SHALL load audio file and run VAD to detect speech boundaries

#### Scenario: Split at silence boundaries
- **WHEN** VAD detects silence gap >= 500ms within max chunk duration
- **THEN** audio SHALL be split at the silence boundary
- **AND** each chunk exported as WAV file to temp directory

#### Scenario: Force split for continuous speech
- **WHEN** speech continues beyond max_chunk_seconds without silence gap
- **THEN** audio SHALL be force-split at max_chunk_seconds boundary

#### Scenario: Return segment metadata
- **WHEN** segmentation completes
- **THEN** sidecar SHALL return list of segments with file paths and timestamps

### Requirement: Dify Speech-to-Text Integration
The backend SHALL integrate with Dify STT service for audio file transcription.

#### Scenario: Transcribe uploaded audio with chunking
- **WHEN** backend receives POST /api/ai/transcribe-audio with audio file
- **THEN** backend SHALL call sidecar for VAD segmentation
- **AND** send each chunk to Dify STT API sequentially
- **AND** concatenate results into final transcript

#### Scenario: Supported audio formats
- **WHEN** audio file is in MP3, WAV, M4A, WebM, or OGG format
- **THEN** system SHALL accept and process the file

#### Scenario: Unsupported format handling
- **WHEN** audio file format is not supported
- **THEN** backend SHALL return HTTP 400 with error message listing supported formats

#### Scenario: Dify chunk transcription
- **WHEN** backend sends audio chunk to Dify STT API
- **THEN** chunk size SHALL be under 25MB to comply with API limits

#### Scenario: Transcription timeout per chunk
- **WHEN** Dify STT does not respond for a chunk within 2 minutes
- **THEN** backend SHALL retry up to 3 times with exponential backoff

#### Scenario: Dify STT error handling
- **WHEN** Dify STT API returns error after retries
- **THEN** backend SHALL return HTTP 502 with error details

### Requirement: Dual Transcription Mode
The system SHALL support both real-time local transcription and file-based cloud transcription.

#### Scenario: Real-time transcription unchanged
- **WHEN** user records audio in real-time
- **THEN** local sidecar SHALL process audio using faster-whisper (existing behavior)

#### Scenario: File upload uses cloud transcription
- **WHEN** user uploads audio file
- **THEN** Dify cloud service SHALL process audio via chunked upload

#### Scenario: Unified transcript output
- **WHEN** transcription completes from either source
- **THEN** result SHALL be displayed in the same transcript area in meeting detail page
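The sequential chunk loop with the retry policy from the Dify STT scenarios above (up to 3 attempts per chunk, exponential backoff, HTTP 502 on final failure) can be sketched as follows. The Dify STT HTTP call itself is not specified in this spec, so it is injected as a callable; `transcribe_chunks` and `TranscriptionError` are hypothetical names, and joining chunk texts with a space is one possible concatenation choice.

```python
import time


class TranscriptionError(Exception):
    """Raised when a chunk still fails after all retries (maps to HTTP 502)."""


def transcribe_chunks(chunks, send_chunk, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Send each audio chunk to an STT callable sequentially and join the text.

    `send_chunk(chunk)` stands in for the Dify STT request; it should raise on
    timeout or API error. Each chunk is retried up to `max_retries` times with
    exponential backoff (base_delay, 2*base_delay, 4*base_delay, ...).
    """
    parts = []
    for chunk in chunks:
        for attempt in range(max_retries):
            try:
                parts.append(send_chunk(chunk))
                break
            except Exception as exc:
                if attempt == max_retries - 1:
                    raise TranscriptionError(
                        f"chunk {chunk!r} failed after {max_retries} attempts"
                    ) from exc
                sleep(base_delay * (2 ** attempt))
    return " ".join(parts)
```

The API layer would catch `TranscriptionError` and return HTTP 502 with the error details; per-chunk timeouts (the 2-minute limit) would live inside the injected `send_chunk`.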