Move completed Dify audio transcription proposal to archive and update transcription spec with new capabilities. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Change: Add Dify Audio Transcription for Uploaded Files
Why
Users need to transcribe pre-recorded audio files (e.g., meeting recordings from external sources). Currently, transcription only works with real-time recording via the local sidecar. Adding Dify-based transcription for uploaded files provides flexibility while keeping real-time transcription local for low latency.
What Changes
- Add audio file upload UI in Electron client (meeting detail page)
- Add `segment_audio` command to sidecar for VAD-based audio chunking
- Add backend API endpoint to receive audio files, chunk via sidecar, and forward to Dify STT service
- Each chunk (~5 minutes max) is sent to Dify separately, and the results are concatenated in order
- Transcription result replaces the transcript field (same as real-time transcription)
- Support common audio formats: MP3, WAV, M4A, WebM, OGG
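
The chunk-and-concatenate flow described in the list above could be sketched roughly as follows. This is a minimal illustration, not the sidecar's actual implementation: `plan_chunks`, `transcribe_file`, and the `(start, end)` speech-segment format are hypothetical names and shapes, and the Dify call is passed in as a plain callable so the sketch makes no claims about the real API.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical shape: (start_s, end_s) of one speech region reported by VAD.
Segment = Tuple[float, float]


def plan_chunks(speech: List[Segment], max_chunk_s: float = 300.0) -> List[Segment]:
    """Greedily merge consecutive speech regions into chunks of at most
    max_chunk_s seconds, so that cuts only fall in the silences between regions.

    Limitation of this sketch: a single speech region longer than
    max_chunk_s is kept whole rather than split.
    """
    chunks: List[Segment] = []
    chunk_start = None
    chunk_end = None
    for start, end in speech:
        if chunk_start is None:
            chunk_start, chunk_end = start, end
        elif end - chunk_start <= max_chunk_s:
            # Region still fits in the current chunk: extend it.
            chunk_end = end
        else:
            # Region would overflow: close the chunk at the previous silence.
            chunks.append((chunk_start, chunk_end))
            chunk_start, chunk_end = start, end
    if chunk_start is not None:
        chunks.append((chunk_start, chunk_end))
    return chunks


def transcribe_file(
    chunks: Iterable[bytes],
    transcribe_chunk: Callable[[bytes], str],
) -> str:
    """Send each chunk to the STT callable in order and join the non-empty results."""
    parts = [transcribe_chunk(chunk).strip() for chunk in chunks]
    return " ".join(part for part in parts if part)
```

With a 300-second cap, `plan_chunks([(0, 100), (110, 250), (260, 400), (410, 500)])` yields two chunks, cut at the silence between 250 s and 260 s. Passing the STT call in as a parameter keeps the concatenation logic testable without network access.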
Impact
- Affected specs: `transcription`
- Affected code:
  - `sidecar/transcriber.py` - Add `segment_audio` action for VAD chunking
  - `client/src/pages/meeting-detail.html` - Add upload button and progress UI
  - `backend/app/routers/ai.py` - Add `/api/ai/transcribe-audio` endpoint
  - `backend/app/config.py` - Add Dify STT API key configuration
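
As one possible shape for the configuration change in `backend/app/config.py`, the sketch below loads the key from the environment and builds the bearer-token header that Dify's API expects. The environment variable names `DIFY_STT_API_KEY` and `DIFY_API_BASE_URL`, the `DifySttConfig` class, and the default base URL are assumptions for illustration, not taken from the existing code.

```python
import os
from dataclasses import dataclass
from typing import Dict, Mapping, Optional


@dataclass(frozen=True)
class DifySttConfig:
    api_key: str
    base_url: str


def load_dify_stt_config(env: Optional[Mapping[str, str]] = None) -> DifySttConfig:
    """Read Dify STT settings from the environment (variable names are assumed)."""
    env = os.environ if env is None else env
    api_key = env.get("DIFY_STT_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("DIFY_STT_API_KEY is not set")
    # Assumed default: the Dify cloud API base URL; self-hosted setups would override it.
    base_url = env.get("DIFY_API_BASE_URL", "https://api.dify.ai/v1").rstrip("/")
    return DifySttConfig(api_key=api_key, base_url=base_url)


def auth_headers(cfg: DifySttConfig) -> Dict[str, str]:
    """Build the Authorization header for requests to Dify."""
    return {"Authorization": f"Bearer {cfg.api_key}"}
```

Accepting an optional mapping instead of always reading `os.environ` lets the loader be exercised in tests without touching the process environment.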
Technical Notes
- Dify STT API Key: `app-xQeSipaQecs0cuKeLvYDaRsu`
- Real-time transcription continues to use local sidecar (no change)
- File upload transcription uses Dify cloud service with VAD chunking
- VAD chunking ensures each chunk < 25MB (Dify API limit)
- Max file size: 500MB (chunked processing handles large files)
- Both methods output to the same `transcript_blob` field
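
The size and format limits in this proposal could be enforced with a small guard like the one below. It is a sketch under stated assumptions: the function and constant names are hypothetical, and "MB" is interpreted as MiB (1024 * 1024 bytes), which the proposal does not specify.

```python
from pathlib import Path

# Formats listed under "What Changes".
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".webm", ".ogg"}

# Assumption: "MB" is read as MiB here; adjust if the limits are decimal megabytes.
MAX_FILE_BYTES = 500 * 1024 * 1024
MAX_CHUNK_BYTES = 25 * 1024 * 1024


def validate_upload(filename: str, size_bytes: int) -> str:
    """Return the normalized extension, or raise ValueError if the upload is rejected."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {ext or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        raise ValueError("file exceeds the 500MB upload limit")
    return ext


def chunk_fits(size_bytes: int) -> bool:
    """True if an encoded chunk is strictly under the 25MB per-request limit."""
    return size_bytes < MAX_CHUNK_BYTES
```

Checking `chunk_fits` on each encoded chunk before upload catches the case where a chunk respects the duration cap but still exceeds the byte limit, for example with uncompressed WAV at a high sample rate.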