chore: Archive add-dify-audio-transcription proposal

Move completed Dify audio transcription proposal to archive and update transcription spec with new capabilities.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 21:05:01 +08:00
parent 263eb1c394
commit 2e78e3760a
5 changed files with 435 additions and 0 deletions
--- a/openspec/changes/archive/2025-12-11-add-dify-audio-transcription/proposal.md
+++ b/openspec/changes/archive/2025-12-11-add-dify-audio-transcription/proposal.md
@@ -0,0 +1,28 @@
+# Change: Add Dify Audio Transcription for Uploaded Files
+
+## Why
+Users need to transcribe pre-recorded audio files (e.g., meeting recordings from external sources). Currently, transcription only works with real-time recording via the local sidecar. Adding Dify-based transcription for uploaded files provides flexibility while keeping real-time transcription local for low latency.
+
+## What Changes
+- Add audio file upload UI in Electron client (meeting detail page)
+- Add `segment_audio` command to sidecar for VAD-based audio chunking
+- Add backend API endpoint to receive audio files, chunk via sidecar, and forward to Dify STT service
+- Each chunk (~5 minutes max) is sent to Dify separately, and the results are concatenated
+- Transcription result replaces the transcript field (same as real-time transcription)
+- Support common audio formats: MP3, WAV, M4A, WebM, OGG
+
+## Impact
+- Affected specs: `transcription`
+- Affected code:
+  - `sidecar/transcriber.py` - Add `segment_audio` action for VAD chunking
+  - `client/src/pages/meeting-detail.html` - Add upload button and progress UI
+  - `backend/app/routers/ai.py` - Add `/api/ai/transcribe-audio` endpoint
+  - `backend/app/config.py` - Add Dify STT API key configuration
+
+## Technical Notes
+- Dify STT API Key: `app-xQeSipaQecs0cuKeLvYDaRsu`
+- Real-time transcription continues to use local sidecar (no change)
+- File upload transcription uses Dify cloud service with VAD chunking
+- VAD chunking keeps each chunk under 25 MB (Dify API limit)
+- Max file size: 500MB (chunked processing handles large files)
+- Both methods output to the same `transcript_blob` field
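
The chunk-then-concatenate flow described in the proposal could be sketched roughly as follows. This is an illustration only, not the code in `sidecar/transcriber.py` or `backend/app/routers/ai.py`: the names `group_segments` and `transcribe_chunks`, the `(start, end)` segment representation, and joining chunk texts with a single space are all assumptions. The `transcribe` callback stands in for whatever sends one chunk to the Dify STT service.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical representation: one VAD speech region as (start_sec, end_sec).
Segment = Tuple[float, float]

MAX_CHUNK_SECONDS = 300.0  # "~5 minutes max" from the proposal


def group_segments(
    segments: Iterable[Segment], max_seconds: float = MAX_CHUNK_SECONDS
) -> List[Segment]:
    """Merge consecutive speech segments into chunks no longer than max_seconds.

    A single segment longer than max_seconds is split at fixed intervals.
    """
    chunks: List[Segment] = []
    cur_start = None
    cur_end = None
    for start, end in segments:
        # Split an oversized segment into fixed-length pieces first.
        while end - start > max_seconds:
            if cur_start is not None:
                chunks.append((cur_start, cur_end))
                cur_start = cur_end = None
            chunks.append((start, start + max_seconds))
            start += max_seconds
        if cur_start is None:
            # Open a new chunk.
            cur_start, cur_end = start, end
        elif end - cur_start <= max_seconds:
            # The segment still fits: extend the open chunk.
            cur_end = end
        else:
            # The segment would overflow: close the chunk and open a new one.
            chunks.append((cur_start, cur_end))
            cur_start, cur_end = start, end
    if cur_start is not None:
        chunks.append((cur_start, cur_end))
    return chunks


def transcribe_chunks(
    chunks: Iterable[Segment], transcribe: Callable[[Segment], str]
) -> str:
    """Transcribe each chunk separately and concatenate the non-empty results.

    Joining with a space is an assumption; the proposal only says "concatenated".
    """
    texts = [transcribe(chunk).strip() for chunk in chunks]
    return " ".join(text for text in texts if text)
```

Cutting only at segment boundaries keeps chunk edges in silence rather than mid-word, which is presumably why the proposal uses VAD instead of fixed-length slicing.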