chore: Archive add-dify-audio-transcription proposal

Move completed Dify audio transcription proposal to archive and update
transcription spec with new capabilities.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 21:05:01 +08:00
parent 263eb1c394
commit 2e78e3760a
5 changed files with 435 additions and 0 deletions
--- a/openspec/changes/archive/2025-12-11-add-dify-audio-transcription/tasks.md
+++ b/openspec/changes/archive/2025-12-11-add-dify-audio-transcription/tasks.md
@@ -0,0 +1,47 @@
+# Implementation Tasks
+
+## 1. Backend Configuration
+- [x] 1.1 Add `DIFY_STT_API_KEY` to `backend/app/config.py`
+- [x] 1.2 Add `DIFY_STT_API_KEY` to `backend/.env.example`
+
+## 2. Sidecar VAD Segmentation
+- [x] 2.1 Add `segment_audio` action handler in `sidecar/transcriber.py`
+- [x] 2.2 Implement VAD-based audio segmentation using Silero VAD
+- [x] 2.3 Support max chunk duration (default 5 minutes)
+- [x] 2.4 Support minimum silence threshold (default 500ms)
+- [x] 2.5 Export chunks as WAV files to temp directory
+- [x] 2.6 Return segment metadata (paths, timestamps)
+
+## 3. Backend API Endpoint
+- [x] 3.1 Create `POST /api/ai/transcribe-audio` endpoint in `backend/app/routers/ai.py`
+- [x] 3.2 Implement multipart file upload handling (max 500MB)
+- [x] 3.3 Validate audio file format (MP3, WAV, M4A, WebM, OGG)
+- [x] 3.4 Save uploaded file to temp directory
+- [x] 3.5 Call sidecar `segment_audio` for VAD chunking
+- [x] 3.6 For each chunk: call Dify STT API (`/v1/audio-to-text`)
+- [x] 3.7 Implement retry with exponential backoff for Dify errors
+- [x] 3.8 Concatenate chunk transcriptions
+- [x] 3.9 Clean up temp files after processing
+- [x] 3.10 Return final transcript with metadata
+
+## 4. Frontend UI
+- [x] 4.1 Add "Upload Audio" button in meeting-detail.html (next to recording controls)
+- [x] 4.2 Implement file input with accepted audio formats
+- [x] 4.3 Add upload progress indicator (upload phase)
+- [x] 4.4 Add transcription progress indicator (chunk X of Y)
+- [x] 4.5 Show confirmation dialog if transcript already has content
+- [x] 4.6 Display transcription result in transcript area
+- [x] 4.7 Handle error states (file too large, unsupported format, API error)
+
+## 5. API Service
+- [x] 5.1 Add `transcribeAudio()` function to `client/src/services/api.js`
+- [x] 5.2 Implement FormData upload with progress tracking
+- [x] 5.3 Handle streaming response for chunk progress
+
+## 6. Testing
+- [ ] 6.1 Test sidecar VAD segmentation with various audio lengths
+- [ ] 6.2 Test with various audio formats (MP3, WAV, M4A, WebM, OGG)
+- [ ] 6.3 Test with large file (>100MB) to verify chunking
+- [ ] 6.4 Test error handling (invalid format, Dify timeout, API error)
+- [ ] 6.5 Verify transcript displays correctly after upload
+- [ ] 6.6 Test chunk concatenation quality (no missing content at boundaries)
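
Review note on tasks 3.6 to 3.8: the checklist names retry with exponential backoff and chunk concatenation but does not show their shape. The sketch below is one possible reading, not the code in `backend/app/routers/ai.py`. The names `call_with_backoff`, `transcribe_chunks`, and `transcribe_one`, the defaults of 3 attempts and a 1-second base delay, and the space separator are all assumptions made for illustration. `transcribe_one` stands in for whatever function posts a single chunk to the Dify STT endpoint.

```python
import time
from typing import Callable, List, Sequence


def call_with_backoff(
    func: Callable[[], str],
    max_attempts: int = 3,          # assumed default, not taken from the spec
    base_delay: float = 1.0,        # assumed default, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call func, retrying on any exception.

    The delay before retry number k (0-based) is base_delay * 2**k.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))
    raise ValueError("max_attempts must be at least 1")


def transcribe_chunks(
    chunk_paths: Sequence[str],
    transcribe_one: Callable[[str], str],  # hypothetical per-chunk STT call
    **retry_kwargs,
) -> str:
    """Transcribe chunks in order and join the non-empty results with spaces."""
    texts: List[str] = []
    for path in chunk_paths:
        # Bind path as a default argument so each retry reuses the same chunk.
        text = call_with_backoff(lambda p=path: transcribe_one(p), **retry_kwargs)
        if text.strip():
            texts.append(text.strip())
    return " ".join(texts)
```

Passing `sleep` as a parameter keeps the retry logic testable without real waiting, which may help with task 6.4. A real implementation would likely retry only on timeouts and 5xx responses rather than on every exception, and a space separator may not suit transcripts in languages written without spaces.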