Meeting_Assistant/openspec/changes/archive/2025-12-11-add-dify-audio-transcription/proposal.md
egg 2e78e3760a chore: Archive add-dify-audio-transcription proposal
Move completed Dify audio transcription proposal to archive and update
transcription spec with new capabilities.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 21:05:01 +08:00

Change: Add Dify Audio Transcription for Uploaded Files

Why

Users need to transcribe pre-recorded audio files (e.g., meeting recordings from external sources). Currently, transcription only works with real-time recording via the local sidecar. Adding Dify-based transcription for uploaded files covers pre-recorded audio while keeping real-time transcription local for low latency.

What Changes

  • Add audio file upload UI in Electron client (meeting detail page)
  • Add segment_audio command to sidecar for VAD-based audio chunking
  • Add a backend API endpoint that receives audio files, chunks them via the sidecar, and forwards each chunk to the Dify STT service
  • Each chunk (about 5 minutes at most) is sent to Dify separately, and the results are concatenated in order
  • The transcription result replaces the transcript_blob field (same as real-time transcription)
  • Support common audio formats: MP3, WAV, M4A, WebM, OGG
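
The upload checks and the per-chunk flow listed above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the names `SUPPORTED_EXTENSIONS`, `is_supported_audio`, and `transcribe_chunks` are hypothetical, the extension spellings are assumed from the format list, and the Dify STT call is replaced by a caller-supplied function.

```python
from pathlib import Path
from typing import Callable, Iterable

# Formats listed in the proposal; the extension spellings are assumptions.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".webm", ".ogg"}


def is_supported_audio(filename: str) -> bool:
    """Return True if the file extension matches a supported format."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def transcribe_chunks(
    chunks: Iterable[bytes],
    transcribe_one: Callable[[bytes], str],
    separator: str = " ",
) -> str:
    """Transcribe each chunk separately and join the results in order."""
    parts = []
    for chunk in chunks:
        text = transcribe_one(chunk).strip()
        if text:  # skip chunks that produced no text (e.g. silence)
            parts.append(text)
    return separator.join(parts)


if __name__ == "__main__":
    # Stand-in for the Dify STT call: decodes bytes instead of transcribing.
    fake_stt = lambda chunk: chunk.decode("utf-8")
    print(is_supported_audio("meeting.M4A"))
    print(transcribe_chunks([b"hello ", b"", b"world"], fake_stt))
```

The separator is a parameter because the right joiner depends on the transcript language; an empty string may suit languages written without spaces.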

Impact

  • Affected specs: transcription
  • Affected code:
    • sidecar/transcriber.py - Add segment_audio action for VAD chunking
    • client/src/pages/meeting-detail.html - Add upload button and progress UI
    • backend/app/routers/ai.py - Add /api/ai/transcribe-audio endpoint
    • backend/app/config.py - Add Dify STT API key configuration
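
One possible shape for the key configuration in backend/app/config.py is to read it from the environment at startup. The sketch below is an assumption throughout: the variable names `DIFY_STT_API_KEY` and `DIFY_API_URL`, the `DifySttSettings` class, and the default base URL are hypothetical and are not taken from the project.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DifySttSettings:
    api_key: str
    base_url: str


def load_dify_stt_settings(
    env: Optional[Mapping[str, str]] = None,
) -> DifySttSettings:
    """Build settings from environment variables (names are hypothetical)."""
    env = os.environ if env is None else env
    api_key = env.get("DIFY_STT_API_KEY", "").strip()
    if not api_key:
        # Fail early so a missing key surfaces at startup, not on first upload.
        raise RuntimeError("DIFY_STT_API_KEY is not set")
    # Default base URL is an assumption; self-hosted deployments will differ.
    base_url = env.get("DIFY_API_URL", "https://api.dify.ai/v1").rstrip("/")
    return DifySttSettings(api_key=api_key, base_url=base_url)


if __name__ == "__main__":
    settings = load_dify_stt_settings({"DIFY_STT_API_KEY": "app-example"})
    print(settings.base_url)
```

Passing a plain mapping instead of `os.environ` keeps the loader easy to test without touching the process environment.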

Technical Notes

  • Dify STT API Key: app-xQeSipaQecs0cuKeLvYDaRsu
  • Real-time transcription continues to use local sidecar (no change)
  • File upload transcription uses Dify cloud service with VAD chunking
  • VAD chunking keeps each chunk under 25 MB (the Dify API limit)
  • Maximum upload size: 500 MB (chunked processing handles large files)
  • Both methods output to the same transcript_blob field
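
The chunking constraints in these notes can be illustrated with a small sketch that merges VAD speech segments into chunks of at most 5 minutes and estimates the chunk size. It assumes the VAD step yields `(start, end)` pairs in seconds, that chunks are encoded as 16 kHz mono 16-bit PCM, and that the 25 MB limit is counted in decimal megabytes. None of these details come from the proposal, and `group_segments` and `pcm_bytes` are hypothetical names.

```python
from typing import Iterator, List, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds)

MAX_CHUNK_SECONDS = 300.0           # "~5 minutes max" from the proposal
MAX_CHUNK_BYTES = 25 * 1000 * 1000  # 25 MB limit; decimal MB is an assumption


def _split_long(start: float, end: float, limit: float) -> Iterator[Segment]:
    """Hard-split a single speech run that exceeds the limit on its own."""
    while end - start > limit:
        yield (start, start + limit)
        start += limit
    yield (start, end)


def group_segments(
    segments: List[Segment],
    max_chunk_seconds: float = MAX_CHUNK_SECONDS,
) -> List[Segment]:
    """Merge consecutive speech segments into chunks no longer than the limit.

    Chunk boundaries fall in the silence between segments whenever possible.
    """
    chunks: List[Segment] = []
    for seg_start, seg_end in segments:
        for start, end in _split_long(seg_start, seg_end, max_chunk_seconds):
            if chunks and end - chunks[-1][0] <= max_chunk_seconds:
                # Still fits: extend the open chunk to cover this segment.
                chunks[-1] = (chunks[-1][0], end)
            else:
                chunks.append((start, end))
    return chunks


def pcm_bytes(seconds: float, sample_rate: int = 16000,
              sample_width: int = 2, channels: int = 1) -> int:
    """Approximate size of uncompressed PCM audio (format is an assumption)."""
    return int(seconds * sample_rate * sample_width * channels)


if __name__ == "__main__":
    speech = [(0, 100), (120, 280), (290, 400), (410, 450)]
    print(group_segments(speech))
    print(pcm_bytes(MAX_CHUNK_SECONDS) < MAX_CHUNK_BYTES)
```

Under these assumptions a full 5-minute chunk is 300 × 16000 × 2 = 9,600,000 bytes, comfortably below the 25 MB limit; higher sample rates or stereo audio would need a shorter duration cap or a compressed encoding.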