metadata
title: WhisperPyanoteLLM
emoji: π
colorFrom: indigo
colorTo: green
sdk: docker
pinned: false
license: apache-2.0
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
WhisperPyanoteLLM
A FastAPI-based app for speaker diarization and transcription using Whisper and PyAnnote, with LLM-powered summarization.
Features
- Speaker diarization with pyannote.audio
- Transcription with OpenAI Whisper
- Summarization with Together LLM
- REST API for video/audio upload and processing
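Diarization yields speaker turns with timestamps, and transcription yields text segments with timestamps, so the two outputs have to be matched up before summarization. The sketch below shows one common way to do that, assigning each transcript segment to the speaker whose turn overlaps it most. It is an illustration only and not necessarily how app.py does it; the tuple layout, the function names, and the "UNKNOWN" fallback label are assumptions.

```python
from typing import List, Tuple

# Hypothetical layout: (start_seconds, end_seconds, payload).
# For transcript segments the payload is text; for turns it is a speaker label.
Segment = Tuple[float, float, str]


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(transcript: List[Segment], turns: List[Segment]) -> List[Tuple[str, str]]:
    """Label each transcript segment with the speaker whose turn overlaps it most."""
    labelled = []
    for t_start, t_end, text in transcript:
        best_speaker = "UNKNOWN"  # used when no turn overlaps the segment
        best_overlap = 0.0
        for s_start, s_end, speaker in turns:
            shared = overlap(t_start, t_end, s_start, s_end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        labelled.append((best_speaker, text))
    return labelled


# Made-up example data, only to show the shape of the inputs and output.
transcript = [(0.0, 2.5, "Hello, thanks for joining."), (2.5, 5.0, "Happy to be here.")]
turns = [(0.0, 2.4, "SPEAKER_00"), (2.4, 5.2, "SPEAKER_01")]
labelled = assign_speakers(transcript, turns)
```

The labelled pairs can then be joined into a "speaker: text" transcript and passed to the summarization step.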
Quick Start (Development)
Clone the repository:
    git clone <your-repo-url>
    cd WhisperPyanoteLLM
Create a .env file:
    HF_TOKEN=your_huggingface_token
    TOGETHER_API_KEY=your_together_api_key
    NGROK_AUTH_TOKEN=your_ngrok_token
Install dependencies:
    pip install -r requirements.txt
Run the app:
    uvicorn app:app --reload --port 8300
Access the API:
- Health check: http://localhost:8300/health
- Upload endpoint: /upload_video/
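Once the server is running, the two endpoints can be exercised from a small client. The following is a minimal sketch using only the Python standard library. The form field name "file" is an assumption (check the handler in app.py for the real parameter name), the helper names are hypothetical, and the response is returned as raw bytes because this README does not document the response format.

```python
import mimetypes
import os
import urllib.request
import uuid
from typing import Tuple

BASE_URL = "http://localhost:8300"  # port taken from the uvicorn command above


def health_url(base_url: str = BASE_URL) -> str:
    return base_url.rstrip("/") + "/health"


def upload_url(base_url: str = BASE_URL) -> str:
    return base_url.rstrip("/") + "/upload_video/"


def encode_multipart(field_name: str, filename: str, data: bytes) -> Tuple[bytes, str]:
    """Build a single-file multipart/form-data body and its Content-Type header."""
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def check_health(base_url: str = BASE_URL, timeout: float = 10.0) -> int:
    """Return the HTTP status code of the health endpoint."""
    with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
        return resp.status


def upload_file(path: str, field_name: str = "file",  # field name is an assumption
                base_url: str = BASE_URL, timeout: float = 600.0) -> bytes:
    """POST a local audio/video file to the upload endpoint; return the raw response."""
    with open(path, "rb") as fh:
        data = fh.read()
    body, content_type = encode_multipart(field_name, os.path.basename(path), data)
    request = urllib.request.Request(
        upload_url(base_url),
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read()
```

Call check_health() first to confirm the server is up, then upload_file("path/to/recording.mp4"). The long default timeout is a guess that allows for slow processing; adjust it to your hardware.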
Production (Docker)
Create a .env.prod file:
    HF_TOKEN=your_huggingface_token
    TOGETHER_API_KEY=your_together_api_key
    NGROK_AUTH_TOKEN=your_ngrok_token
Build the Docker image:
    docker build -t whisperpyanote .
Run the Docker container:
    docker run --env-file .env.prod -p 8300:8300 whisperpyanote
Access the API:
- Health check: http://localhost:8300/health
- Upload endpoint: /upload_video/
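A missing token usually surfaces only when the first request reaches the model or the LLM API, so it can help to check the environment at startup. This is a small sketch of such a check, assuming all three variables listed in the env file are required (the README does not say whether NGROK_AUTH_TOKEN is optional); the function name is hypothetical.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Variable names taken from the .env / .env.prod examples in this README.
REQUIRED_VARS = ("HF_TOKEN", "TOGETHER_API_KEY", "NGROK_AUTH_TOKEN")


def missing_env_vars(required: Iterable[str] = REQUIRED_VARS,
                     environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of required variables that are unset or blank."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name, "").strip()]


# Example with an explicit mapping instead of the real process environment.
example = missing_env_vars(environ={"HF_TOKEN": "hf_xxx", "TOGETHER_API_KEY": " "})
```

Inside the container, calling missing_env_vars() with no arguments inspects the variables passed through --env-file; a non-empty result is a reasonable point to stop with a clear error message.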
Notes
- Make sure your .env and .env.prod files are not committed to version control.
- For best performance, run on a machine with a CUDA-enabled GPU.
- For more details, see the code and comments in app.py.
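To confirm whether a GPU will actually be used, a quick device check can be run in the same environment. The sketch below assumes PyTorch is installed, which is typical for Whisper and pyannote.audio setups, and falls back to "cpu" if it is not; the function name is hypothetical and this is not taken from app.py.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Models would run on: {device}")
```

When running in Docker, the container also needs GPU access from the host for this check to report "cuda".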
License
Apache-2.0