title: Voxtral
emoji: ⚡
colorFrom: gray
colorTo: green
sdk: gradio
sdk_version: 5.38.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: Chat and transcribe audio files with AI, powered by Voxtral.
Voxtral Pro Interface
An advanced, feature-rich Gradio UI to explore the full power of Mistral AI's multimodal model, `voxtral`.
🚀 About The Project
Voxtral Pro was created to explore and showcase the full range of capabilities of Mistral AI's powerful multimodal model, `voxtral`. This application goes beyond a simple chat interface to provide a comprehensive toolkit for interacting with audio and text, demonstrating features like high-quality transcription, multi-turn multimodal conversation, and agent-like tool use.
This project serves as a practical example of how to build robust, user-friendly, and production-ready applications on top of state-of-the-art foundation models.
✨ Key Features
- 🎙️ High-Quality Transcription: Transcribe large audio files with exceptional accuracy using the Mistral API.
- 📄 SRT Subtitle Generation: Automatically generate and export `.srt` subtitle files with precise segment timestamps, perfect for content creators.
- 💬 Multimodal Chat: Engage in rich, multi-turn conversations that combine text and audio inputs.
- 🤖 Tool Use / Function Calling: Demonstrates the model's ability to call external functions to retrieve information (e.g., getting city data), showcasing its agent-like capabilities.
- 🔐 Secure API Key Handling: Your Mistral API key is stored securely in your browser's session storage and is never exposed or saved elsewhere.
- 🎨 Modern UI: A clean, responsive, and aesthetically pleasing interface built with Gradio.
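As a rough illustration of the SRT export feature listed above, the sketch below turns timestamped segments into `.srt` text. The `Segment` class and both function names are hypothetical and not taken from `app.py`; the sketch only assumes that each transcription segment carries a start time, an end time (both in seconds), and its text.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical segment shape: start/end in seconds plus the spoken text."""
    start: float
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(cues)
```

Writing the returned string to a file with a `.srt` extension should produce subtitles that common video players can load.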
🛠️ Tech Stack
This project is built with a modern, asynchronous Python stack:
- Backend: Python
- Web Framework: Gradio
- API Client: httpx with `asyncio` for non-blocking API calls.
- Deployment: Hugging Face Spaces
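To give a feel for what "non-blocking" means here, this minimal sketch uses only `asyncio`. The `fake_api_call` coroutine is a hypothetical stand-in for an awaited HTTP request; in the real app, the same pattern would presumably wrap a request made through `httpx.AsyncClient`. Neither function name comes from `app.py`.

```python
import asyncio


async def fake_api_call(name: str, delay: float) -> str:
    """Hypothetical stand-in for an awaited HTTP request."""
    # While this coroutine waits, the event loop is free to run other tasks.
    await asyncio.sleep(delay)
    return f"{name}: done"


async def run_concurrently(names: list[str], delay: float = 0.05) -> list[str]:
    """Start all calls at once; gather returns results in input order."""
    return await asyncio.gather(*(fake_api_call(n, delay) for n in names))


if __name__ == "__main__":
    print(asyncio.run(run_concurrently(["transcribe", "chat"])))
```

Because the calls overlap, the total wait is close to the slowest single call, not the sum of all of them. This is what keeps a UI responsive while API requests are in flight.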
🏁 Getting Started
Follow these instructions to get a local copy up and running.
Prerequisites
- Python 3.9+
- Git
Installation & Configuration
Clone the repository:
git clone https://huggingface.co/spaces/hasanbasbunar/Voxtral && cd Voxtral
Create and activate a virtual environment:
python3 -m venv .venv
source .venv/bin/activate
Install dependencies:
pip install -r requirements.txt
Configure your API Key: Create a file named `.env` in the root of the project and add your Mistral API key:
MISTRAL_API_KEY="your_api_key_here"
The application is also designed to let you enter the key directly in the UI if you prefer not to use an `.env` file.
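The two ways of supplying the key could be reconciled along the following lines. This is a sketch under stated assumptions, not the app's actual logic: the function names are hypothetical, the tiny `.env` parser only handles plain `KEY=value` lines, and giving the UI value precedence over the environment is an assumption.

```python
import os
from typing import Mapping, Optional


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, ignoring blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def resolve_api_key(
    ui_key: Optional[str], env: Mapping[str, str] = os.environ
) -> Optional[str]:
    """Assumed precedence: a key typed into the UI wins, else MISTRAL_API_KEY."""
    if ui_key and ui_key.strip():
        return ui_key.strip()
    return env.get("MISTRAL_API_KEY") or None
```

For example, `resolve_api_key(None, parse_env_file(open(".env").read()))` would return the key from the file, while passing a non-empty `ui_key` would override it.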
Running the Application
- Launch the app:
python app.py
- Open your browser and navigate to http://127.0.0.1:7860.
🚢 Deployment
This app is designed to be easily deployed. It is currently live on Hugging Face Spaces.
To deploy your own version, you can use any platform that supports Python applications. For a production environment, ensure `debug=False` is set in `app.py`.
Example for platforms that use a `PORT` environment variable:
# in app.py
import os

# Use the platform-provided PORT, falling back to Gradio's default of 7860.
demo.launch(server_port=int(os.environ.get("PORT", 7860)), debug=False)