title: Voxtral
emoji: 
colorFrom: gray
colorTo: green
sdk: gradio
sdk_version: 5.38.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: Chat and transcribe audio files with AI, powered by Voxtral.

Voxtral Pro Interface

An advanced, feature-rich Gradio UI to explore the full power of Mistral AI's multimodal model, `voxtral`.

Voxtral Pro Demo

🚀 About The Project

Voxtral Pro was created to explore and showcase the full range of capabilities of Mistral AI's powerful multimodal model, voxtral. This application goes beyond a simple chat interface to provide a comprehensive toolkit for interacting with audio and text, demonstrating features like high-quality transcription, multi-turn multimodal conversation, and agent-like tool use.

This project serves as a practical example of how to build robust, user-friendly, and production-ready applications on top of state-of-the-art foundation models.

✨ Key Features

  • 🎙️ High-Quality Transcription: Transcribe large audio files with exceptional accuracy using the Mistral API.
  • 📄 SRT Subtitle Generation: Automatically generate and export .srt subtitle files with precise segment timestamps, perfect for content creators.
  • 💬 Multimodal Chat: Engage in rich, multi-turn conversations combining both text and audio inputs simultaneously.
  • 🤖 Tool Use / Function Calling: Demonstrates the model's ability to call external functions to retrieve information (e.g., getting city data), showcasing its agent-like capabilities.
  • 🔐 Secure API Key Handling: Your Mistral API key is stored securely in your browser's session storage and is never exposed or saved elsewhere.
  • 🎨 Modern UI: A clean, responsive, and aesthetically pleasing interface built with Gradio.
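The SRT export listed above comes down to numbering each transcript segment and writing its start and end times as `HH:MM:SS,mmm`. The following is a minimal sketch of that formatting step, not the app's actual code: the `Segment` class and both function names are hypothetical, and it assumes the transcription step yields segments with start and end times in seconds.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical transcript segment: start/end in seconds plus its text."""
    start: float
    end: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: list[Segment]) -> str:
    """Build SRT text: numbered cues, each followed by a blank line separator."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    # Joining with "\n" leaves exactly one empty line between consecutive cues.
    return "\n".join(cues)
```

Writing the returned string to a file with an `.srt` extension should give a subtitle file that common video players can load.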

🛠️ Tech Stack

This project is built with a modern, asynchronous Python stack centered on Gradio and the Mistral API.

🏁 Getting Started

Follow these instructions to get a local copy up and running.

Prerequisites

  • Python 3.9+
  • Git

Installation & Configuration

  1. Clone the repository:

    git clone https://huggingface.co/spaces/hasanbasbunar/Voxtral && cd Voxtral

  2. Create and activate a virtual environment:

    python3 -m venv .venv
    source .venv/bin/activate
    
  3. Install dependencies:

    pip install -r requirements.txt
    
  4. Configure your API Key: Create a file named .env in the root of the project and add your Mistral API key:

    MISTRAL_API_KEY="your_api_key_here"
    

    If you prefer not to use a .env file, you can enter the key directly in the UI instead.
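One way the two key sources could be combined is sketched below. This is an illustration under stated assumptions rather than the app's actual logic: `resolve_api_key` is a hypothetical helper, giving the UI value precedence over the environment is an assumed policy, and the sketch presumes the contents of `.env` have already been loaded into the process environment (for example by a dotenv loader).

```python
import os
from typing import Mapping, Optional


def resolve_api_key(ui_key: Optional[str] = None,
                    env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the key typed in the UI if non-empty, else MISTRAL_API_KEY from env.

    Hypothetical helper: the precedence order is an assumption, not taken
    from the project.
    """
    for candidate in (ui_key, env.get("MISTRAL_API_KEY")):
        if candidate and candidate.strip():
            return candidate.strip()
    # No usable key found; the caller should ask the user for one.
    return None
```

Returning `None` instead of raising lets the UI decide how to prompt for a missing key.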

Running the Application

  1. Launch the app:
    python app.py
    
  2. Open your browser and navigate to http://127.0.0.1:7860.

🚢 Deployment

This app is designed to be easily deployed. It is currently live on Hugging Face Spaces.

To deploy your own version, you can use any platform that supports Python applications. For a production environment, make sure the launch call in app.py sets debug=False.

Example for platforms that use a PORT environment variable:

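    # in app.py
    import os  # needed to read the PORT environment variable
    # Use the platform-provided PORT if set, otherwise Gradio's default 7860.
    demo.launch(server_port=int(os.environ.get("PORT", 7860)), debug=False)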
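Platforms that inject a `PORT` variable usually run the app in a container, where the server generally has to listen on all interfaces rather than only on localhost. The sketch below gathers the launch settings in one place; `launch_kwargs` is a hypothetical helper, and treating the presence of `PORT` as the signal for a hosted environment is an assumption. `server_name`, `server_port`, and `debug` are existing parameters of Gradio's `launch()`.

```python
import os
from typing import Any, Mapping


def launch_kwargs(env: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Build keyword arguments for demo.launch() from the environment.

    Hypothetical helper: assumes a set PORT variable means the app runs
    on a hosting platform.
    """
    on_platform = "PORT" in env
    return {
        # Listen on all interfaces when hosted, on localhost otherwise.
        "server_name": "0.0.0.0" if on_platform else "127.0.0.1",
        "server_port": int(env.get("PORT", "7860")),
        "debug": False,
    }
```

In app.py this would be used as `demo.launch(**launch_kwargs())`.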