---
title: Voxtral
emoji: 
colorFrom: gray
colorTo: green
sdk: gradio
sdk_version: 5.38.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: Chat and transcribe audio files with AI, powered by Voxtral.
---
# Voxtral Pro Interface

<div align="center">

![Python](https://img.shields.io/badge/Python-3.9+-blue?logo=python&logoColor=white)
![Gradio](https://img.shields.io/badge/Gradio-5.38-orange?logo=gradio)
![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)
<a href="https://huggingface.co/spaces/hasanbasbunar/Voxtral">![Hugging Face Spaces](https://img.shields.io/badge/🤗%20Hugging%20Face-Spaces-yellow)</a>

</div>

<p align="center">
  An advanced, feature-rich Gradio UI to explore the full power of Mistral AI's multimodal model, <code>voxtral</code>.
</p>

<p align="center">
  <img src="image.png" alt="Voxtral Pro Demo" width="80%">
</p>

<p align="center">
  <img src="image-1.png" alt="Voxtral Pro Demo" width="80%">
</p>

## 🚀 About The Project

Voxtral Pro showcases the capabilities of Mistral AI's multimodal model, `voxtral`. Beyond a simple chat interface, it provides a toolkit for working with audio and text: high-quality transcription, multi-turn multimodal conversation, and agent-like tool use.

This project serves as a practical example of how to build robust, user-friendly, and production-ready applications on top of state-of-the-art foundation models.

## ✨ Key Features

* **🎙️ High-Quality Transcription:** Transcribe large audio files with exceptional accuracy using the Mistral API.
* **📄 SRT Subtitle Generation:** Automatically generate and export `.srt` subtitle files with precise segment timestamps, perfect for content creators.
* **💬 Multimodal Chat:** Engage in rich, multi-turn conversations combining both text and audio inputs simultaneously.
* **🤖 Tool Use / Function Calling:** Demonstrates the model's ability to call external functions to retrieve information (e.g., getting city data), showcasing its agent-like capabilities.
* **🔐 Secure API Key Handling:** Your Mistral API key is stored securely in your browser's session storage and is never exposed or saved elsewhere.
* **🎨 Modern UI:** A clean, responsive, and aesthetically pleasing interface built with Gradio.
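
The SRT export listed above can be illustrated with a minimal sketch. It assumes each transcription segment carries a start time, an end time (both in seconds), and its text; the `Segment` class and both function names are hypothetical and are not taken from `app.py`.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed span; the field names are illustrative assumptions."""
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio
    text: str


def format_timestamp(seconds: float) -> str:
    """Render seconds as an SRT timestamp of the form HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: list[Segment]) -> str:
    """Build SRT text: numbered cues separated by a blank line."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(cues)
```

Writing the returned string to a file with an `.srt` extension should give a subtitle file that common video players can load.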

## 🛠️ Tech Stack

This project is built with a modern, asynchronous Python stack:

* **Backend:** [Python](https://www.python.org/)
* **Web Framework:** [Gradio](https://www.gradio.app/)
* **API Client:** [httpx](https://www.python-httpx.org/) with `asyncio` for non-blocking API calls.
* **Deployment:** [Hugging Face Spaces](https://huggingface.co/spaces)

## 🏁 Getting Started

Follow these instructions to get a local copy up and running.

### Prerequisites

* Python 3.9+
* Git

### Installation & Configuration

1.  **Clone the repository:**
    
    ```sh
    git clone https://huggingface.co/spaces/hasanbasbunar/Voxtral && cd Voxtral
    ```
    

2.  **Create and activate a virtual environment:**
    ```sh
    python3 -m venv .venv
    source .venv/bin/activate
    ```

3.  **Install dependencies:**
    ```sh
    pip install -r requirements.txt
    ```

4.  **Configure your API Key:**
    Create a file named `.env` in the root of the project and add your Mistral API key:
    ```
    MISTRAL_API_KEY="your_api_key_here"
    ```
    *The application is also designed to let you enter the key directly in the UI if you prefer not to use an `.env` file.*
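
For reference, here is one way the key lookup described in step 4 could work. This is a sketch under stated assumptions, not the code in `app.py`: the helper names `load_env_file` and `resolve_api_key` are hypothetical, and the precedence (UI value first, then the process environment, then the `.env` file) is an assumption.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines (hypothetical helper, not the app's own code)."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def resolve_api_key(ui_key: str = "", env_path: str = ".env") -> str:
    """Assumed precedence: key typed in the UI, then the environment, then .env."""
    return (
        ui_key.strip()
        or os.environ.get("MISTRAL_API_KEY", "")
        or load_env_file(env_path).get("MISTRAL_API_KEY", "")
    )
```

An empty return value means no key was found, in which case the UI would need to prompt for one.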

### Running the Application

1.  **Launch the app:**
    ```sh
    python app.py
    ```
2.  Open your browser and navigate to `http://127.0.0.1:7860`.

## 🚢 Deployment

This app is designed to be easily deployed. It is currently live on [Hugging Face Spaces](https://huggingface.co/spaces/hasanbasbunar/Voxtral).

To deploy your own version, you can use any platform that supports Python applications. For a production environment, pass `debug=False` to `demo.launch()` in `app.py`.

Example for platforms that use a `PORT` environment variable:
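```python
# in app.py
import os

# Use the platform-provided PORT if set; otherwise fall back to Gradio's default 7860.
demo.launch(server_port=int(os.environ.get("PORT", 7860)), debug=False)
```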