hasanbasbunar committed on
Commit 96fe96c · 1 Parent(s): 14d9c3a

README update

Files changed (2)
  1. .gitignore +2 -1
  2. README.md +97 -54
.gitignore CHANGED
@@ -89,4 +89,5 @@ test_output/
  uploads/
 
  # Ignore SVGs if generated
- generated_svg/
+ generated_svg/
+ testt.py
README.md CHANGED
@@ -10,59 +10,102 @@ pinned: false
  license: apache-2.0
  short_description: Chat and transcribe audio files with AI, powered by Voxtral.
  ---
+ # Voxtral Pro Interface
 
- # Voxtral
-
- **Multimodal chatbot and audio transcription web app powered by Gradio and Mistral API.**
-
- ## Features
- - Chatbot with text and audio input
- - Audio file transcription (with SRT export)
- - Modern Gradio web interface
- - API key management (secure, local to browser)
-
- ## Demo
- ![Screenshot](c29ca011-87ff-45b0-8236-08d629812732.svg)
-
- ## Installation
-
- 1. **Clone the repository**
- ```bash
- git clone <repo-url>
- cd voxtral-gradio
- ```
- 2. **Create and activate a virtual environment**
- ```bash
- python3 -m venv .venv
- source .venv/bin/activate
- ```
- 3. **Install dependencies**
- ```bash
- pip install -r requirements.txt
- ```
-
- ## Usage
-
- 1. **Run the app**
- ```bash
- python app.py
- ```
- 2. Open your browser and go to [http://localhost:7860](http://localhost:7860)
- 3. Enter your Mistral API key in the interface to start chatting or transcribing audio files.
-
- ## Configuration
- - **API Key:** Your Mistral API key is required for chat and transcription features. It is stored only in your browser session and never sent to any third-party server.
- - **Environment variables:** Not required by default. For cloud deployment, you may need to set the `PORT` environment variable.
-
- ## Deployment
- - For production, set `debug=False` in `app.py`.
- - Compatible with most Python hosting platforms (Heroku, Railway, etc.).
- - To specify a custom port:
- ```python
- demo.launch(server_port=int(os.environ.get("PORT", 7860)), debug=False)
- ```
-
- ## License
- MIT
+ <div align="center">
 
- ---
+ ![Python](https://img.shields.io/badge/Python-3.9+-blue?logo=python&logoColor=white)
+ ![Gradio](https://img.shields.io/badge/Gradio-5.37-orange?logo=gradio)
+ ![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)
+ <a href="https://huggingface.co/spaces/hasanbasbunar/Voxtral">![Hugging Face Spaces](https://img.shields.io/badge/🤗%20Hugging%20Face-Spaces-yellow)</a>
+
+ </div>
+
+ <p align="center">
+ An advanced, feature-rich Gradio UI to explore the full power of Mistral AI's multimodal model, `voxtral`.
+ </p>
+
+ <p align="center">
+ <img src="image.png" alt="Voxtral Pro Demo" width="80%">
+ </p>
+
+ <p align="center">
+ <img src="image-1.png" alt="Voxtral Pro Demo" width="80%">
+ </p>
+
+ ## 🚀 About The Project
+
+ Voxtral Pro was created to explore and showcase the full range of capabilities of Mistral AI's powerful multimodal model, `voxtral`. This application goes beyond a simple chat interface to provide a comprehensive toolkit for interacting with audio and text, demonstrating features like high-quality transcription, multi-turn multimodal conversation, and agent-like tool use.
+
+ This project serves as a practical example of how to build robust, user-friendly, and production-ready applications on top of state-of-the-art foundation models.
+
+ ## ✨ Key Features
+
+ * **🎙️ High-Quality Transcription:** Transcribe large audio files with exceptional accuracy using the Mistral API.
+ * **📄 SRT Subtitle Generation:** Automatically generate and export `.srt` subtitle files with precise segment timestamps, perfect for content creators.
+ * **💬 Multimodal Chat:** Engage in rich, multi-turn conversations combining both text and audio inputs simultaneously.
+ * **🤖 Tool Use / Function Calling:** Demonstrates the model's ability to call external functions to retrieve information (e.g., getting city data), showcasing its agent-like capabilities.
+ * **🔐 Secure API Key Handling:** Your Mistral API key is stored securely in your browser's session storage and is never exposed or saved elsewhere.
+ * **🎨 Modern UI:** A clean, responsive, and aesthetically pleasing interface built with Gradio.
+
+ ## 🛠️ Tech Stack
+
+ This project is built with a modern, asynchronous Python stack:
+
+ * **Backend:** [Python](https://www.python.org/)
+ * **Web Framework:** [Gradio](https://www.gradio.app/)
+ * **API Client:** [httpx](https://www.python-httpx.org/) with `asyncio` for non-blocking API calls.
+ * **Deployment:** [Hugging Face Spaces](https://huggingface.co/spaces)
+
+ ## 🏁 Getting Started
+
+ Follow these instructions to get a local copy up and running.
+
+ ### Prerequisites
+
+ * Python 3.9+
+ * Git
+
+ ### Installation & Configuration
+
+ 1. **Clone the repository:**
+
+ `git clone https://huggingface.co/spaces/hasanbasbunar/Voxtral && cd Voxtral`
+
+
+ 2. **Create and activate a virtual environment:**
+ ```sh
+ python3 -m venv .venv
+ source .venv/bin/activate
+ ```
+
+ 3. **Install dependencies:**
+ ```sh
+ pip install -r requirements.txt
+ ```
+
+ 4. **Configure your API Key:**
+ Create a file named `.env` in the root of the project and add your Mistral API key:
+ ```
+ MISTRAL_API_KEY="your_api_key_here"
+ ```
+ *The application is also designed to let you enter the key directly in the UI if you prefer not to use an `.env` file.*
+
+ ### Running the Application
+
+ 1. **Launch the app:**
+ ```sh
+ python app.py
+ ```
+ 2. Open your browser and navigate to `http://127.0.0.1:7860`.
+
+ ## 🚢 Deployment
+
+ This app is designed to be easily deployed. It is currently live on [Hugging Face Spaces](https://huggingface.co/spaces/hasanbasbunar/Voxtral).
+
+ To deploy your own version, you can use any platform that supports Python applications. For a production environment, ensure `debug=False` in `app.py`.
+
+ Example for platforms that use a `PORT` environment variable:
+ ```python
+ # in app.py
+ import os
+ demo.launch(server_port=int(os.environ.get("PORT", 7860)), debug=False)
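The new README lists SRT export with segment timestamps as a key feature, but the code behind it is not part of this commit (only `.gitignore` and `README.md` changed). The following is a hedged sketch of how SRT text can be assembled, assuming transcription segments arrive as dicts with `start` and `end` in seconds and a `text` field. That segment shape and the helper names `format_srt_timestamp` and `segments_to_srt` are hypothetical, not taken from `app.py` or the Mistral API.

```python
from typing import Any, Iterable, Mapping


def format_srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))  # work in whole milliseconds
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    # SRT uses a comma, not a dot, before the milliseconds.
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: Iterable[Mapping[str, Any]]) -> str:
    """Build SRT text from segments shaped like
    {"start": float, "end": float, "text": str}.

    This segment shape is a hypothetical assumption for illustration,
    not the format returned by any specific API.
    """
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = format_srt_timestamp(seg["start"])
        end = format_srt_timestamp(seg["end"])
        # Each cue: sequence number, time range, then the subtitle text.
        cues.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # Cues are separated by one blank line.
    return "\n".join(cues)
```

Cues are numbered from 1, and the timestamps follow the `HH:MM:SS,mmm` pattern SRT players expect. Working in integer milliseconds avoids floating-point drift when splitting the time into hours, minutes and seconds.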