---
license: mit
base_model:
- hexgrad/Kokoro-82M
---
# **VocRT**
This repository contains the complete codebase for building your own real-time voice-to-voice (V2V) solution. It integrates the Kokoro-82M text-to-speech (TTS) model, gRPC communication, an Express server, and a React client. Follow this guide to set up and explore the system.
---
## **Repository Structure**
```
├── backend/    # Express server for handling API requests
├── frontend/   # React client for user interaction
├── .env        # Environment variables (OpenAI API key, etc.)
├── voices/     # All available voices
├── demo/       # Sample audio and demo files
└── other...
```
---
## **Docker**
🐳 VocRT on Docker Hub: https://hub.docker.com/r/anuragsingh922/vocrt
## **Repository**
## **Setup Guide**
### **Step 1: Clone the Repository**
Clone this repository to your local machine:
```bash
git clone https://huggingface.co/anuragsingh922/VocRT
cd VocRT
```
---
### **Step 2: Python Virtual Environment Setup**
Create a virtual environment to manage dependencies:
#### macOS/Linux:
```bash
python3 -m venv venv
source venv/bin/activate
```
#### Windows:
```bash
python -m venv venv
venv\Scripts\activate
```
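To confirm the environment is active before installing anything, you can check the `VIRTUAL_ENV` variable, which the `activate` scripts created by `python -m venv` set to the environment's path. The helper below is a minimal sketch for macOS/Linux shells; the function name `check_venv` is hypothetical and not part of VocRT. Run `check_venv` after `source venv/bin/activate`: it prints the active environment's path, or prints a hint and returns a non-zero status if no environment is active.

```bash
# Hypothetical helper (not part of VocRT): verify that a virtual environment is active.
check_venv() {
  # `source venv/bin/activate` exports VIRTUAL_ENV with the venv's path.
  if [ -n "${VIRTUAL_ENV:-}" ]; then
    echo "Virtual environment active: $VIRTUAL_ENV"
  else
    echo "No virtual environment active; run 'source venv/bin/activate' first." >&2
    return 1
  fi
}
```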
---
### **Step 3: Install Python Dependencies**
With the virtual environment activated, install the required dependencies:
```bash
pip install --upgrade pip setuptools wheel
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
pip install phonemizer transformers scipy munch python-dotenv openai grpcio grpcio-tools
```
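After installation, a quick import check can catch a missing or broken package before you start the server. The sketch below tries to import each named module with the interpreter in `$PYTHON` (defaulting to `python3`) and prints every name that fails; the `PYTHON` override and the function name `missing_py_modules` are illustrative assumptions, not part of VocRT. Some import names differ from the pip package names: `python-dotenv` is imported as `dotenv`, `grpcio` as `grpc`, and `grpcio-tools` as `grpc_tools`. For example, `missing_py_modules torch torchvision torchaudio phonemizer transformers scipy munch dotenv openai grpc grpc_tools` prints nothing when every package imports cleanly.

```bash
# Hypothetical helper (not part of VocRT): list Python modules that fail to import.
missing_py_modules() {
  for mod in "$@"; do
    # Run a one-line import in the chosen interpreter; discard its output and errors.
    if ! "${PYTHON:-python3}" -c "import $mod" >/dev/null 2>&1; then
      echo "$mod"
    fi
  done
}
```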
### **Installing eSpeak**
`eSpeak` is a necessary dependency for the VocRT system. Follow the instructions below to install it on your platform:
#### **Ubuntu/Linux**
Use the `apt-get` package manager to install `eSpeak`:
```bash
sudo apt-get update
sudo apt-get install espeak
```
#### **macOS**
Install `eSpeak` using [Homebrew](https://brew.sh/):
1. Ensure Homebrew is installed on your system:
```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```
2. Install `espeak`:
```bash
brew install espeak
```
#### **Windows**
For Windows, follow these steps to install `eSpeak`:
1. Download the eSpeak installer from the official website: [eSpeak Downloads](http://espeak.sourceforge.net/download.html).
2. Run the installer and follow the on-screen instructions to complete the installation.
3. Add the `eSpeak` installation path to your system's `PATH` environment variable:
- Open **System Properties** → **Advanced** → **Environment Variables**.
- In the "System Variables" section, find the `Path` variable and edit it.
- Add the folder that contains `espeak.exe` (e.g., `C:\Program Files (x86)\eSpeak`, or its `command_line` subfolder if `espeak.exe` is located there).
4. Verify the installation:
Open Command Prompt and run:
```cmd
espeak --version
```
---
### **Verification**
After installing `eSpeak`, verify it is correctly set up by running:
```bash
espeak "Hello, world!"
```
You should hear "Hello, world!" spoken aloud.
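If the command is not found, the usual cause is that the `espeak` binary is not on your `PATH`. A small check like the sketch below reports which of several candidate commands the shell can run; the function name `first_available_cmd` is hypothetical. Also accepting `espeak-ng` (the actively maintained eSpeak fork, which some distributions install under that name) is an assumption here, so confirm which binary your VocRT setup actually uses. For example: `first_available_cmd espeak espeak-ng || echo "eSpeak not found on PATH"`.

```bash
# Hypothetical helper (not part of VocRT): print the first name the shell can run.
first_available_cmd() {
  for cmd in "$@"; do
    # `command -v` succeeds if the name resolves (binary on PATH, builtin, or function).
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      return 0
    fi
  done
  # None of the candidates were found.
  return 1
}
```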
---
### **Step 4: Backend Setup (Express Server)**
1. Navigate to the `backend` directory:
```bash
cd backend
```
2. Install Node.js dependencies:
```bash
npm install
```
3. Update the `config.env` file with your Deepgram API key:
- Open `config.env` in a text editor.
- Replace `<deepgram_api_key>` with your actual Deepgram API key.
4. Start the Express server:
```bash
node app.js
```
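If you prefer to script the `config.env` edit instead of doing it by hand, a `sed` substitution can swap the `<deepgram_api_key>` placeholder for your key. This is a minimal sketch: the function name `set_placeholder` and the `DEEPGRAM_API_KEY` environment variable in the example are hypothetical, and it assumes the placeholder contains no regex metacharacters (true for `<deepgram_api_key>`). The original file is kept as `config.env.bak`. For example, from the `backend` directory: `set_placeholder config.env '<deepgram_api_key>' "$DEEPGRAM_API_KEY"`.

```bash
# Hypothetical helper (not part of VocRT): replace a placeholder in a config file.
# Usage: set_placeholder FILE PLACEHOLDER VALUE
set_placeholder() {
  file=$1
  placeholder=$2
  # Escape \, & and the | delimiter so sed inserts VALUE literally.
  escaped=$(printf '%s' "$3" | sed 's/[\\&|]/\\&/g')
  # -i.bak edits in place and saves the original as FILE.bak (GNU and BSD sed).
  sed -i.bak "s|$placeholder|$escaped|g" "$file"
}
```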
---
### **Step 5: Frontend Setup (React Client)**
1. Open a new terminal and navigate to the `frontend` directory:
```bash
cd frontend
```
2. Install client dependencies:
```bash
npm install
```
3. Start the client:
```bash
npm start
```
---
### **Step 6: Start the VocRT Server**
1. Add your OpenAI API key to the `.env` file:
- Open `.env` in a text editor.
- Replace `<openai_api_key>` with your actual OpenAI API key.
2. Start the VocRT server:
```bash
python3 app.py
```
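Since the VocRT server expects your key in `.env`, it can help to fail fast if the `<openai_api_key>` placeholder is still there. The check below is a minimal sketch using `grep -F` (fixed-string matching); the function name `require_no_placeholder` is hypothetical. For example, from the repository root: `require_no_placeholder .env '<openai_api_key>' && python3 app.py`.

```bash
# Hypothetical helper (not part of VocRT): fail if FILE is missing or still
# contains the literal PLACEHOLDER text.
# Usage: require_no_placeholder FILE PLACEHOLDER
require_no_placeholder() {
  if [ ! -f "$1" ]; then
    echo "$1 not found" >&2
    return 1
  fi
  # -q: no output; -F: treat the placeholder as a fixed string, not a regex.
  if grep -qF -- "$2" "$1"; then
    echo "$1 still contains $2; replace it with your real key first" >&2
    return 1
  fi
  return 0
}
```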
---
### **Step 7: Test the Full System**
Once all three servers (Express backend, React client, and VocRT server) are running:
1. Access the React client at [http://localhost:3000](http://localhost:3000).
2. Interact with the VocRT system via the web interface.
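Because the three processes start independently, the client may not be reachable the moment you launch it. The sketch below waits until something accepts TCP connections on a port, using Python's `socket.create_connection` (Python 3 is already required by VocRT) rather than a tool such as `nc`, which may not be installed. The function name `wait_for_port` is hypothetical, and only port 3000 (the React client) comes from this guide; the Express and gRPC ports are not documented here. For example, `wait_for_port localhost 3000 60 && echo "Client is up"` retries once per second for up to 60 attempts.

```bash
# Hypothetical helper (not part of VocRT): wait until HOST:PORT accepts TCP connections.
# Usage: wait_for_port HOST PORT [ATTEMPTS]   (one attempt per second, default 30)
wait_for_port() {
  host=$1
  port=$2
  attempts=${3:-30}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Python exits with status 0 only if a connection succeeds within 1 second.
    if python3 -c 'import socket, sys; socket.create_connection((sys.argv[1], int(sys.argv[2])), timeout=1).close()' "$host" "$port" 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```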
---
## **Model Used**
VocRT uses [Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) for text-to-speech synthesis, processing user inputs into high-quality voice responses.
---
## **Key Features**
1. **Real-time voice response generation**: Converts speech input into spoken responses with minimal latency.
2. **React Client**: A user-friendly frontend for interaction.
3. **Express Backend**: Handles API requests and integrates the VocRT system with external services.
4. **gRPC Communication**: Connects the VocRT server to the other components over gRPC.
5. **Configurable APIs**: Integrates with the Deepgram API for speech recognition and the OpenAI API for text generation.
---
## **Dependencies**
### Python:
- torch, torchvision, torchaudio
- phonemizer
- transformers
- scipy
- munch
- python-dotenv
- openai
- grpcio, grpcio-tools
- eSpeak (system package, installed separately rather than with pip)
### Node.js:
- Express server dependencies (`npm install` in `backend`).
- React client dependencies (`npm install` in `frontend`).
---
## **Contributing**
Contributions are welcome! Feel free to fork this repository and create a pull request with your improvements.
---
## **Acknowledgments**
- [Hugging Face](https://huggingface.co/) for hosting the Kokoro-82M model.
- The amazing communities behind PyTorch and the OpenAI and Deepgram APIs.