---
license: mit
base_model:
- hexgrad/Kokoro-82M
---
# **VocRT**  
This repository contains the complete codebase for building your own real-time Voice-to-Voice (V2V) solution. It integrates a powerful TTS model, gRPC communication, an Express server, and a React-based client. Follow this guide to set up and explore the system.

---

## **Repository Structure**
```
├── backend/         # Express server for handling API requests
├── frontend/        # React client for user interaction
├── .env             # Environment variables (OpenAI API key, etc.)
├── voices           # All available voices
├── demo             # Contains sample audio and demo files
└── other...
```

---

## **Docker**

🐳 VocRT on Docker Hub: https://hub.docker.com/r/anuragsingh922/vocrt
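
If you prefer the containerized route, the image can be pulled with `docker pull anuragsingh922/vocrt`; check the Docker Hub page above for available tags and run instructions (exposed ports and environment variables may vary by image version).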


## **Setup Guide**

### **Step 1: Clone the Repository**
Clone this repository to your local machine:
```bash
git clone https://huggingface.co/anuragsingh922/VocRT
cd VocRT
```

---

### **Step 2: Python Virtual Environment Setup**
Create a virtual environment to manage dependencies:

#### macOS/Linux:
```bash
python3 -m venv venv
source venv/bin/activate
```

#### Windows:
```bash
python -m venv venv
venv\Scripts\activate
```

---

### **Step 3: Install Python Dependencies**
With the virtual environment activated, install the required dependencies:
```bash
pip install --upgrade pip setuptools wheel
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
pip install phonemizer transformers scipy munch python-dotenv openai grpcio grpcio-tools
```
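
As an optional sanity check, the snippet below (a minimal sketch; the filename `sanity_check.py` is just a suggestion) verifies that the core Python packages installed above import cleanly and prints their versions:

```python
# sanity_check.py (optional): confirm the dependencies installed above resolve correctly
import torch
import grpc
import transformers
import phonemizer  # imports fine here; the eSpeak backend it relies on is installed in the next step

print("torch:", torch.__version__)
print("grpcio:", grpc.__version__)
print("transformers:", transformers.__version__)
```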

### **Installing eSpeak**
`eSpeak` is a necessary dependency for the VocRT system. Follow the instructions below to install it on your platform:

#### **Ubuntu/Linux**
Use the `apt-get` package manager to install `eSpeak`:
```bash
sudo apt-get update
sudo apt-get install espeak
```

#### **macOS**
Install `eSpeak` using [Homebrew](https://brew.sh/):
1. Ensure Homebrew is installed on your system:
   ```bash
   /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
   ```
2. Install `espeak`:
   ```bash
   brew install espeak
   ```

#### **Windows**
For Windows, follow these steps to install `eSpeak`:
1. Download the eSpeak installer from the official website: [eSpeak Downloads](http://espeak.sourceforge.net/download.html).
2. Run the installer and follow the on-screen instructions to complete the installation.
3. Add the `eSpeak` installation path to your system's `PATH` environment variable:
   - Open **System Properties** β†’ **Advanced** β†’ **Environment Variables**.
   - In the "System Variables" section, find the `Path` variable and edit it.
   - Add the path to the `espeak.exe` file (e.g., `C:\Program Files (x86)\eSpeak`).
4. Verify the installation:
   Open Command Prompt and run:
   ```cmd
   espeak --version
   ```

---

### **Verification**
After installing `eSpeak`, verify it is correctly set up by running:
```bash
espeak "Hello, world!"
```

You should hear "Hello, world!" spoken through your system's audio output.
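
To confirm that `phonemizer` (installed in Step 3) can actually drive the eSpeak backend, which is the combination VocRT relies on, a minimal Python check looks like this:

```python
# Optional: verify phonemizer can locate the eSpeak backend installed above.
from phonemizer import phonemize

print(phonemize("Hello, world!", language="en-us", backend="espeak"))
# Prints an IPA-like transcription (roughly "həloʊ wɜːld") if eSpeak is found;
# otherwise phonemizer raises an error about the missing espeak backend.
```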

---

### **Step 4: Backend Setup (Express Server)**
1. Navigate to the `backend` directory:
   ```bash
   cd backend
   ```
2. Install Node.js dependencies:
   ```bash
   npm install
   ```
3. Update the `config.env` file with your Deepgram API key:
   - Open `config.env` in a text editor.
   - Replace `<deepgram_api_key>` with your actual Deepgram API key (a sample `config.env` layout is sketched after this list).

4. Start the Express server:
   ```bash
   node app.js
   ```
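
For reference, a `config.env` entry for the key from step 3 looks like the sketch below. The variable name shown is illustrative; keep whatever name your copy of `config.env` already uses.

```env
# backend/config.env (illustrative sketch; keep the variable names already present in the file)
DEEPGRAM_API_KEY=<deepgram_api_key>
```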

---

### **Step 5: Frontend Setup (React Client)**
1. Open a new terminal and navigate to the `frontend` directory:
   ```bash
   cd frontend
   ```
2. Install client dependencies:
   ```bash
   npm install
   ```
3. Start the client:
   ```bash
   npm start
   ```

---

### **Step 6: Start the VocRT Server**
1. Add your OpenAI API key to the `.env` file:
   - Open `.env` in a text editor.
   - Replace `<openai_api_key>` with your actual OpenAI API key (a sample `.env` entry is sketched after this list).

2. Start the VocRT server:
   ```bash
   python3 app.py
   ```
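
The corresponding `.env` entry for the key from step 1 is sketched below; the variable name is illustrative, so keep whatever name the repository's `.env` already defines.

```env
# .env (illustrative sketch; keep the variable name already present in the file)
OPENAI_API_KEY=<openai_api_key>
```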

---

### **Step 7: Test the Full System**
- Once all servers are running:
  1. Access the React client at [http://localhost:3000](http://localhost:3000).
  2. Interact with the VocRT system via the web interface.

---

## **Model Used**
VocRT uses [Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) for text-to-speech synthesis, converting generated text responses into high-quality voice output.

---

## **Key Features**
1. **Realtime voice response generation**: Converts speech input into spoken responses with minimal latency.
2. **React Client**: A user-friendly frontend for interaction.
3. **Express Backend**: Handles API requests and integrates the VocRT system with external services.
4. **gRPC Communication**: Seamless communication between the VocRT server and other components.
5. **Configurable APIs**: Integrates with Deepgram for speech recognition and OpenAI for text generation.

---

## **Dependencies**

### Python:
  - torch, torchvision, torchaudio
  - phonemizer
  - transformers
  - scipy
  - munch
  - python-dotenv
  - openai
  - grpcio, grpcio-tools
  - eSpeak (system dependency, installed via your OS package manager; see the eSpeak installation section above)


### Node.js:
- Express server dependencies (`npm install` in `backend`).
- React client dependencies (`npm install` in `frontend`).

---

## **Contributing**
Contributions are welcome! Feel free to fork this repository and create a pull request with your improvements.

---

## **Acknowledgments**
- [Hugging Face](https://huggingface.co/) for hosting the Kokoro-82M model.
- The amazing communities behind PyTorch, OpenAI, and Deepgram APIs.