Update README.md

README.md CHANGED

````diff
@@ -1,5 +1,5 @@
-# **
-This repository contains the complete codebase for building your personal Realtime
+# **VocRT**
+This repository contains the complete codebase for building your personal Realtime Voice-to-Voice (V2V) solution. It integrates a powerful TTS model, gRPC communication, an Express server, and a React-based client. Follow this guide to set up and explore the system effectively.
 
 ---
 
@@ -8,8 +8,8 @@ This repository contains the complete codebase for building your personal Realti
 ├── backend/   # Express server for handling API requests
 ├── frontend/  # React client for user interaction
 ├── .env       # Environment variables (OpenAI API key, etc.)
-├── voices     #
-├── demo       # demo files
+├── voices     # All available voices
+├── demo       # Contains sample audio and demo files
 └── other...
 ```
 
@@ -20,8 +20,8 @@ This repository contains the complete codebase for building your personal Realti
 ### **Step 1: Clone the Repository**
 Clone this repository to your local machine:
 ```bash
-git clone https://huggingface.co/anuragsingh922/
-cd
+git clone https://huggingface.co/anuragsingh922/VocRT
+cd VocRT
 ```
 
 ---
@@ -52,7 +52,7 @@ pip install -r requirements.txt
 ```
 
 ### **Installing eSpeak**
-`eSpeak` is a necessary dependency for the
+`eSpeak` is a necessary dependency for the VocRT system. Follow the instructions below to install it on your platform:
 
 #### **Ubuntu/Linux**
 Use the `apt-get` package manager to install `eSpeak`:
@@ -134,12 +134,12 @@ This should output "Hello, world!" as audio on your system.
 
 ---
 
-### **Step 6: Start the
+### **Step 6: Start the VocRT Server**
 1. Add your OpenAI API key to the `.env` file:
    - Open `.env` in a text editor.
    - Replace `<openai_api_key>` with your actual OpenAI API key.
 
-2. Start the
+2. Start the VocRT server:
 ```bash
 python3 app.py
 ```
````
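A note on the `.env` step of Step 6: since `python-dotenv` appears in this commit's dependency list, the server presumably loads the key from `.env` at startup. The sketch below is a simplified, stdlib-only stand-in for that loading step; the helper name, temp-file path, and placeholder key are illustrative, not taken from the repository.

```python
import os
import tempfile

def load_env_file(path):
    """Minimal stand-in for python-dotenv's load_dotenv():
    reads KEY=VALUE lines into os.environ, skipping blank lines and
    # comments. Unlike load_dotenv(), it overwrites existing values
    and does not handle quoting or `export` prefixes."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ[key.strip()] = value.strip()

# Demo with a throwaway file; "sk-your-key-here" is a placeholder,
# just like <openai_api_key> in the README's .env template.
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as f:
    f.write("# VocRT settings\nOPENAI_API_KEY=sk-your-key-here\n")
    env_path = f.name

load_env_file(env_path)
os.remove(env_path)
print(os.environ["OPENAI_API_KEY"])  # prints: sk-your-key-here
```

Assigning directly into `os.environ` (rather than `setdefault`) makes the demo deterministic even if the variable is already set in the shell; real `load_dotenv()` defaults to not overriding existing variables.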
````diff
@@ -149,36 +149,37 @@ This should output "Hello, world!" as audio on your system.
 ### **Step 7: Test the Full System**
 - Once all servers are running:
 1. Access the React client at [http://localhost:3000](http://localhost:3000).
-2. Interact with the
+2. Interact with the VocRT system via the web interface.
 
 ---
 
 ## **Model Used**
-
+VocRT uses [Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) for text-to-speech synthesis, processing user inputs into high-quality voice responses.
 
 ---
 
 ## **Key Features**
-1. **Realtime
+1. **Realtime voice response generation**: Convert speech input into speech with minimal latency.
 2. **React Client**: A user-friendly frontend for interaction.
-3. **Express Backend**: Handles API requests and integrates the
-4. **gRPC Communication**: Seamless communication between the
-5. **Configurable APIs**:
+3. **Express Backend**: Handles API requests and integrates the VocRT system with external services.
+4. **gRPC Communication**: Seamless communication between the VocRT server and other components.
+5. **Configurable APIs**: Integrates with OpenAI and Deepgram APIs for speech recognition and text generation.
 
 ---
 
 ## **Dependencies**
 
 ### Python:
--
--
--
--
--
--
--
--
--
+- torch, torchvision, torchaudio
+- phonemizer
+- transformers
+- scipy
+- munch
+- python-dotenv
+- openai
+- grpcio, grpcio-tools
+- espeak
+
 
 ### Node.js:
 - Express server dependencies (`npm install` in `backend`).
````
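One practical footnote on the Python dependency list added in this commit: before running `python3 app.py`, it can be worth checking that each requirement actually resolves to an importable module. The requirement-to-module mapping below is my assumption, since PyPI distribution names differ from import names (`python-dotenv` imports as `dotenv`, `grpcio` as `grpc`); `espeak` is excluded because the README installs it through the system package manager, not pip.

```python
import importlib.util

# Requirement name (as listed in the README) -> import name.
# `espeak` is deliberately absent: it is a system package, not a pip one.
REQUIRED_MODULES = {
    "torch": "torch",
    "phonemizer": "phonemizer",
    "transformers": "transformers",
    "scipy": "scipy",
    "munch": "munch",
    "python-dotenv": "dotenv",
    "openai": "openai",
    "grpcio": "grpc",
}

def missing_dependencies(modules):
    """Return the requirement names whose module cannot be found."""
    return [req for req, mod in modules.items()
            if importlib.util.find_spec(mod) is None]

missing = missing_dependencies(REQUIRED_MODULES)
if missing:
    print("Install these before `python3 app.py`:", ", ".join(missing))
else:
    print("All listed Python dependencies resolve.")
```

`importlib.util.find_spec` only probes the import machinery without importing the heavy packages themselves, so the check stays fast even with `torch` and `transformers` in the list.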