anuragsingh922 committed · verified
Commit 606f718 · 1 Parent(s): d7dfeff

Update README.md

Files changed (1):
  1. README.md +25 -24
README.md CHANGED
@@ -1,5 +1,5 @@
- # **Realtime TTS System**
- This repository contains the complete codebase for building your personal Realtime Text-to-Speech (TTS) solution. It integrates a powerful TTS model, gRPC communication, an Express server, and a React-based client. Follow this guide to set up and explore the system effectively.

  ---

@@ -8,8 +8,8 @@ This repository contains the complete codebase for building your personal Realti
  ├── backend/ # Express server for handling API requests
  ├── frontend/ # React client for user interaction
  ├── .env # Environment variables (OpenAI API key, etc.)
- ├── voices # all available voices
- ├── demo # demo files of model
  ├── other...
  ```

@@ -20,8 +20,8 @@ This repository contains the complete codebase for building your personal Realti
  ### **Step 1: Clone the Repository**
  Clone this repository to your local machine:
  ```bash
- git clone https://huggingface.co/anuragsingh922/realtime-tts
- cd realtime-tts
  ```

  ---
@@ -52,7 +52,7 @@ pip install -r requirements.txt
  ```

  ### **Installing eSpeak**
- `eSpeak` is a necessary dependency for the TTS system. Follow the instructions below to install it on your platform:

  #### **Ubuntu/Linux**
  Use the `apt-get` package manager to install `eSpeak`:
@@ -134,12 +134,12 @@ This should output "Hello, world!" as audio on your system.

  ---

- ### **Step 6: Start the TTS Server**
  1. Add your OpenAI API key to the `.env` file:
     - Open `.env` in a text editor.
     - Replace `<openai_api_key>` with your actual OpenAI API key.

- 2. Start the TTS server:
  ```bash
  python3 app.py
  ```
@@ -149,36 +149,37 @@ This should output "Hello, world!" as audio on your system.
  ### **Step 7: Test the Full System**
  - Once all servers are running:
    1. Access the React client at [http://localhost:3000](http://localhost:3000).
- 2. Interact with the TTS system via the web interface.

  ---

  ## **Model Used**
- This project utilizes the [Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) TTS model hosted on Hugging Face. The model generates high-quality, realtime text-to-speech outputs.

  ---

  ## **Key Features**
- 1. **Realtime TTS Generation**: Convert text input into speech with minimal latency.
  2. **React Client**: A user-friendly frontend for interaction.
- 3. **Express Backend**: Handles API requests and integrates the TTS system with external services.
- 4. **gRPC Communication**: Seamless communication between the TTS server and other components.
- 5. **Configurable APIs**: Supports OpenAI and Deepgram API integrations.

  ---

  ## **Dependencies**

  ### Python:
- - `torch`, `torchvision`, `torchaudio`
- - `phonemizer`
- - `transformers`
- - `scipy`
- - `munch`
- - `python-dotenv`
- - `openai`
- - `grpcio`, `grpcio-tools`
- - `espeak`

  ### Node.js:
  - Express server dependencies (`npm install` in `backend`).

+ # **VocRT**
+ This repository contains the complete codebase for building your personal Realtime Voice-to-Voice (V2V) solution. It integrates a powerful TTS model, gRPC communication, an Express server, and a React-based client. Follow this guide to set up and explore the system effectively.

  ---

  ├── backend/ # Express server for handling API requests
  ├── frontend/ # React client for user interaction
  ├── .env # Environment variables (OpenAI API key, etc.)
+ ├── voices # All available voices
+ ├── demo # Contains sample audio and demo files
  ├── other...
  ```

  ### **Step 1: Clone the Repository**
  Clone this repository to your local machine:
  ```bash
+ git clone https://huggingface.co/anuragsingh922/VocRT
+ cd VocRT
  ```

  ---
 
  ```

  ### **Installing eSpeak**
+ `eSpeak` is a necessary dependency for the VocRT system. Follow the instructions below to install it on your platform:

  #### **Ubuntu/Linux**
  Use the `apt-get` package manager to install `eSpeak`:
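Once eSpeak is installed, its presence on `PATH` can be sanity-checked before starting the server. A minimal stdlib sketch (the binary name `espeak` is an assumption; some distributions ship `espeak-ng` instead):

```python
import shutil

def espeak_available() -> bool:
    """Return True if an `espeak` (or `espeak-ng`) binary is on PATH."""
    return shutil.which("espeak") is not None or shutil.which("espeak-ng") is not None

print("eSpeak found:", espeak_available())
```

If this prints `False`, phoneme generation via `phonemizer` (listed under Dependencies) will fail at runtime, so it is worth checking before Step 6.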
 

  ---

+ ### **Step 6: Start the VocRT Server**
  1. Add your OpenAI API key to the `.env` file:
     - Open `.env` in a text editor.
     - Replace `<openai_api_key>` with your actual OpenAI API key.

+ 2. Start the VocRT server:
  ```bash
  python3 app.py
  ```
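The server presumably reads the key from `.env` via `python-dotenv` (listed under Dependencies). As an illustration of the pattern, here is a dependency-free sketch; the variable name `OPENAI_API_KEY` is an assumption, not taken from the repository:

```python
import os
import tempfile

def load_env(path: str) -> dict:
    """Parse simple KEY=VALUE lines from a .env file (blanks and # comments skipped)."""
    env = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()
    return env

# Demonstrate with a throwaway .env file.
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as fh:
    fh.write("# VocRT settings\nOPENAI_API_KEY=sk-example\n")
    path = fh.name

config = load_env(path)
print(config["OPENAI_API_KEY"])  # → sk-example
os.unlink(path)
```

In the real project, `python-dotenv`'s `load_dotenv()` does this parsing and exports the values into `os.environ`.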
 
  ### **Step 7: Test the Full System**
  - Once all servers are running:
    1. Access the React client at [http://localhost:3000](http://localhost:3000).
+   2. Interact with the VocRT system via the web interface.

  ---
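Before opening the browser, a quick reachability check can confirm the client is serving. A minimal stdlib sketch; only port 3000 is documented in this guide, the backend and gRPC ports are not stated here:

```python
import socket

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print("React client reachable:", port_open("localhost", 3000))
```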

  ## **Model Used**
+ VocRT uses [Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) for text-to-speech synthesis, processing user inputs into high-quality voice responses.

  ---

  ## **Key Features**
+ 1. **Realtime voice response generation**: Convert speech input into speech with minimal latency.
  2. **React Client**: A user-friendly frontend for interaction.
+ 3. **Express Backend**: Handles API requests and integrates the VocRT system with external services.
+ 4. **gRPC Communication**: Seamless communication between the VocRT server and other components.
+ 5. **Configurable APIs**: Integrates with OpenAI and Deepgram APIs for speech recognition and text generation.

  ---

  ## **Dependencies**

  ### Python:
+ - torch, torchvision, torchaudio
+ - phonemizer
+ - transformers
+ - scipy
+ - munch
+ - python-dotenv
+ - openai
+ - grpcio, grpcio-tools
+ - espeak
+
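The Python dependencies above can be verified as importable before launching the server. A stdlib sketch; note that some pip names differ from their import names (`python-dotenv` imports as `dotenv`, `grpcio` as `grpc`, `grpcio-tools` as `grpc_tools`), and `espeak` is the system package from the eSpeak step rather than a Python module, so it is excluded:

```python
import importlib.util

# Import names corresponding to the pip packages listed above.
MODULES = ["torch", "torchvision", "torchaudio", "phonemizer", "transformers",
           "scipy", "munch", "dotenv", "openai", "grpc", "grpc_tools"]

def missing_modules(modules) -> list:
    """Return the subset of `modules` that cannot be found by the import system."""
    return [m for m in modules if importlib.util.find_spec(m) is None]

print("Missing:", missing_modules(MODULES))
```

An empty list means `pip install -r requirements.txt` covered everything.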

  ### Node.js:
  - Express server dependencies (`npm install` in `backend`).