---
license: mit
base_model:
- hexgrad/Kokoro-82M
---
# **VocRT**
This repository contains the complete codebase for building your personal Realtime Voice-to-Voice (V2V) solution. It integrates a powerful TTS model, gRPC communication, an Express server, and a React-based client. Follow this guide to set up and explore the system effectively.
---
## **Repository Structure**
```
├── backend/    # Express server for handling API requests
├── frontend/   # React client for user interaction
├── .env        # Environment variables (OpenAI API key, etc.)
├── voices/     # All available voices
├── demo/       # Sample audio and demo files
└── ...         # Other project files
```
---
## **Docker**
🐳 VocRT on Docker Hub: https://hub.docker.com/r/anuragsingh922/vocrt
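If you prefer not to set up the environment manually, a minimal sketch of pulling and running the published image is shown below; the port mapping and environment variable names are assumptions, so check the Docker Hub page for the exact run command:
```bash
# Pull the published image
docker pull anuragsingh922/vocrt

# Run it; the port and environment variable names below are illustrative
# assumptions; see the Docker Hub page for the options the image actually expects
docker run -it --rm \
  -p 3000:3000 \
  -e OPENAI_API_KEY=<openai_api_key> \
  -e DEEPGRAM_API_KEY=<deepgram_api_key> \
  anuragsingh922/vocrt
```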
## **Setup Guide**
### **Step 1: Clone the Repository**
Clone this repository to your local machine:
```bash
git clone https://huggingface.co/anuragsingh922/VocRT
cd VocRT
```
---
### **Step 2: Python Virtual Environment Setup**
Create a virtual environment to manage dependencies:
#### macOS/Linux:
```bash
python3 -m venv venv
source venv/bin/activate
```
#### Windows:
```bash
python -m venv venv
venv\Scripts\activate
```
---
### **Step 3: Install Python Dependencies**
With the virtual environment activated, install the required dependencies:
```bash
pip install --upgrade pip setuptools wheel
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
pip install phonemizer transformers scipy munch python-dotenv openai grpcio grpcio-tools
```
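To confirm the packages installed cleanly before moving on, you can run a quick import check (a sanity check only, not part of the project's scripts):
```bash
python -c "import torch, torchaudio, phonemizer, transformers, scipy, munch, dotenv, openai, grpc; print('Python deps OK, torch', torch.__version__)"
```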
### **Installing eSpeak**
`eSpeak` is a required system dependency for VocRT; the `phonemizer` package uses it as its phonemization backend. Follow the instructions below to install it on your platform:
#### **Ubuntu/Linux**
Use the `apt-get` package manager to install `eSpeak`:
```bash
sudo apt-get update
sudo apt-get install espeak
```
#### **macOS**
Install `eSpeak` using [Homebrew](https://brew.sh/):
1. Ensure Homebrew is installed on your system:
```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```
2. Install `espeak`:
```bash
brew install espeak
```
#### **Windows**
For Windows, follow these steps to install `eSpeak`:
1. Download the eSpeak installer from the official website: [eSpeak Downloads](http://espeak.sourceforge.net/download.html).
2. Run the installer and follow the on-screen instructions to complete the installation.
3. Add the `eSpeak` installation path to your system's `PATH` environment variable:
- Open **System Properties** β†’ **Advanced** β†’ **Environment Variables**.
- In the "System Variables" section, find the `Path` variable and edit it.
- Add the path to the `espeak.exe` file (e.g., `C:\Program Files (x86)\eSpeak`).
4. Verify the installation:
Open Command Prompt and run:
```cmd
espeak --version
```
---
### **Verification**
After installing `eSpeak`, verify it is correctly set up by running:
```bash
espeak "Hello, world!"
```
This should speak "Hello, world!" through your system's audio output.
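Because the VocRT Python server reaches eSpeak through the `phonemizer` package, it is worth checking that binding as well (assumes the Step 3 dependencies are installed):
```bash
python -c "from phonemizer import phonemize; print(phonemize('Hello, world!', language='en-us', backend='espeak'))"
```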
---
### **Step 4: Backend Setup (Express Server)**
1. Navigate to the `backend` directory:
```bash
cd backend
```
2. Install Node.js dependencies:
```bash
npm install
```
3. Update the `config.env` file with your Deepgram API key:
- Open `config.env` in a text editor.
- Replace `<deepgram_api_key>` with your actual Deepgram API key (a quick key check is shown after these steps).
4. Start the Express server:
```bash
node app.js
```
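To rule out credential issues before debugging the server, the Deepgram key from step 3 can be checked directly against Deepgram's REST API; the `/v1/projects` endpoint and `Token` auth header used here are assumptions based on Deepgram's public docs:
```bash
# Lightweight auth check: a 200 response with your project list means the key works
curl -s https://api.deepgram.com/v1/projects \
  -H "Authorization: Token <deepgram_api_key>"
```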
---
### **Step 5: Frontend Setup (React Client)**
1. Open a new terminal and navigate to the `frontend` directory:
```bash
cd frontend
```
2. Install client dependencies:
```bash
npm install
```
3. Start the client:
```bash
npm start
```
---
### **Step 6: Start the VocRT Server**
1. Add your OpenAI API key to the `.env` file:
- Open `.env` in a text editor.
- Replace `<openai_api_key>` with your actual OpenAI API key (a quick key check is shown after these steps).
2. Start the VocRT server:
```bash
python3 app.py
```
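To confirm the OpenAI key from step 1 is valid before starting the server, you can hit the models endpoint directly (a generic check, independent of whichever model VocRT uses):
```bash
# List available models; a 200 response with a model list means the key is valid
curl -s https://api.openai.com/v1/models \
  -H "Authorization: Bearer <openai_api_key>"
```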
---
### **Step 7: Test the Full System**
- Once all servers are running:
1. Access the React client at [http://localhost:3000](http://localhost:3000).
2. Interact with the VocRT system via the web interface.
---
## **Model Used**
VocRT uses [Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) for text-to-speech synthesis, processing user inputs into high-quality voice responses.
---
## **Key Features**
1. **Realtime voice response generation**: Converts spoken input into a spoken response with minimal latency.
2. **React Client**: A user-friendly frontend for interaction.
3. **Express Backend**: Handles API requests and integrates the VocRT system with external services.
4. **gRPC Communication**: Seamless communication between the VocRT server and other components.
5. **Configurable APIs**: Integrates with Deepgram for speech recognition and OpenAI for text generation.
---
## **Dependencies**
### Python:
- torch, torchvision, torchaudio
- phonemizer
- transformers
- scipy
- munch
- python-dotenv
- openai
- grpcio, grpcio-tools
- espeak (system package; install it as described in the eSpeak section above, not via pip)
### Node.js:
- Express server dependencies (`npm install` in `backend`).
- React client dependencies (`npm install` in `frontend`).
---
## **Contributing**
Contributions are welcome! Feel free to fork this repository and create a pull request with your improvements.
---
## **Acknowledgments**
- [Hugging Face](https://huggingface.co/) for hosting the Kokoro-82M model.
- The amazing communities behind PyTorch and the OpenAI and Deepgram APIs.