---
license: mit
base_model:
- hexgrad/Kokoro-82M
---
# **VocRT**  
This repository contains the complete codebase for building your own real-time Voice-to-Voice (V2V) solution. It integrates a powerful TTS model, gRPC communication, an Express server, and a React-based client. Follow this guide to set up and explore the system.

---

## **Repository Structure**
```
├── backend/         # Express server for handling API requests
├── frontend/        # React client for user interaction
├── .env             # Environment variables (OpenAI API key, etc.)
├── voices           # All available voices
├── demo             # Contains sample audio and demo files
└── other...
```

---

## **Docker**

🐳 VocRT on Docker Hub: https://hub.docker.com/r/anuragsingh922/vocrt
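
If you prefer the containerized route, the image can be pulled with `docker pull anuragsingh922/vocrt`; check the Docker Hub page above for available tags and run instructions (exposed ports and environment variables may vary by image version).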


## **Setup Guide**

### **Step 1: Clone the Repository**
Clone this repository to your local machine:
```bash
git clone https://huggingface.co/anuragsingh922/VocRT
cd VocRT
```

---

### **Step 2: Python Virtual Environment Setup**
Create a virtual environment to manage dependencies:

#### macOS/Linux:
```bash
python3 -m venv venv
source venv/bin/activate
```

#### Windows:
```bash
python -m venv venv
venv\Scripts\activate
```

---

### **Step 3: Install Python Dependencies**
With the virtual environment activated, install the required dependencies:
```bash
pip install --upgrade pip setuptools wheel
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
pip install phonemizer transformers scipy munch python-dotenv openai grpcio grpcio-tools
```
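
As an optional sanity check, the snippet below (a minimal sketch; the filename `sanity_check.py` is just a suggestion) verifies that the core Python packages installed above import cleanly and prints their versions:

```python
# sanity_check.py (optional): confirm the dependencies installed above resolve correctly
import torch
import grpc
import transformers
import phonemizer  # imports fine here; the eSpeak backend it relies on is installed in the next step

print("torch:", torch.__version__)
print("grpcio:", grpc.__version__)
print("transformers:", transformers.__version__)
```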

### **Installing eSpeak**
`eSpeak` is a necessary dependency for the VocRT system. Follow the instructions below to install it on your platform:

#### **Ubuntu/Linux**
Use the `apt-get` package manager to install `eSpeak`:
```bash
sudo apt-get update
sudo apt-get install espeak
```

#### **macOS**
Install `eSpeak` using [Homebrew](https://brew.sh/):
1. Ensure Homebrew is installed on your system:
   ```bash
   /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
   ```
2. Install `espeak`:
   ```bash
   brew install espeak
   ```

#### **Windows**
For Windows, follow these steps to install `eSpeak`:
1. Download the eSpeak installer from the official website: [eSpeak Downloads](http://espeak.sourceforge.net/download.html).
2. Run the installer and follow the on-screen instructions to complete the installation.
3. Add the `eSpeak` installation path to your system's `PATH` environment variable:
   - Open **System Properties** β†’ **Advanced** β†’ **Environment Variables**.
   - In the "System Variables" section, find the `Path` variable and edit it.
   - Add the path to the `espeak.exe` file (e.g., `C:\Program Files (x86)\eSpeak`).
4. Verify the installation:
   Open Command Prompt and run:
   ```cmd
   espeak --version
   ```

---

### **Verification**
After installing `eSpeak`, verify it is correctly set up by running:
```bash
espeak "Hello, world!"
```

You should hear "Hello, world!" spoken through your system's audio output.
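
To confirm that `phonemizer` (installed in Step 3) can actually drive the eSpeak backend, which is the combination VocRT relies on, a minimal Python check looks like this:

```python
# Optional: verify phonemizer can locate the eSpeak backend installed above.
from phonemizer import phonemize

print(phonemize("Hello, world!", language="en-us", backend="espeak"))
# Prints an IPA-like transcription (roughly "həloʊ wɜːld") if eSpeak is found;
# otherwise phonemizer raises an error about the missing espeak backend.
```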

---

### **Step 4: Backend Setup (Express Server)**
1. Navigate to the `backend` directory:
   ```bash
   cd backend
   ```
2. Install Node.js dependencies:
   ```bash
   npm install
   ```
3. Update the `config.env` file with your Deepgram API key:
   - Open `config.env` in a text editor.
   - Replace `<deepgram_api_key>` with your actual Deepgram API key (a sample `config.env` layout is sketched after this list).

4. Start the Express server:
   ```bash
   node app.js
   ```
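
For reference, a `config.env` entry for the key from step 3 looks like the sketch below. The variable name shown is illustrative; keep whatever name your copy of `config.env` already uses.

```env
# backend/config.env (illustrative sketch; keep the variable names already present in the file)
DEEPGRAM_API_KEY=<deepgram_api_key>
```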

---

### **Step 5: Frontend Setup (React Client)**
1. Open a new terminal and navigate to the `frontend` directory:
   ```bash
   cd frontend
   ```
2. Install client dependencies:
   ```bash
   npm install
   ```
3. Start the client:
   ```bash
   npm start
   ```

---

### **Step 6: Start the VocRT Server**
1. Add your OpenAI API key to the `.env` file:
   - Open `.env` in a text editor.
   - Replace `<openai_api_key>` with your actual OpenAI API key (a sample `.env` entry is sketched after this list).

2. Start the VocRT server:
   ```bash
   python3 app.py
   ```
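
The corresponding `.env` entry for the key from step 1 is sketched below; the variable name is illustrative, so keep whatever name the repository's `.env` already defines.

```env
# .env (illustrative sketch; keep the variable name already present in the file)
OPENAI_API_KEY=<openai_api_key>
```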

---

### **Step 7: Test the Full System**
- Once all servers are running:
  1. Access the React client at [http://localhost:3000](http://localhost:3000).
  2. Interact with the VocRT system via the web interface.

---

## **Model Used**
VocRT uses [Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) for text-to-speech synthesis, converting generated text responses into high-quality voice output.

---

## **Key Features**
1. **Realtime voice response generation**: Converts speech input into spoken responses with minimal latency.
2. **React Client**: A user-friendly frontend for interaction.
3. **Express Backend**: Handles API requests and integrates the VocRT system with external services.
4. **gRPC Communication**: Seamless communication between the VocRT server and other components.
5. **Configurable APIs**: Integrates with Deepgram for speech recognition and OpenAI for text generation.

---

## **Dependencies**

### Python:
  - torch, torchvision, torchaudio
  - phonemizer
  - transformers
  - scipy
  - munch
  - python-dotenv
  - openai
  - grpcio, grpcio-tools
  - eSpeak (system dependency, installed via your OS package manager; see the eSpeak installation section above)


### Node.js:
- Express server dependencies (`npm install` in `backend`).
- React client dependencies (`npm install` in `frontend`).

---

## **Contributing**
Contributions are welcome! Feel free to fork this repository and create a pull request with your improvements.

---

## **Acknowledgments**
- [Hugging Face](https://huggingface.co/) for hosting the Kokoro-82M model.
- The amazing communities behind PyTorch, OpenAI, and Deepgram APIs.