---
license: mit
base_model:
- hexgrad/Kokoro-82M
---
|
# **VocRT** |
|
This repository contains the complete codebase for building your own realtime Voice-to-Voice (V2V) solution. It integrates the Kokoro-82M TTS model, gRPC communication, an Express server, and a React-based client. Follow this guide to set up and explore the system.
|
|
|
--- |
|
|
|
## **Repository Structure** |
|
```
├── backend/    # Express server for handling API requests
├── frontend/   # React client for user interaction
├── .env        # Environment variables (OpenAI API key, etc.)
├── voices      # All available voices
├── demo        # Contains sample audio and demo files
└── ...
```
|
|
|
--- |
|
|
|
## **Docker** |
|
|
|
🐳 VocRT on Docker Hub: https://hub.docker.com/r/anuragsingh922/vocrt
|
|
|
|
|
## **Setup Guide**
|
|
|
### **Step 1: Clone the Repository** |
|
Clone this repository to your local machine: |
|
```bash
git clone https://huggingface.co/anuragsingh922/VocRT
cd VocRT
```
|
|
|
--- |
|
|
|
### **Step 2: Python Virtual Environment Setup** |
|
Create a virtual environment to manage dependencies: |
|
|
|
#### macOS/Linux: |
|
```bash
python3 -m venv venv
source venv/bin/activate
```
|
|
|
#### Windows: |
|
```bash
python -m venv venv
venv\Scripts\activate
```
|
|
|
--- |
|
|
|
### **Step 3: Install Python Dependencies** |
|
With the virtual environment activated, install the required dependencies: |
|
```bash
pip install --upgrade pip setuptools wheel
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
pip install phonemizer transformers scipy munch python-dotenv openai grpcio grpcio-tools
```
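After installing, a short stdlib-only check can confirm the packages resolve before you go further. Note that the import names differ from the pip names for `python-dotenv` (`dotenv`) and `grpcio` (`grpc`); the list below mirrors the pip commands above:

```python
# Sanity check: confirm the Python dependencies installed above can be imported.
# Import names differ from pip names: python-dotenv -> dotenv, grpcio -> grpc.
import importlib.util

required = ["torch", "torchaudio", "phonemizer", "transformers",
            "scipy", "munch", "dotenv", "openai", "grpc"]
missing = [name for name in required if importlib.util.find_spec(name) is None]

if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All Python dependencies found.")
```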
|
|
|
### **Installing eSpeak** |
|
`eSpeak` is a required system dependency: `phonemizer` uses it as its grapheme-to-phoneme backend. Follow the instructions below to install it on your platform:
|
|
|
#### **Ubuntu/Linux** |
|
Use the `apt-get` package manager to install `eSpeak`: |
|
```bash
sudo apt-get update
sudo apt-get install espeak
```
|
|
|
#### **macOS** |
|
Install `eSpeak` using [Homebrew](https://brew.sh/): |
|
1. Ensure Homebrew is installed on your system: |
|
```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```
|
2. Install `espeak`: |
|
```bash
brew install espeak
```
|
|
|
#### **Windows** |
|
For Windows, follow these steps to install `eSpeak`: |
|
1. Download the eSpeak installer from the official website: [eSpeak Downloads](http://espeak.sourceforge.net/download.html). |
|
2. Run the installer and follow the on-screen instructions to complete the installation. |
|
3. Add the `eSpeak` installation path to your system's `PATH` environment variable: |
|
- Open **System Properties** β **Advanced** β **Environment Variables**. |
|
- In the "System Variables" section, find the `Path` variable and edit it. |
|
- Add the path to the `espeak.exe` file (e.g., `C:\Program Files (x86)\eSpeak`). |
|
4. Verify the installation: |
|
Open Command Prompt and run: |
|
```cmd
espeak --version
```
|
|
|
--- |
|
|
|
### **Verification** |
|
After installing `eSpeak`, verify it is correctly set up by running: |
|
```bash
espeak "Hello, world!"
```
|
|
|
This should speak "Hello, world!" through your system's audio output.
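On any platform, you can also verify from Python that the binary is reachable on `PATH` before wiring it into the rest of the system. This is a stdlib-only sketch; the `-s` speed flag is standard eSpeak, but treat the exact invocation as illustrative:

```python
# Check whether the espeak binary is on PATH, and speak a test phrase if so.
import shutil
import subprocess

def espeak_available() -> bool:
    """Return True if the espeak executable can be found on PATH."""
    return shutil.which("espeak") is not None

if espeak_available():
    # -s sets the speaking speed in words per minute
    subprocess.run(["espeak", "-s", "150", "Hello, world!"], check=True)
else:
    print("espeak not found on PATH; revisit the install steps above.")
```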
|
|
|
--- |
|
|
|
### **Step 4: Backend Setup (Express Server)** |
|
1. Navigate to the `backend` directory: |
|
```bash
cd backend
```
|
2. Install Node.js dependencies: |
|
```bash
npm install
```
|
3. Update the `config.env` file with your Deepgram API key: |
|
- Open `config.env` in a text editor. |
|
- Replace `<deepgram_api_key>` with your actual Deepgram API key. |
|
|
|
4. Start the Express server: |
|
```bash
node app.js
```
|
|
|
--- |
|
|
|
### **Step 5: Frontend Setup (React Client)** |
|
1. Open a new terminal and navigate to the `frontend` directory: |
|
```bash
cd frontend
```
|
2. Install client dependencies: |
|
```bash
npm install
```
|
3. Start the client: |
|
```bash
npm start
```
|
|
|
--- |
|
|
|
### **Step 6: Start the VocRT Server** |
|
1. Add your OpenAI API key to the `.env` file: |
|
- Open `.env` in a text editor. |
|
- Replace `<openai_api_key>` with your actual OpenAI API key. |
|
|
|
2. Start the VocRT server: |
|
```bash
python3 app.py
```
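The `.env` file uses plain `KEY=value` lines, which `python-dotenv` loads into the process environment. For illustration only, here is a minimal stdlib-only loader equivalent in spirit; the `load_env` helper is hypothetical, the project itself uses `python-dotenv`:

```python
# Minimal stdlib-only .env loader, equivalent in spirit to python-dotenv's
# load_dotenv(). Illustrative only -- the project uses python-dotenv.
import os

def load_env(path: str = ".env") -> dict:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
            # Do not overwrite variables already set in the environment.
            os.environ.setdefault(key.strip(), value.strip())
    return values
```

With a `.env` containing `OPENAI_API_KEY=sk-...`, the key then becomes available via `os.getenv("OPENAI_API_KEY")`.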
|
|
|
--- |
|
|
|
### **Step 7: Test the Full System** |
|
Once all servers are running:
|
1. Access the React client at [http://localhost:3000](http://localhost:3000). |
|
2. Interact with the VocRT system via the web interface. |
|
|
|
--- |
|
|
|
## **Model Used** |
|
VocRT uses [Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) for text-to-speech synthesis, processing user inputs into high-quality voice responses. |
|
|
|
--- |
|
|
|
## **Key Features** |
|
1. **Realtime voice response generation**: Converts spoken input into spoken responses with minimal latency.
|
2. **React Client**: A user-friendly frontend for interaction. |
|
3. **Express Backend**: Handles API requests and integrates the VocRT system with external services. |
|
4. **gRPC Communication**: Seamless communication between the VocRT server and other components. |
|
5. **Configurable APIs**: Integrates Deepgram for speech recognition and OpenAI for text generation.
|
|
|
--- |
|
|
|
## **Dependencies** |
|
|
|
### Python: |
|
- torch, torchvision, torchaudio |
|
- phonemizer |
|
- transformers |
|
- scipy |
|
- munch |
|
- python-dotenv |
|
- openai |
|
- grpcio, grpcio-tools |
|
- espeak (system package, installed separately; see Step 3)
|
|
|
|
|
### Node.js: |
|
- Express server dependencies (`npm install` in `backend`). |
|
- React client dependencies (`npm install` in `frontend`). |
|
|
|
--- |
|
|
|
## **Contributing** |
|
Contributions are welcome! Feel free to fork this repository and create a pull request with your improvements. |
|
|
|
--- |
|
|
|
## **Acknowledgments** |
|
- [Hugging Face](https://huggingface.co/) for hosting the Kokoro-82M model. |
|
- The amazing communities behind PyTorch, OpenAI, and Deepgram APIs. |