---
license: mit
base_model:
- hexgrad/Kokoro-82M
---
# **VocRT**
This repository contains the complete codebase for building your personal Realtime Voice-to-Voice (V2V) solution. It integrates a powerful TTS model, gRPC communication, an Express server, and a React-based client. Follow this guide to set up and explore the system effectively.
---
## **Repository Structure**
```
├── backend/    # Express server for handling API requests
├── frontend/   # React client for user interaction
├── .env        # Environment variables (OpenAI API key, etc.)
├── voices/     # All available voices
├── demo/       # Sample audio and demo files
└── ...         # Other project files
```
---
## **Docker**
🐳 VocRT on Docker Hub: https://hub.docker.com/r/anuragsingh922/vocrt
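If you prefer not to set up the environment manually, a minimal sketch of pulling and running the published image is shown below; the port mapping and environment variable names are assumptions, so check the Docker Hub page for the exact run command:
```bash
# Pull the published image
docker pull anuragsingh922/vocrt

# Run it; the port and environment variable names below are illustrative
# assumptions; see the Docker Hub page for the options the image actually expects
docker run -it --rm \
  -p 3000:3000 \
  -e OPENAI_API_KEY=<openai_api_key> \
  -e DEEPGRAM_API_KEY=<deepgram_api_key> \
  anuragsingh922/vocrt
```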
## **Setup Guide**
### **Step 1: Clone the Repository**
Clone this repository to your local machine:
```bash
git clone https://huggingface.co/anuragsingh922/VocRT
cd VocRT
```
---
### **Step 2: Python Virtual Environment Setup**
Create a virtual environment to manage dependencies:
#### macOS/Linux:
```bash
python3 -m venv venv
source venv/bin/activate
```
#### Windows:
```bash
python -m venv venv
venv\Scripts\activate
```
---
### **Step 3: Install Python Dependencies**
With the virtual environment activated, install the required dependencies:
```bash
pip install --upgrade pip setuptools wheel
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
pip install phonemizer transformers scipy munch python-dotenv openai grpcio grpcio-tools
```
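To confirm the packages installed cleanly before moving on, you can run a quick import check (a sanity check only, not part of the project's scripts):
```bash
python -c "import torch, torchaudio, phonemizer, transformers, scipy, munch, dotenv, openai, grpc; print('Python deps OK, torch', torch.__version__)"
```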
### **Installing eSpeak**
`eSpeak` is a required system dependency for VocRT; the `phonemizer` package uses it as its phonemization backend. Follow the instructions below to install it on your platform:
#### **Ubuntu/Linux**
Use the `apt-get` package manager to install `eSpeak`:
```bash
sudo apt-get update
sudo apt-get install espeak
```
#### **macOS**
Install `eSpeak` using [Homebrew](https://brew.sh/):
1. Ensure Homebrew is installed on your system:
```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```
2. Install `espeak`:
```bash
brew install espeak
```
#### **Windows**
For Windows, follow these steps to install `eSpeak`:
1. Download the eSpeak installer from the official website: [eSpeak Downloads](http://espeak.sourceforge.net/download.html).
2. Run the installer and follow the on-screen instructions to complete the installation.
3. Add the `eSpeak` installation path to your system's `PATH` environment variable:
- Open **System Properties** β†’ **Advanced** β†’ **Environment Variables**.
- In the "System Variables" section, find the `Path` variable and edit it.
- Add the path to the `espeak.exe` file (e.g., `C:\Program Files (x86)\eSpeak`).
4. Verify the installation:
Open Command Prompt and run:
```cmd
espeak --version
```
---
### **Verification**
After installing `eSpeak`, verify it is correctly set up by running:
```bash
espeak "Hello, world!"
```
This should speak "Hello, world!" through your system's audio output.
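Because the VocRT Python server reaches eSpeak through the `phonemizer` package, it is worth checking that binding as well (assumes the Step 3 dependencies are installed):
```bash
python -c "from phonemizer import phonemize; print(phonemize('Hello, world!', language='en-us', backend='espeak'))"
```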
---
### **Step 4: Backend Setup (Express Server)**
1. Navigate to the `backend` directory:
```bash
cd backend
```
2. Install Node.js dependencies:
```bash
npm install
```
3. Update the `config.env` file with your Deepgram API key:
- Open `config.env` in a text editor.
- Replace `<deepgram_api_key>` with your actual Deepgram API key (a quick key check is shown after these steps).
4. Start the Express server:
```bash
node app.js
```
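To rule out credential issues before debugging the server, the Deepgram key from step 3 can be checked directly against Deepgram's REST API; the `/v1/projects` endpoint and `Token` auth header used here are assumptions based on Deepgram's public docs:
```bash
# Lightweight auth check: a 200 response with your project list means the key works
curl -s https://api.deepgram.com/v1/projects \
  -H "Authorization: Token <deepgram_api_key>"
```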
---
### **Step 5: Frontend Setup (React Client)**
1. Open a new terminal and navigate to the `frontend` directory:
```bash
cd frontend
```
2. Install client dependencies:
```bash
npm install
```
3. Start the client:
```bash
npm start
```
---
### **Step 6: Start the VocRT Server**
1. Add your OpenAI API key to the `.env` file:
- Open `.env` in a text editor.
- Replace `<openai_api_key>` with your actual OpenAI API key (a quick key check is shown after these steps).
2. Start the VocRT server:
```bash
python3 app.py
```
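To confirm the OpenAI key from step 1 is valid before starting the server, you can hit the models endpoint directly (a generic check, independent of whichever model VocRT uses):
```bash
# List available models; a 200 response with a model list means the key is valid
curl -s https://api.openai.com/v1/models \
  -H "Authorization: Bearer <openai_api_key>"
```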
---
### **Step 7: Test the Full System**
- Once all servers are running:
1. Access the React client at [http://localhost:3000](http://localhost:3000).
2. Interact with the VocRT system via the web interface.
---
## **Model Used**
VocRT uses [Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) for text-to-speech synthesis, processing user inputs into high-quality voice responses.
---
## **Key Features**
1. **Realtime voice response generation**: Converts spoken input into a spoken response with minimal latency.
2. **React Client**: A user-friendly frontend for interaction.
3. **Express Backend**: Handles API requests and integrates the VocRT system with external services.
4. **gRPC Communication**: Seamless communication between the VocRT server and other components.
5. **Configurable APIs**: Integrates with Deepgram for speech recognition and OpenAI for text generation.
---
## **Dependencies**
### Python:
- torch, torchvision, torchaudio
- phonemizer
- transformers
- scipy
- munch
- python-dotenv
- openai
- grpcio, grpcio-tools
- espeak (system package; install it as described in the eSpeak section above, not via pip)
### Node.js:
- Express server dependencies (`npm install` in `backend`).
- React client dependencies (`npm install` in `frontend`).
---
## **Contributing**
Contributions are welcome! Feel free to fork this repository and create a pull request with your improvements.
---
## **Acknowledgments**
- [Hugging Face](https://huggingface.co/) for hosting the Kokoro-82M model.
- The amazing communities behind PyTorch and the OpenAI and Deepgram APIs.