---
title: AI VoiceSecretary
emoji: 📈
colorFrom: yellow
colorTo: green
sdk: docker
pinned: false
license: mit
---

AI Voice Secretary

The AI Voice Secretary is a sophisticated virtual assistant designed to provide information about Abdullah Khaled's professional profile and GitHub projects. It leverages advanced technologies such as Retrieval-Augmented Generation (RAG), speech-to-text (STT), and text-to-speech (TTS) to deliver a seamless voice and text-based interaction experience. The project features a React-based frontend deployed on Vercel and a FastAPI backend hosted on HuggingFace Spaces, containerized using Docker.

Features

  • Voice and Text Interaction: Supports both voice and text queries for a versatile user experience.
  • GitHub Integration: Retrieves and processes READMEs from specified GitHub repositories for context-aware responses.
  • RAG-Powered Responses: Uses Retrieval-Augmented Generation to provide accurate information based on Abdullah Khaled's profile and projects.
  • Speech-to-Text (STT): Converts audio input to text using the Whisper model.
  • Text-to-Speech (TTS): Generates audio responses using the Kokoro-82M model.
  • Professional Profile Access: Provides details about Abdullah's skills, experience, education, certifications, and contact information.
  • WebSocket Communication: Enables real-time audio and text interaction between the frontend and backend.
  • Responsive UI: Built with React and styled with Tailwind CSS for a modern and user-friendly interface.

How It Works

The AI Voice Secretary integrates multiple components to process user queries and deliver responses:

  1. Frontend (React, Vercel):

    • Built with React for a dynamic and responsive user interface.
    • Uses Tailwind CSS for styling and integrates date-fns for date handling and lucide-react for icons.
    • Communicates with the backend via WebSocket for real-time audio interactions and HTTP for text queries.
    • Deployed on Vercel for reliable hosting and automatic scaling.
  2. Backend (FastAPI, HuggingFace Spaces, Docker):

    • Developed using FastAPI for high-performance API and WebSocket endpoints.
    • Hosted on HuggingFace Spaces, with deployment managed via a Dockerfile for containerization.
    • Uses the Whisper model for STT, converting audio inputs to text.
    • Employs the Kokoro-82M model for TTS, generating audio responses.
    • Implements RAG by fetching GitHub READMEs, embedding their chunks with SentenceTransformer, indexing them in FAISS, and generating responses with the Gemini LLM (a minimal sketch follows this list).
    • Stores and retrieves GitHub data in a FAISS vector store for efficient similarity searches.
  3. Data Flow:

    • Text Queries: Users submit text queries via the React frontend, which are sent to the FastAPI backend. The backend processes the query using RAG and returns a JSON response with relevant information and media links.
    • Voice Queries: Audio inputs are sent via WebSocket, transcribed to text using Whisper, processed with RAG, and converted to audio responses using Kokoro-82M. The audio is streamed back to the frontend in segments.
    • GitHub Integration: The backend fetches READMEs from specified repositories, splits them into chunks, and indexes them in a FAISS vector store for retrieval during query processing.
    • Profile Information: The assistant provides details from Abdullah's professional profile (e.g., skills, contact info) when requested.
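
As a concrete reference, here is a minimal sketch of that retrieval flow, assuming the sentence-transformers, faiss, and google-generativeai packages. The project itself wires these pieces through langchain, so the real code differs:

    # Minimal RAG sketch: chunk a README, index it in FAISS, retrieve the
    # best chunks, and ground a Gemini answer in them.
    import faiss
    import numpy as np
    from sentence_transformers import SentenceTransformer
    import google.generativeai as genai

    embedder = SentenceTransformer("all-MiniLM-L6-v2")

    # 1. Naive fixed-size chunking (the backend uses langchain splitters).
    readme_text = open("README.md", encoding="utf-8").read()
    chunks = [readme_text[i:i + 500] for i in range(0, len(readme_text), 500)]

    # 2. Embed the chunks and index them in FAISS (384-dim L2 index).
    vectors = np.asarray(embedder.encode(chunks), dtype="float32")
    index = faiss.IndexFlatL2(vectors.shape[1])
    index.add(vectors)

    # 3. Embed the query and retrieve the top-3 nearest chunks.
    query = "What does this project do?"
    q_vec = np.asarray(embedder.encode([query]), dtype="float32")
    _, ids = index.search(q_vec, 3)
    context = "\n\n".join(chunks[i] for i in ids[0])

    # 4. Ask Gemini, constrained to the retrieved context.
    genai.configure(api_key="your_google_api_key")  # from the .env file
    model = genai.GenerativeModel("gemini-2.0-flash")
    reply = model.generate_content(
        f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    )
    print(reply.text)

Steps 1 and 2 run once per indexed repository; only steps 3 and 4 run per user query.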

Tech Stack

  • Frontend:

    • React 19.0.0
    • Tailwind CSS 4.1.4
    • Vite 6.3.1 (build tool)
    • Libraries: date-fns, lucide-react
    • Deployed on Vercel
  • Backend:

    • Python 3.10.9 (FastAPI)
    • Libraries: speech_recognition, soundfile, pydub, langchain, sentence-transformers, faiss, torch, requests
    • Models: Whisper (STT), Kokoro-82M (TTS), Gemini 2.0 Flash (LLM); a short usage sketch for the two speech models follows this list.
    • Containerized with Docker
    • Deployed on HuggingFace Spaces
  • Other:

    • GitHub API for fetching READMEs
    • FAISS for vector storage and similarity search
    • SentenceTransformer (all-MiniLM-L6-v2) for embeddings
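
For orientation, the two speech models can be exercised roughly like this. This is a sketch only: the Whisper model size, the kokoro KPipeline call signature, and the voice name are assumptions, and the backend's actual loading code may differ.

    # STT/TTS sketch; signatures for kokoro are assumptions from its docs.
    import whisper              # openai-whisper
    import soundfile as sf      # already in the backend dependencies
    from kokoro import KPipeline

    # Speech-to-text: transcribe a recorded query.
    stt = whisper.load_model("base")
    text = stt.transcribe("query.wav")["text"]

    # Text-to-speech: synthesize a reply with Kokoro-82M (24 kHz output).
    tts = KPipeline(lang_code="a")  # "a" selects American English voices
    for i, (_, _, audio) in enumerate(tts("Here is the answer.", voice="af_heart")):
        sf.write(f"reply_{i}.wav", audio, 24000)  # one file per segment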

Prerequisites

To run this project locally, ensure you have the following:

  • Node.js: For the React frontend (version 18 or higher recommended).
  • Python: Version 3.10.9 for the backend.
  • Docker: For containerized deployment.
  • GitHub Personal Access Token: For accessing GitHub API (add to .env file).
  • HuggingFace Account: For deploying the backend on HuggingFace Spaces.
  • Vercel Account: For deploying the frontend.

Setup Instructions

1. Clone the Repository

git clone https://github.com/abdullah-khaled0/ai-voice-secretary.git
cd ai-voice-secretary

2. Backend Setup

  1. Install Dependencies:

    • Navigate to the backend directory (e.g., src/backend).

    • Create a virtual environment and install dependencies:

      python -m venv venv
      source venv/bin/activate  # On Windows: venv\Scripts\activate
      pip install -r requirements.txt
      python -m spacy download en_core_web_sm
      
  2. Set Environment Variables:

    • Create a .env file in the backend directory with the following:

      GITHUB_TOKEN=your_github_personal_access_token
      GOOGLE_API_KEY=your_gemini_api_key_from_google_ai_studio
      
  3. Run Locally:

    • Start the FastAPI server:

      uvicorn src.backend.voice_assistant:app --host 0.0.0.0 --port 7860
      
  4. Docker Deployment:

    • Build the Docker image:

      docker build -t ai-voice-secretary .
      
    • Run the container:

      docker run -p 7860:7860 --env-file .env ai-voice-secretary
      
  5. Deploy to HuggingFace Spaces:

    • Push the repository to a HuggingFace Space.
    • Ensure the Dockerfile and requirements.txt are in the root directory (a minimal Dockerfile sketch follows this list).
    • Configure the Space to use the Docker runtime and expose port 7860.
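
A minimal Dockerfile along these lines satisfies those requirements. It is a sketch; the repository's actual Dockerfile may pin different versions or paths:

    # Sketch of a Space Dockerfile; the repository's real one may differ.
    FROM python:3.10.9-slim

    WORKDIR /app
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt \
        && python -m spacy download en_core_web_sm

    COPY . .

    # HuggingFace Spaces routes traffic to port 7860.
    EXPOSE 7860
    CMD ["uvicorn", "src.backend.voice_assistant:app", "--host", "0.0.0.0", "--port", "7860"]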

3. Frontend Setup

  1. Install Dependencies:

    • Navigate to the frontend directory (e.g., frontend).

    • Install Node.js dependencies:

      npm install
      
  2. Run Locally:

    • Start the development server:

      npm run dev
      
    • The frontend will be available at http://localhost:5173.

  3. Build for Production:

    • Generate production-ready assets:

      npm run build
      
  4. Deploy to Vercel:

    • Install the Vercel CLI:

      npm install -g vercel
      
    • Deploy the frontend:

      vercel
      
    • Follow the prompts to configure and deploy to Vercel.

4. Connect Frontend and Backend

  • Update the frontend code to point to the backend URL (e.g., https://your-huggingface-space.hf.space/ws for WebSocket and /text_query for HTTP).
  • Ensure CORS is configured correctly in the backend to allow requests from the frontend URL (e.g., https://your-vercel-app.vercel.app); see the sketch below.
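
In FastAPI, that CORS configuration is typically a few lines of CORSMiddleware. This is a sketch; the actual allow_origins list lives in voice_assistant.py, as noted under Notes:

    # Sketch of the CORS setup; adjust allow_origins to the deployed
    # frontend URL.
    from fastapi import FastAPI
    from fastapi.middleware.cors import CORSMiddleware

    app = FastAPI()
    app.add_middleware(
        CORSMiddleware,
        allow_origins=["https://your-vercel-app.vercel.app"],
        allow_credentials=True,
        allow_methods=["*"],
        allow_headers=["*"],
    )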

Usage

Text Queries

  • Access the frontend (e.g., https://your-vercel-app.vercel.app).
  • Enter a text query (e.g., "Tell me about the Vocaby project" or "What are Abdullah's skills?").
  • The assistant responds with a JSON object (an example request follows this list) containing:
    • response: Details about the project or profile.
    • links: Relevant platform links (if requested).
    • media_links: Media URLs from GitHub READMEs (if applicable).
    • personal_info: Contact details (if requested).
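
A text query can also be issued directly against the backend. The sketch below assumes the /text_query endpoint accepts a JSON POST with a query field; check the backend code for the exact contract:

    # Hypothetical request shape; the "query" field name is an assumption.
    import requests

    resp = requests.post(
        "https://your-huggingface-space.hf.space/text_query",
        json={"query": "What are Abdullah's skills?"},
        timeout=30,
    )
    data = resp.json()
    print(data["response"])         # answer text
    print(data.get("media_links"))  # media URLs from GitHub READMEs, if any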

Voice Queries

  • Use the microphone feature on the frontend to record a query.
  • The audio is sent to the backend via WebSocket, transcribed, processed, and returned as audio segments (a client sketch follows this list).
  • The frontend plays the audio response and displays the transcribed text and response details.
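
To exercise the voice endpoint without the UI, a small Python client can stream a recorded file over the WebSocket. This sketch assumes the /ws path mentioned earlier and a raw-bytes framing, which may differ from the real protocol:

    # Hypothetical client; the framing (raw audio bytes in, audio segments
    # out) is an assumption about the backend protocol.
    import asyncio
    import websockets

    async def voice_query(path="query.wav"):
        uri = "wss://your-huggingface-space.hf.space/ws"
        async with websockets.connect(uri) as ws:
            with open(path, "rb") as f:
                await ws.send(f.read())       # send the recorded audio
            segments = []
            try:
                while True:                   # collect streamed segments
                    segments.append(await ws.recv())
            except websockets.ConnectionClosed:
                pass
            return segments

    asyncio.run(voice_query())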

Example Queries

  • Project Inquiry: "What is the Film-Trailer-and-Summary-Generator project about?"
  • Profile Inquiry: "What are Abdullah Khaled's skills?"
  • Contact Info: "How can I contact Abdullah?"
  • Platform Links: "What is Abdullah's LinkedIn profile?"

Notes

  • GitHub Token: Ensure the GITHUB_TOKEN is set in the .env file to avoid rate-limiting issues with the GitHub API (an authenticated fetch sketch follows these notes).
  • CORS Configuration: Update the allow_origins in voice_assistant.py to match your frontend's deployed URL.
  • Audio Processing: The backend processes audio in segments for streaming; ensure a stable WebSocket connection for voice interactions.
  • HuggingFace Spaces: Monitor resource usage on HuggingFace Spaces, as heavy computations (e.g., Whisper transcription and Kokoro synthesis) may require a paid plan.
  • Vercel Deployment: Configure Vercel to handle environment variables for the frontend if needed.
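
For reference, the authenticated README fetch that avoids those rate limits looks roughly like this. The endpoint and headers follow the documented GitHub REST API; the repository name is simply this project's own:

    # Authenticated README fetch; a token raises the API rate limit from
    # 60 to 5,000 requests per hour.
    import os
    import requests

    resp = requests.get(
        "https://api.github.com/repos/abdullah-khaled0/ai-voice-secretary/readme",
        headers={
            "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
            "Accept": "application/vnd.github.raw+json",  # raw markdown body
        },
        timeout=30,
    )
    resp.raise_for_status()
    readme_markdown = resp.text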

License

This project is licensed under the MIT License. See the LICENSE file for details.