---
title: Test
emoji: 🐠
colorFrom: pink
colorTo: pink
sdk: docker
pinned: false
---

# **AI-Powered Question & Answer Generator with Voice Cloning**

---

## **Overview**

This project combines a fine-tuned language model with voice cloning to create an interactive experience in which AI-generated answers are spoken in a cloned voice. Its primary components are:

1. **Text Generation**: A fine-tuned Mistral-7B-v0.1 model generates realistic, human-like answers to user-provided questions.
2. **Voice Cloning**: The ElevenLabs API clones a voice and synthesizes the AI-generated answers into natural-sounding speech.
3. **Deception for Interaction**: The system is designed to fool players (the original French term is "tromper") by making the responses appear to come from a real human.

---

## **Key Features**

1. **Fine-Tuned Model for Text Generation**:
   - The project uses the **Mistral-7B-v0.1** model, fine-tuned on a custom dataset.
   - The model generates contextually appropriate, human-like responses to a wide range of questions.

2. **Voice Cloning with ElevenLabs**:
   - ElevenLabs’ **Voice Cloning API** replicates a target voice; its **Speech-to-Text** feature can transcribe spoken questions.
   - The cloned voice delivers the AI-generated answers in a natural, believable manner.

3. **Integration for Immersion**:
   - The generated answers and synthesized speech are combined into a single, seamless interaction.
   - Designed for gaming, interactive storytelling, or prank scenarios.

---

## **How It Works**

### 1. **Question Input**:
- Users provide a question in text form (e.g., "What’s the best way to prepare for a long flight?").
- Alternatively, voice input can be transcribed to text using ElevenLabs’ speech-to-text feature.

### 2. **Text Generation**:
- The fine-tuned Mistral-7B-v0.1 model processes the input question and generates a natural response.
- Example:
  - **Question**: "What’s your favorite place to relax?"
  - **Answer**: "My room, where I can unwind and enjoy some quiet time."
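
The text-generation step could look roughly like the sketch below. It is a minimal illustration, not the project's actual code: the prompt template, the `build_prompt` and `extract_answer` helpers, and the sampling settings are assumptions, and the base `mistralai/Mistral-7B-v0.1` checkpoint stands in for the fine-tuned weights, whose location is not given here.

```python
"""Sketch of the text-generation step (assumed prompt format and settings)."""

# Hypothetical prompt template; the real fine-tuning format is not documented here.
PROMPT_TEMPLATE = "Question: {question}\nAnswer:"

# Assumption: the base checkpoint stands in for the project's fine-tuned weights.
DEFAULT_MODEL = "mistralai/Mistral-7B-v0.1"


def build_prompt(question: str) -> str:
    """Insert the whitespace-normalized question into the prompt template."""
    return PROMPT_TEMPLATE.format(question=" ".join(question.split()))


def extract_answer(generated: str, prompt: str) -> str:
    """Strip the echoed prompt and keep only the first line of the answer."""
    completion = generated[len(prompt):] if generated.startswith(prompt) else generated
    stripped = completion.strip()
    return stripped.splitlines()[0].strip() if stripped else ""


def generate_answer(question: str, model_name: str = DEFAULT_MODEL) -> str:
    """Generate an answer with Hugging Face transformers (downloads the model)."""
    # Imported lazily so the helpers above work without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    prompt = build_prompt(question)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=60,       # illustrative limit for short spoken answers
        do_sample=True,
        temperature=0.7,         # illustrative sampling temperature
        pad_token_id=tokenizer.eos_token_id,
    )
    text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return extract_answer(text, prompt)
```

Keeping the answer to a single line is a design choice for this sketch: short replies are cheaper to synthesize and sound more like casual speech.
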

### 3. **Voice Cloning**:
- The generated text is sent to ElevenLabs’ API, where it is converted into speech using a cloned voice.
- The synthesized voice is meant to sound human, with natural intonation and emotion.
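
As a rough illustration of this step, the sketch below builds a request for the ElevenLabs text-to-speech REST endpoint using only the standard library. The voice ID, the `eleven_multilingual_v2` model choice, the voice-settings values, and the helper names `build_tts_request` and `synthesize` are assumptions made for the example; the project may well use an ElevenLabs client library instead.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"


def build_tts_request(text, voice_id, api_key, model_id="eleven_multilingual_v2"):
    """Build (but do not send) a text-to-speech request for a cloned voice."""
    payload = {
        "text": text,
        "model_id": model_id,  # assumption: any TTS model available to the account
        # Illustrative values; tune them for the cloned voice.
        "voice_settings": {"stability": 0.5, "similarity_boost": 0.75},
    }
    return urllib.request.Request(
        url=f"{API_BASE}/text-to-speech/{urllib.parse.quote(voice_id, safe='')}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "xi-api-key": api_key,
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",
        },
        method="POST",
    )


def synthesize(text, voice_id, api_key, out_path="answer.mp3"):
    """Send the request and save the returned audio bytes to out_path."""
    request = build_tts_request(text, voice_id, api_key)
    with urllib.request.urlopen(request, timeout=60) as response:
        audio = response.read()
    with open(out_path, "wb") as handle:
        handle.write(audio)
    return out_path
```

Separating request construction from sending makes the payload easy to inspect without spending API credits. The API key should come from a secret store or environment variable rather than from source code.
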

### 4. **Output Delivery**:
- The final output is an audio response in the cloned voice, intended to be hard to distinguish from a real human speaker.
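
Taken together, the four steps form a small pipeline. The sketch below wires them up with plain callables so that each stage can be swapped or stubbed out. `answer_question`, `SpokenAnswer`, and the stage signatures are hypothetical names chosen for illustration and are not part of the project.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SpokenAnswer:
    question: str
    answer: str
    audio: bytes


def answer_question(
    generate: Callable[[str], str],
    synthesize: Callable[[str], bytes],
    text: Optional[str] = None,
    audio: Optional[bytes] = None,
    transcribe: Optional[Callable[[bytes], str]] = None,
) -> SpokenAnswer:
    """Run question input -> text generation -> voice synthesis -> output."""
    # Step 1: take the question as text, or transcribe spoken input.
    if text is None:
        if audio is None or transcribe is None:
            raise ValueError("provide text, or audio together with a transcriber")
        text = transcribe(audio)
    question = text.strip()
    if not question:
        raise ValueError("question is empty")
    # Step 2: generate the answer text.
    answer = generate(question)
    # Steps 3 and 4: synthesize speech and return it alongside the text.
    return SpokenAnswer(question, answer, synthesize(answer))
```

Passing the stages in as functions means the pipeline can be exercised with stand-ins, for example `generate=lambda q: "My room."` and `synthesize=lambda a: a.encode("utf-8")`, before the model or the API key is available.
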

---

## **Applications**

- **Gaming**: Simulate human-like NPCs in trivia or role-playing games.
- **Storytelling**: Create immersive audio experiences by pairing generated text with realistic voiceovers.
- **Social Experiments**: Test how people react to AI-generated, voice-synthesized responses in various scenarios.
- **Entertainment/Pranks**: Surprise players or audiences with a system that convincingly mimics a real human.

---

## **Technologies Used**

1. **Mistral-7B-v0.1**:
   - A large language model, fine-tuned here for text generation.
   - Delivers contextually appropriate and relatable answers.

2. **ElevenLabs API**:
   - **Speech-to-Text**: Converts spoken questions into text for the model to process.
   - **Voice Cloning**: Synthesizes text into speech using a cloned voice.

3. **Python**:
   - Backend logic for integrating text generation, voice synthesis, and API calls.
   - Libraries include `transformers`, `torch`, and API wrappers for ElevenLabs.

---

## **Setup Instructions**

### 1. **Clone the Repository**:
```bash
git clone https://github.com/Lirone/NotMe.git
cd NotMe