remiai3 committed
Commit ad8ff68 · verified
1 Parent(s): c7c74d8

Update README.md

Files changed (1)
  1. README.md +41 -35
README.md CHANGED
@@ -13,56 +13,62 @@ license: apache-2.0
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
 
- # 🎤 Voice Cloning App (OpenVoice + Whisper)
+ # 🎤 Voice Cloning App (XTTS-v2)
 
- This Hugging Face Space lets you **clone any voice** by uploading a short sample.
- You can either:
- 1. **Text → Speech Cloning**: Type any text and the app will generate speech in the cloned voice.
- 2. **Audio → Audio Cloning**: Upload a content audio, and the app will convert it into the **sample speaker's voice**.
+ This is a Hugging Face Space demo for **voice cloning**.
+ Upload a short **sample voice recording** and enter any text – the AI will synthesize speech in the uploaded voice.
+
+ Powered by **Coqui XTTS-v2**, running fully on CPU (works in free Spaces).
 
  ---
 
  ## 🚀 Features
- - Supports **mp3** and **wav** input (mp3 is auto-converted to wav internally).
- - Works on **CPU only** (free Spaces) – but will run slower compared to GPU.
- - Uses **OpenVoice (MyShell)** for voice cloning and **Whisper (OpenAI)** for automatic speech recognition.
+ - Clone a voice with only a few seconds of reference audio.
+ - Input text → get speech in the **same cloned voice**.
+ - Supports both `.mp3` and `.wav` formats.
+ - Runs on CPU (no GPU required).
 
  ---
 
- ## 🛠️ How it Works
- 1. Upload a **sample voice** (5–10 seconds is enough).
- 2. Choose between:
-    - **Text → Speech**: Enter text → AI speaks in the sample voice.
-    - **Audio → Audio**: Upload another audio → AI transcribes it and re-generates in the sample voice.
- 3. Download your cloned audio result.
+ ## 🛠 Installation
 
- ---
+ Run locally:
 
- ## ⚙️ Tech Stack
- - [OpenVoice (MyShell)](https://huggingface.co/myshell-ai/OpenVoice) – high-quality speaker timbre cloning (~80–90% similarity).
- - [Whisper-small](https://huggingface.co/openai/whisper-small) – automatic speech recognition (ASR).
- - [Gradio](https://gradio.app/) – simple and clean web UI.
- - [PyDub](https://github.com/jiaaro/pydub) – for mp3 → wav conversion.
+ ```bash
+ git clone https://huggingface.co/spaces/your-username/voice-clone-app
+ cd voice-clone-app
+ pip install -r requirements.txt
+ ```
 
- ---
+ ## Requirements
+ TTS==0.22.0
+ torch
+ pydub
+ gradio
 
- ## 📂 Project Structure
- ├── app.py # Main Gradio app
- ├── requirements.txt # Dependencies
- └── README.md # This file
+ ## ▶️ Usage
+ Start the Gradio app:
+ `python app.py`
+ Then open the browser at:
+ 👉 http://127.0.0.1:7860/
 
 
- ---
+ ## 📂 How it Works
+ Upload a sample voice audio (.wav or .mp3).
+ Enter the text you want spoken.
+ The model clones the sample voice and generates audio output.
 
- ## ⚡ Notes
- - CPU Spaces are slower. Expect **30–60 seconds** processing per request.
- - For faster generation, enable a **GPU Space**.
- - Works best with clean recordings (no background noise).
 
- ---
+ ## ⚠️ Notes
+ Voice cloning quality depends on the length and clarity of the sample voice.
+ Works best with clean recordings (5–10 seconds or more).
+ CPU inference may be slower than GPU.
+
+
+ ## 🔮 Future Plans
+ Add Audio → Audio cloning (transcribe + re-synthesize).
+ Add multi-language support.
 
- ## 🙌 Acknowledgements
- - [MyShell.ai](https://myshell.ai) for OpenVoice
- - [Coqui.ai](https://coqui.ai) for pioneering open-source TTS
- - [OpenAI Whisper](https://github.com/openai/whisper) for ASR
 
+ ## ✨ Built with
+ Coqui TTS + Gradio
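
The "How it Works" steps in the updated README (upload a `.wav` or `.mp3` sample, enter text, generate audio in the sample voice) could be sketched roughly as below. This is not the Space's actual `app.py`, which this commit does not show. The helper names (`is_supported`, `ensure_wav`, `clone_voice`), the model id string, the fixed `language="en"`, and the `output.wav` default are all assumptions for illustration. The heavy dependencies (`pydub`, `TTS`) are imported inside the functions so the helpers can be imported without them.

```python
from pathlib import Path

# Formats the README says are supported.
SUPPORTED_EXTENSIONS = {".wav", ".mp3"}
# Assumption: the public Coqui model id for XTTS-v2.
XTTS_MODEL = "tts_models/multilingual/multi-dataset/xtts_v2"


def is_supported(sample_path: str) -> bool:
    """Return True when the extension is .wav or .mp3 (case-insensitive)."""
    return Path(sample_path).suffix.lower() in SUPPORTED_EXTENSIONS


def ensure_wav(sample_path: str) -> str:
    """Return a .wav path for the sample, converting from .mp3 if needed."""
    if not is_supported(sample_path):
        raise ValueError(f"Unsupported audio format: {sample_path}")
    path = Path(sample_path)
    if path.suffix.lower() == ".wav":
        return str(path)
    # Lazy import: pydub needs ffmpeg available to decode mp3.
    from pydub import AudioSegment

    wav_path = path.with_suffix(".wav")
    AudioSegment.from_file(str(path), format="mp3").export(str(wav_path), format="wav")
    return str(wav_path)


def clone_voice(sample_path: str, text: str, out_path: str = "output.wav") -> str:
    """Synthesize `text` in the voice of `sample_path` and return the output path."""
    # Lazy import: the model is downloaded on first use and is large.
    from TTS.api import TTS

    # Hypothetical choice: load on CPU for every call; a real app would
    # likely load the model once at startup and reuse it.
    tts = TTS(XTTS_MODEL).to("cpu")
    tts.tts_to_file(
        text=text,
        speaker_wav=ensure_wav(sample_path),
        language="en",  # assumption: English only, per the "Future Plans" note
        file_path=out_path,
    )
    return out_path
```

A call such as `clone_voice("sample.mp3", "Hello there")` would then write the synthesized speech to `output.wav`, assuming the packages listed under Requirements are installed.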