Update README.md
README.md
CHANGED
@@ -9,5 +9,20 @@ app_file: app.py
pinned: false
license: mit
---
1. Libraries and Tools Used:
- Transformers: Provides `VitsModel` and `AutoTokenizer`, used here to load `facebook/mms-tts-eng`, the English text-to-speech checkpoint from Meta AI's Massively Multilingual Speech (MMS) project.
- Torch: The PyTorch backend that Transformers uses to run the speech model.
- Librosa: An audio-processing library, used here to shift the pitch of the generated speech.
- Soundfile: Writes the speech output to an audio file.
- Tempfile: A Python standard-library module that creates temporary files for intermediate storage during processing.
- Gradio: Facilitates the creation of a user-friendly web interface for the text-to-speech application.
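
Before running the app, it can help to confirm that the libraries listed above are importable. The snippet below is a minimal sketch using only the standard library; the helper name `missing_modules` and the `REQUIRED_MODULES` list are illustrative and not taken from the app's code.

```python
import importlib.util

# Import names of the libraries listed above; `tempfile` ships with Python.
REQUIRED_MODULES = ["transformers", "torch", "librosa", "soundfile", "tempfile", "gradio"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Any name reported as missing would need to be installed before the app can start.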

2. Pipeline for Text-to-Speech Conversion:
- Text Input: You begin by typing in the text you want to be converted into speech.
- Tokenization: `AutoTokenizer` processes this text, preparing it for the speech model.
- Speech Synthesis: `VitsModel`, loaded with the `facebook/mms-tts-eng` checkpoint, takes the tokenized text and generates the speech waveform.
- Pitch Adjustment: Librosa shifts the pitch of the generated speech according to the pitch value:
  - 0 pitch value: The normal, unaltered pitch. This is the default, where the voice sounds as the model produced it.
  - Negative pitch values: Make the voice sound higher, like moving up the notes on a piano. The result is a higher, perhaps more youthful or feminine tone.
  - Positive pitch values: Make the voice sound lower, like moving down the notes on a piano. The result is a deeper, more resonant tone, often associated with a more masculine or mature voice.
- Saving Audio: The pitch-adjusted speech is written with `soundfile` to a temporary audio file created with `tempfile`.
- Interactive Web Interface: Gradio provides an interface where you input text, adjust the pitch using a slider, and listen to the speech output.
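
The pipeline steps above can be sketched roughly as follows. This is not the code in `app.py`; it is a minimal illustration under stated assumptions. The function names, the slider range of -12 to 12 semitones, and the sign handling are all hypothetical. In particular, `librosa.effects.pitch_shift` raises the pitch for positive `n_steps`, so for negative slider values to sound higher, as described above, this sketch assumes the slider value is negated before it is passed to Librosa.

```python
import tempfile

MODEL_ID = "facebook/mms-tts-eng"  # checkpoint named in this README


def slider_to_n_steps(slider_value):
    """ASSUMPTION: negate the slider so that negative values raise the pitch."""
    return -slider_value


def frequency_ratio(n_steps):
    """Frequency multiplier for a shift of n_steps semitones (12 per octave)."""
    return 2.0 ** (n_steps / 12.0)


def synthesize(text, slider_value=0):
    """Tokenize, synthesize, pitch-shift, and save; returns the WAV file path."""
    # Third-party imports stay local so the helpers above work without them.
    import librosa
    import soundfile as sf
    import torch
    from transformers import AutoTokenizer, VitsModel

    # Loading on every call keeps the sketch short; an app would load once.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = VitsModel.from_pretrained(MODEL_ID)

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        waveform = model(**inputs).waveform  # shape: (batch, samples)

    audio = waveform.squeeze().cpu().numpy()
    sample_rate = model.config.sampling_rate

    n_steps = slider_to_n_steps(slider_value)
    if n_steps != 0:
        audio = librosa.effects.pitch_shift(audio, sr=sample_rate, n_steps=n_steps)

    # Reserve a temporary .wav path, then write the audio to it.
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        path = tmp.name
    sf.write(path, audio, sample_rate)
    return path


def build_demo():
    """Gradio interface: text box and pitch slider in, audio player out."""
    import gradio as gr

    return gr.Interface(
        fn=synthesize,
        inputs=[
            gr.Textbox(label="Text"),
            gr.Slider(minimum=-12, maximum=12, value=0, step=1, label="Pitch"),
        ],
        outputs=gr.Audio(type="filepath", label="Speech"),
    )
```

Calling `build_demo().launch()` would start the interface locally. The `frequency_ratio` helper only illustrates the arithmetic behind a semitone shift: a shift of 12 semitones doubles every frequency, and a shift of -12 halves it.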