(synthesizing_speech)=

# Synthesizing Speech



First, you need to install TTS. We recommend installing it from PyPI with the command below:



```bash

pip install TTS

```
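
To confirm the installation, you can print the CLI help (a quick sanity check; the exact options vary by 🐸TTS version):

```bash
tts --help
```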



After the installation, three interfaces are available:



1. TTS Command Line Interface (CLI) - `tts`

2. Local Demo Server - `tts-server`

3. 🐍Python API - `from TTS.api import TTS`



## On the Commandline - `tts`

![cli.gif](https://github.com/coqui-ai/TTS/raw/main/images/tts_cli.gif)



🐸TTS provides a CLI for synthesizing speech with pre-trained models. You can use either your own models or the released models under 🐸TTS.



List the released 🐸TTS models:



```bash

tts --list_models
```



Run a TTS model from the released models list with its default vocoder. (Simply copy and paste the full model name from the list as an argument to the command below.)



```bash

tts --text "Text for TTS" \

    --model_name "<type>/<language>/<dataset>/<model_name>" \

    --out_path folder/to/save/output.wav

```

Run a TTS model and a vocoder model from the released models list. Note that not every vocoder is compatible with every TTS model.

```bash

tts --text "Text for TTS" \

    --model_name "tts_models/<language>/<dataset>/<model_name>" \

    --vocoder_name "vocoder_models/<language>/<dataset>/<model_name>" \

    --out_path folder/to/save/output.wav

```

Run your own TTS model (using the Griffin-Lim vocoder):

```bash

tts --text "Text for TTS" \

    --model_path path/to/model.pth \

    --config_path path/to/config.json \

    --out_path folder/to/save/output.wav

```

Run your own TTS and vocoder models:

```bash

tts --text "Text for TTS" \

    --config_path path/to/config.json \

    --model_path path/to/model.pth \

    --out_path folder/to/save/output.wav \

    --vocoder_path path/to/vocoder.pth \

    --vocoder_config_path path/to/vocoder_config.json

```

Run a multi-speaker TTS model from the released models list.

```bash

tts --model_name "tts_models/<language>/<dataset>/<model_name>"  --list_speaker_idxs  # list the possible speaker IDs.

tts --text "Text for TTS." --out_path output/path/speech.wav --model_name "tts_models/<language>/<dataset>/<model_name>"  --speaker_idx "<speaker_id>"

```

Run a released voice conversion model:

```bash

tts --model_name "voice_conversion/<language>/<dataset>/<model_name>"

    --source_wav "my/source/speaker/audio.wav"

    --target_wav "my/target/speaker/audio.wav"

    --out_path folder/to/save/output.wav

```

**Note:** You can use `./TTS/bin/synthesize.py` if you prefer running `tts` from the TTS project folder.
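
For example, from the repository root (a sketch assuming the script takes the same flags as the installed `tts` entry point):

```bash
python TTS/bin/synthesize.py --text "Text for TTS" \
    --model_name "tts_models/<language>/<dataset>/<model_name>" \
    --out_path folder/to/save/output.wav
```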

## On the Demo Server - `tts-server`

![server.gif](https://github.com/coqui-ai/TTS/raw/main/images/demo_server.gif)

You can boot up a demo 🐸TTS server to run inference with your models. Note that the server is not optimized for performance, but it gives you an easy way to interact with the models.

The demo server provides pretty much the same interface as the CLI command.

```bash

tts-server -h # see the help

tts-server --list_models  # list the available models.

```

Run a TTS model from the released models list with its default vocoder. If the model you choose is a multi-speaker TTS model, you can select different speakers in the Web interface and synthesize speech.

```bash

tts-server --model_name "<type>/<language>/<dataset>/<model_name>"

```

Run a TTS model and a vocoder model from the released models list. Note that not every vocoder is compatible with every TTS model.

```bash

tts-server --model_name "<type>/<language>/<dataset>/<model_name>" \

           --vocoder_name "<type>/<language>/<dataset>/<model_name>"

```
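
Once the server is running, you can also request audio over plain HTTP instead of the Web page. A minimal sketch, assuming the default port `5002` and the `/api/tts` endpoint:

```bash
curl "http://localhost:5002/api/tts?text=Text%20for%20TTS" -o output.wav
```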

## Python 🐸TTS API

You can run a multi-speaker and multi-lingual model in Python as follows:

```python

import torch

from TTS.api import TTS



# Get device

device = "cuda" if torch.cuda.is_available() else "cpu"



# List available 🐸TTS models

print(TTS().list_models())



# Init TTS

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)



# Run TTS

# ❗ Since this is a multi-lingual voice cloning model, we must set the target speaker_wav and language

# Text to speech, returning a list of amplitude values as output

wav = tts.tts(text="Hello world!", speaker_wav="my/cloning/audio.wav", language="en")

# Text to speech to a file

tts.tts_to_file(text="Hello world!", speaker_wav="my/cloning/audio.wav", language="en", file_path="output.wav")

```
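
You can also inspect what a loaded model supports and handle the raw audio yourself. A minimal sketch continuing from the snippet above; the `speakers`, `languages`, and `synthesizer.output_sample_rate` attributes are assumptions based on this API (availability depends on the model and 🐸TTS version), and `numpy`/`scipy` are assumed to be installed:

```python
import numpy as np
from scipy.io import wavfile

# Inspect the speakers and languages the loaded model supports (assumed attributes).
print(tts.speakers)
print(tts.languages)

# Write the amplitude list returned by tts.tts() to disk at the model's output sample rate.
sample_rate = tts.synthesizer.output_sample_rate  # assumed attribute
wavfile.write("output_manual.wav", sample_rate, np.array(wav, dtype=np.float32))
```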

#### Here is an example for a single-speaker model.

```python

# Init TTS with the target model name

tts = TTS(model_name="tts_models/de/thorsten/tacotron2-DDC", progress_bar=False)

# Run TTS

tts.tts_to_file(text="Ich bin eine Testnachricht.", file_path=OUTPUT_PATH)

```

#### Example voice cloning with YourTTS in English, French and Portuguese:

```python

tts = TTS(model_name="tts_models/multilingual/multi-dataset/your_tts", progress_bar=False).to("cuda")

tts.tts_to_file("This is voice cloning.", speaker_wav="my/cloning/audio.wav", language="en", file_path="output.wav")

tts.tts_to_file("C'est le clonage de la voix.", speaker_wav="my/cloning/audio.wav", language="fr", file_path="output.wav")

tts.tts_to_file("Isso é clonagem de voz.", speaker_wav="my/cloning/audio.wav", language="pt", file_path="output.wav")

```

#### Example voice conversion, converting the speaker of `source_wav` to the speaker of `target_wav`

```python

tts = TTS(model_name="voice_conversion_models/multilingual/vctk/freevc24", progress_bar=False).to("cuda")

tts.voice_conversion_to_file(source_wav="my/source.wav", target_wav="my/target.wav", file_path="output.wav")

```

#### Example voice cloning with a single-speaker TTS model combined with the voice conversion model.

This way, you can clone voices by using any model in 🐸TTS.

```python

tts = TTS("tts_models/de/thorsten/tacotron2-DDC")

tts.tts_with_vc_to_file(

    "Wie sage ich auf Italienisch, dass ich dich liebe?",

    speaker_wav="target/speaker.wav",

    file_path="ouptut.wav"

)

```

#### Example text to speech using **Fairseq models in ~1100 languages** 🤯.
For these models, use the following name format: `tts_models/<lang-iso_code>/fairseq/vits`.

You can find the list of language ISO codes [here](https://dl.fbaipublicfiles.com/mms/tts/all-tts-languages.html) and learn about the Fairseq models [here](https://github.com/facebookresearch/fairseq/tree/main/examples/mms).

```python

from TTS.api import TTS

api = TTS(model_name="tts_models/eng/fairseq/vits").to("cuda")

api.tts_to_file("This is a test.", file_path="output.wav")



# TTS with on-the-fly voice conversion

api = TTS("tts_models/deu/fairseq/vits")

api.tts_with_vc_to_file(

    "Wie sage ich auf Italienisch, dass ich dich liebe?",

    speaker_wav="target/speaker.wav",

    file_path="ouptut.wav"

)

```