# 🐢 Tortoise
Tortoise is a very expressive TTS system with impressive voice cloning capabilities. It is based on a GPT-like autoregressive acoustic model that converts input
text to discretized acoustic tokens, a diffusion model that converts these tokens to mel-spectrogram frames, and a UniVNet vocoder that converts the spectrograms to
the final audio signal. The main downside is that Tortoise is very slow compared to parallel TTS models like VITS.

Big thanks to 👑[@manmay-nakhashi](https://github.com/manmay-nakhashi) who helped us implement Tortoise in 🐸TTS.

Example use:

```python
from TTS.tts.configs.tortoise_config import TortoiseConfig
from TTS.tts.models.tortoise import Tortoise

config = TortoiseConfig()
model = Tortoise.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="paths/to/models_dir/", eval=True)

text = "This is an example."  # text to synthesize
kwargs = {}  # optional inference settings forwarded to the model

# with random speaker
output_dict = model.synthesize(text, config, speaker_id="random", extra_voice_dirs=None, **kwargs)

# cloning a speaker
output_dict = model.synthesize(text, config, speaker_id="speaker_n", extra_voice_dirs="path/to/speaker_n/", **kwargs)
```

Using 🐸TTS API:

```python
from TTS.api import TTS

tts = TTS("tts_models/en/multi-dataset/tortoise-v2")

# cloning `lj` voice from `TTS/tts/utils/assets/tortoise/voices/lj`
# with custom inference settings overriding defaults.
tts.tts_to_file(text="Hello, my name is Manmay, how are you?",
                file_path="output.wav",
                voice_dir="path/to/tortoise/voices/dir/",
                speaker="lj",
                num_autoregressive_samples=1,
                diffusion_iterations=10)

# Using presets with the same voice
tts.tts_to_file(text="Hello, my name is Manmay, how are you?",
                file_path="output.wav",
                voice_dir="path/to/tortoise/voices/dir/",
                speaker="lj",
                preset="ultra_fast")

# Random voice generation
tts.tts_to_file(text="Hello, my name is Manmay, how are you?",
                file_path="output.wav")
```
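
The `voice_dir` and `speaker` arguments above suggest a layout in which each voice is a subdirectory of `voice_dir` holding reference clips (for example `voices/lj/`). As a minimal sketch under that assumption, the helper below lists candidate speaker names in a directory before synthesis is attempted. The function name `list_voices` and the `.wav`-only filter are illustrative assumptions and not part of 🐸TTS.

```python
from pathlib import Path


def list_voices(voice_dir):
    """Return sorted names of subdirectories holding at least one .wav file.

    Hypothetical helper: assumes one subdirectory per speaker.
    """
    root = Path(voice_dir)
    if not root.is_dir():
        return []
    voices = []
    for child in root.iterdir():
        if not child.is_dir():
            continue
        # A directory counts as a voice only if it has a reference clip.
        has_clip = any(
            p.is_file() and p.suffix.lower() == ".wav" for p in child.iterdir()
        )
        if has_clip:
            voices.append(child.name)
    return sorted(voices)
```

A name returned by this helper could then be passed as `speaker` together with the same `voice_dir`.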

Using 🐸TTS Command line:

```console
# cloning the `lj` voice
tts --model_name tts_models/en/multi-dataset/tortoise-v2 \
--text "This is an example." \
--out_path "output.wav" \
--voice_dir path/to/tortoise/voices/dir/ \
--speaker_idx "lj" \
--progress_bar True

# Random voice generation
tts --model_name tts_models/en/multi-dataset/tortoise-v2 \
--text "This is an example." \
--out_path "output.wav" \
--progress_bar True
```
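
When the command line is driven from a script, the same invocation can be assembled as an argument list. The sketch below uses only the flags shown above; the function name `build_tortoise_command` is a hypothetical helper, and it assumes the `tts` executable is on the `PATH`.

```python
import shlex


def build_tortoise_command(text, out_path, voice_dir=None, speaker=None):
    """Build the argument list for the `tts` CLI call (hypothetical helper)."""
    cmd = [
        "tts",
        "--model_name", "tts_models/en/multi-dataset/tortoise-v2",
        "--text", text,
        "--out_path", out_path,
    ]
    # Voice cloning needs both the voices directory and the speaker name;
    # without them the command falls back to random voice generation.
    if voice_dir is not None and speaker is not None:
        cmd += ["--voice_dir", voice_dir, "--speaker_idx", speaker]
    cmd += ["--progress_bar", "True"]
    return cmd


cmd = build_tortoise_command(
    "This is an example.",
    "output.wav",
    voice_dir="path/to/tortoise/voices/dir/",
    speaker="lj",
)
print(shlex.join(cmd))  # shell-quoted form, useful for logging
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`, which avoids manual shell quoting of the text.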


## Important resources & papers
- Original Repo: https://github.com/neonbjb/tortoise-tts
- Faster implementation: https://github.com/152334H/tortoise-tts-fast
- UniVNet: https://arxiv.org/abs/2106.07889
- Latent Diffusion: https://arxiv.org/abs/2112.10752
- DALL-E: https://arxiv.org/abs/2102.12092

## TortoiseConfig
```{eval-rst}
.. autoclass:: TTS.tts.configs.tortoise_config.TortoiseConfig
    :members:
```

## TortoiseArgs
```{eval-rst}
.. autoclass:: TTS.tts.models.tortoise.TortoiseArgs
    :members:
```

## Tortoise Model
```{eval-rst}
.. autoclass:: TTS.tts.models.tortoise.Tortoise
    :members:
```