# HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

Based on the script [`train_hifigan.py`](https://github.com/tensorspeech/TensorFlowTTS/tree/master/examples/hifigan/train_hifigan.py).
## Training HiFi-GAN from scratch with the LJSpeech dataset
This example shows you how to train HiFi-GAN from scratch with TensorFlow 2, based on a custom training loop and `tf.function`. The data used in this example is LJSpeech; you can download the dataset at [this link](https://keithito.com/LJ-Speech-Dataset/).
### Step 1: Create a TensorFlow-based data loader (tf.data)
First, you need to define a data loader based on the AbstractDataset class (see [`abstract_dataset.py`](https://github.com/tensorspeech/TensorFlowTTS/tree/master/tensorflow_tts/datasets/abstract_dataset.py)). In this example, the data loader reads the dataset from a path and uses file suffixes to tell audio files apart from mel-spectrogram files (see [`audio_mel_dataset.py`](https://github.com/tensorspeech/TensorFlowTTS/tree/master/examples/melgan/audio_mel_dataset.py)). If you already have a preprocessed version of your target dataset, you don't need to use this example data loader; just refer to it and modify the **generator function** to fit your case. Normally, a generator function should return [audio, mel], as in the sketch below.
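To make the expected generator output concrete, here is a minimal sketch of a generator-based `tf.data` pipeline. The `-wave.npy` / `-raw-feats.npy` suffixes, the utterance id, and the 80 mel bins are illustrative assumptions; adjust them to match your own preprocessing (requires TensorFlow >= 2.4 for `output_signature`).

```python
import numpy as np
import tensorflow as tf

def generator(utt_ids):
    """Yield (audio, mel) pairs loaded from .npy files (hypothetical suffixes)."""
    for utt_id in utt_ids:
        audio = np.load(f"{utt_id}-wave.npy").astype(np.float32)     # [num_samples]
        mel = np.load(f"{utt_id}-raw-feats.npy").astype(np.float32)  # [num_frames, 80]
        yield audio, mel

utt_ids = ["LJ001-0001"]  # hypothetical utterance id
dataset = tf.data.Dataset.from_generator(
    lambda: generator(utt_ids),
    output_signature=(
        tf.TensorSpec(shape=[None], dtype=tf.float32),      # audio
        tf.TensorSpec(shape=[None, 80], dtype=tf.float32),  # mel-spectrogram
    ),
)
```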
### Step 2: Training from scratch
After you redefine your data loader, please modify the input arguments `train_dataset` and `valid_dataset` in [`train_hifigan.py`](https://github.com/tensorspeech/TensorFlowTTS/tree/master/examples/hifigan/train_hifigan.py). Here is an example command line for training HiFi-GAN from scratch.

First, train the generator with only the STFT loss:
```bash
CUDA_VISIBLE_DEVICES=0 python examples/hifigan/train_hifigan.py \
--train-dir ./dump/train/ \
--dev-dir ./dump/valid/ \
--outdir ./examples/hifigan/exp/train.hifigan.v1/ \
--config ./examples/hifigan/conf/hifigan.v1.yaml \
--use-norm 1 \
--generator_mixed_precision 1 \
--resume ""
```
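Training the generator alone minimizes a spectral reconstruction objective. As a rough illustration, here is a single-resolution version of the STFT loss (spectral convergence plus log-magnitude, in the style of Parallel WaveGAN); the actual config combines several STFT resolutions, and the function below is a sketch rather than the script's implementation.

```python
import tensorflow as tf

def stft_loss(y, y_hat, frame_length=1024, frame_step=256, fft_length=1024):
    """Single-resolution STFT loss sketch; y and y_hat are [batch, samples]."""
    s = tf.abs(tf.signal.stft(y, frame_length, frame_step, fft_length))
    s_hat = tf.abs(tf.signal.stft(y_hat, frame_length, frame_step, fft_length))
    # Spectral convergence: relative Frobenius error of the magnitudes.
    sc = tf.norm(s - s_hat, ord="fro", axis=(-2, -1)) / (
        tf.norm(s, ord="fro", axis=(-2, -1)) + 1e-6
    )
    # Log STFT magnitude: L1 distance in the log domain.
    mag = tf.reduce_mean(
        tf.abs(tf.math.log(s + 1e-6) - tf.math.log(s_hat + 1e-6)), axis=(-2, -1)
    )
    return tf.reduce_mean(sc + mag)
```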
Then resume from that checkpoint and train the generator and discriminator together:
```bash
CUDA_VISIBLE_DEVICES=0 python examples/hifigan/train_hifigan.py \
--train-dir ./dump/train/ \
--dev-dir ./dump/valid/ \
--outdir ./examples/hifigan/exp/train.hifigan.v1/ \
--config ./examples/hifigan/conf/hifigan.v1.yaml \
--use-norm 1 \
--resume ./examples/hifigan/exp/train.hifigan.v1/checkpoints/ckpt-100000
```
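Once the discriminator kicks in, least-squares adversarial terms are added on top of the reconstruction losses. Below is a sketch of the objectives described in the HiFi-GAN paper, not the exact code in `train_hifigan.py`; each list element is assumed to be the output of one sub-discriminator.

```python
import tensorflow as tf

def discriminator_loss(real_outputs, fake_outputs):
    """Least-squares GAN loss over a list of sub-discriminator outputs."""
    loss = 0.0
    for d_real, d_fake in zip(real_outputs, fake_outputs):
        loss += tf.reduce_mean(tf.square(d_real - 1.0))  # push real scores toward 1
        loss += tf.reduce_mean(tf.square(d_fake))        # push fake scores toward 0
    return loss

def generator_adversarial_loss(fake_outputs):
    """The generator tries to push fake scores toward 1."""
    return tf.add_n([tf.reduce_mean(tf.square(d - 1.0)) for d in fake_outputs])
```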
If you want to use multiple GPUs for training, you can replace `CUDA_VISIBLE_DEVICES=0` with `CUDA_VISIBLE_DEVICES=0,1,2,3`, for example. You also need to tune the `batch_size` for each GPU (in the config file) yourself to maximize performance. Note that multi-GPU is currently supported for training but not yet for decoding.
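Under the hood, multi-GPU data parallelism in TensorFlow 2 is typically driven by `tf.distribute.MirroredStrategy`, which replicates variables across the visible GPUs and splits each batch among them. The snippet below is a generic illustration of that mechanism, not the exact setup in `train_hifigan.py`.

```python
import tensorflow as tf

# MirroredStrategy picks up all GPUs left visible by CUDA_VISIBLE_DEVICES.
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Variables created here (models, optimizers) are mirrored on every GPU;
    # the effective batch size is batch_size * num_replicas_in_sync.
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    optimizer = tf.keras.optimizers.Adam(1e-4)
```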
If you want to resume training, pass the latest checkpoint via `--resume`, for example:
```bash
--resume ./examples/hifigan/exp/train.hifigan.v1/checkpoints/ckpt-100000
```
If you want to fine-tune a model, pass the filename of the pretrained generator via `--pretrained`:
```bash
--pretrained ptgenerator.h5
```
**IMPORTANT NOTES**:

- When training the generator only, we enable mixed precision to speed up training.
- We don't apply mixed precision when training both the generator and the discriminator (the discriminator includes group convolutions, which become slower when mixed precision is enabled).
- The 100k in the checkpoint name above is the *discriminator_train_start_steps* parameter from [hifigan.v1.yaml](https://github.com/tensorspeech/TensorflowTTS/tree/master/examples/hifigan/conf/hifigan.v1.yaml).
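For reference, enabling mixed precision in a TensorFlow 2 custom training loop generally looks like the following (TensorFlow >= 2.4 API). This is a generic sketch of the mechanism behind `--generator_mixed_precision`, not the script's exact code.

```python
import tensorflow as tf

# Compute in float16 while keeping variables in float32.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

optimizer = tf.keras.optimizers.Adam(1e-4)
# Loss scaling prevents float16 gradient underflow in custom training loops.
optimizer = tf.keras.mixed_precision.LossScaleOptimizer(optimizer)
```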
## Reference

1. https://github.com/descriptinc/melgan-neurips
2. https://github.com/kan-bayashi/ParallelWaveGAN
3. https://github.com/tensorflow/addons
4. [HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis](https://arxiv.org/abs/2010.05646)
5. [MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis](https://arxiv.org/abs/1910.06711)
6. [Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram](https://arxiv.org/abs/1910.11480)