By accessing this model, you agree not to use it to generate, share, or promote content that is illegal, harmful, deceptive, or intended to impersonate real individuals without their informed consent.

EZ-VC: Easy Zero-shot Any-to-Any Voice Conversion

Our paper has been accepted to the Findings of EMNLP 2025!

Installation

Create a separate environment if needed

# Create a python 3.10 conda env (you could also use virtualenv)
conda create -n ez-vc python=3.10
conda activate ez-vc

Local installation

git clone https://github.com/EZ-VC/EZ-VC
cd EZ-VC
git submodule update --init --recursive
pip install -e .

# Install ESPnet for XEUS (use exactly this version)
pip install 'espnet @ git+https://github.com/wanchichen/espnet.git@ssl'
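
After installing, a quick check can confirm that the environment matches the steps above. The following is a minimal sketch, not part of the EZ-VC repository. It assumes the editable install exposes a top-level package named `f5_tts` (inferred from the notebook path `src/f5_tts/...`) and that the ESPnet fork installs under the usual `espnet` name. The helper names `missing_modules` and `python_matches` are hypothetical.

```python
import importlib.util
import sys

# Assumed module names; adjust if the install exposes different ones.
REQUIRED_MODULES = ["f5_tts", "espnet"]
EXPECTED_PYTHON = (3, 10)  # matches the conda env created above


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_matches(expected=EXPECTED_PYTHON, actual=None):
    """Check that the interpreter's major.minor version equals `expected`."""
    actual = sys.version_info[:2] if actual is None else tuple(actual)
    return actual == tuple(expected)


if __name__ == "__main__":
    if not python_matches():
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        print(f"Warning: running Python {found}, expected 3.10")
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

Run it inside the activated `ez-vc` environment. A module reported as missing usually means the corresponding `pip install` step did not complete.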

Inference

We provide a Jupyter notebook for inference at "src/f5_tts/infer/infer.ipynb".

Open the inference notebook.

Run all cells.

The converted audio will be available in the output of the last cell.
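
If a browser session is inconvenient (for example on a remote machine), the same "run all" step can in principle be done from the command line with `jupyter nbconvert`. The sketch below is an assumption-laden illustration rather than a workflow documented by EZ-VC: it assumes `jupyter` and `nbconvert` are installed in the environment, that it is run from the root of the EZ-VC checkout, and that the notebook's inputs have already been set inside the notebook itself. The function name `build_nbconvert_command` and the output file name are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path

NOTEBOOK = Path("src/f5_tts/infer/infer.ipynb")  # path given in this README


def build_nbconvert_command(notebook, output_name="infer.executed.ipynb", output_dir="."):
    """Build a `jupyter nbconvert` command that executes every cell and saves a copy."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--ExecutePreprocessor.timeout=-1",  # disable the per-cell timeout
        "--output", output_name,
        "--output-dir", str(output_dir),
        str(notebook),
    ]


if __name__ == "__main__":
    if not NOTEBOOK.exists():
        print(f"{NOTEBOOK} not found; run this from the root of the EZ-VC checkout.")
    elif shutil.which("jupyter") is None:
        print("jupyter not found on PATH.")
    else:
        # check=True raises CalledProcessError if any cell fails to execute.
        subprocess.run(build_nbconvert_command(NOTEBOOK), check=True)
```

The executed copy keeps the outputs of every cell, so the result of the last cell can be inspected by opening that copy afterwards.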

Acknowledgements

  • F5-TTS, for open-sourcing their code, which made EZ-VC possible.

Citation

If our work and codebase are useful to you, please cite:

@misc{joglekar2025ezvceasyzeroshotanytoany,
      title={EZ-VC: Easy Zero-shot Any-to-Any Voice Conversion}, 
      author={Advait Joglekar and Divyanshu Singh and Rooshil Rohit Bhatia and S. Umesh},
      year={2025},
      eprint={2505.16691},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2505.16691}, 
}

License

Our code is released under the MIT License. The pre-trained models are licensed under the CC-BY-NC license. Sorry for any inconvenience this may cause.
