EZ-VC: Easy Zero-shot Any-to-Any Voice Conversion

Our paper has been accepted to the Findings of EMNLP 2025!

Installation

Create a separate environment if needed

# Create a Python 3.10 conda env (you could also use virtualenv)
conda create -n ez-vc python=3.10
conda activate ez-vc

Local installation

git clone https://github.com/EZ-VC/EZ-VC
cd EZ-VC
git submodule update --init --recursive
pip install -e .

# Install ESPnet for XEUS (use exactly this version)
pip install 'espnet @ git+https://github.com/wanchichen/espnet.git@ssl'
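A quick way to confirm the installation is to check that the packages can be located from the active environment. The following is a minimal sketch, not part of the EZ-VC codebase: the module name `f5_tts` is assumed from the `src/f5_tts/` path in this repository, `espnet` is assumed from the pip package name, and `check_environment` is a hypothetical helper.

```python
import importlib.util
import sys

# Assumed top-level module names: "f5_tts" is inferred from the src/f5_tts/
# path in this repository, "espnet" from the pip package name above.
REQUIRED_MODULES = ("f5_tts", "espnet")


def check_environment(modules=REQUIRED_MODULES):
    """Return a dict mapping each check to True/False without importing the packages."""
    report = {"python_3_10": sys.version_info[:2] == (3, 10)}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for check, ok in check_environment().items():
        print(f"{check}: {'ok' if ok else 'MISSING'}")
```

If a module is reported as missing, re-running the corresponding install command inside the activated `ez-vc` environment is the first thing to try.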

Inference

We provide a Jupyter notebook for inference at "src/f5_tts/infer/infer.ipynb".

Open the inference notebook.

Run all cells.

The converted audio will be available in the output of the last cell.
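If you prefer not to open the notebook interactively, the steps above can in principle be run headlessly with `jupyter nbconvert`. The sketch below only builds and launches that command. It assumes Jupyter and nbconvert are installed in the environment and that the notebook runs without manual input, which has not been verified here. `build_nbconvert_command` and the output name `infer_executed.ipynb` are hypothetical.

```python
import subprocess

# Notebook path as given in this README.
NOTEBOOK = "src/f5_tts/infer/infer.ipynb"


def build_nbconvert_command(notebook=NOTEBOOK, output_name="infer_executed.ipynb"):
    """Build a `jupyter nbconvert` command that executes every cell of the notebook.

    The output name is an arbitrary choice for this sketch; the executed copy
    keeps the cell outputs, including whatever the last cell produces.
    """
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute", notebook,
        "--output", output_name,
    ]


if __name__ == "__main__":
    # check=True raises CalledProcessError if any cell fails during execution.
    subprocess.run(build_nbconvert_command(), check=True)
```

Opening the executed copy in Jupyter afterwards shows the last cell's output in the same way as an interactive run.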

Acknowledgements

We thank F5-TTS for open-sourcing their code, which made EZ-VC possible.

Citation

If our work and codebase are useful to you, please cite:

@misc{joglekar2025ezvceasyzeroshotanytoany,
      title={EZ-VC: Easy Zero-shot Any-to-Any Voice Conversion}, 
      author={Advait Joglekar and Divyanshu Singh and Rooshil Rohit Bhatia and S. Umesh},
      year={2025},
      eprint={2505.16691},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2505.16691}, 
}

License

Our code is released under the MIT License. The pre-trained models are licensed under the CC-BY-NC license. Sorry for any inconvenience this may cause.
