EZ-VC: Easy Zero-shot Any-to-Any Voice Conversion
Our paper has been accepted to the Findings of EMNLP 2025!
Installation
Create a separate environment if needed
# Create a Python 3.10 conda env (virtualenv also works)
conda create -n ez-vc python=3.10
conda activate ez-vc
Local installation
git clone https://github.com/EZ-VC/EZ-VC
cd EZ-VC
git submodule update --init --recursive
pip install -e .
# Install espnet for XEUS (use exactly this version)
pip install 'espnet @ git+https://github.com/wanchichen/espnet.git@ssl'
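After installing, a quick check that the packages resolve in the active environment can save a failed notebook run. The following is a minimal sketch, not part of the EZ-VC codebase: the helper `check_modules` is hypothetical, and the module names `f5_tts`, `espnet`, and `torch` are assumptions (the first is inferred from the `src/f5_tts` path, and `torch` is assumed to be pulled in as a dependency).

```python
import importlib.util


def check_modules(names):
    """Return {module_name: True/False} for whether each top-level module can be found.

    Uses importlib.util.find_spec, which locates a top-level module
    without importing it.
    """
    status = {}
    for name in names:
        try:
            status[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Treat any lookup failure as "not installed".
            status[name] = False
    return status


if __name__ == "__main__":
    # Assumed module names; adjust to match your environment.
    for module, found in check_modules(["f5_tts", "espnet", "torch"]).items():
        print(f"{module}: {'found' if found else 'MISSING'}")
```

If a module is reported as missing, re-run the corresponding install step above inside the activated environment.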
Inference
We provide a Jupyter notebook for inference at "src/f5_tts/infer/infer.ipynb".
Open the inference notebook.
Run all cells.
The converted audio will be available in the last cell.
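If you prefer to run the notebook without opening it in a browser, the steps above can be sketched as a call to `jupyter nbconvert` with its `--execute` option. This is an illustrative sketch rather than a supported entry point: the helper names and the output filename `infer_executed.ipynb` are hypothetical, and it assumes Jupyter is installed in the environment and that any input paths inside the notebook are already set.

```python
import subprocess
from pathlib import Path

# Notebook path as given in this README, relative to the repository root.
NOTEBOOK = Path("src/f5_tts/infer/infer.ipynb")


def build_nbconvert_command(notebook, output_name="infer_executed.ipynb"):
    """Build the command that executes a notebook and saves an executed copy."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute", str(notebook),
        "--output", output_name,  # hypothetical output filename
    ]


def run_notebook(notebook=NOTEBOOK):
    """Execute the notebook headlessly; raises CalledProcessError on failure."""
    subprocess.run(build_nbconvert_command(notebook), check=True)
```

Calling `run_notebook()` from the repository root executes every cell in order; the converted audio is then shown in the last cell of the executed copy when you open it.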
Acknowledgements
- F5-TTS, for open-sourcing the code that made EZ-VC possible.
Citation
If you find our work and codebase useful, please cite:
@misc{joglekar2025ezvceasyzeroshotanytoany,
title={EZ-VC: Easy Zero-shot Any-to-Any Voice Conversion},
author={Advait Joglekar and Divyanshu Singh and Rooshil Rohit Bhatia and S. Umesh},
year={2025},
eprint={2505.16691},
archivePrefix={arXiv},
primaryClass={cs.SD},
url={https://arxiv.org/abs/2505.16691},
}
License
Our code is released under the MIT License. The pre-trained models are licensed under the CC-BY-NC license. Sorry for any inconvenience this may cause.