---
license: mit
language:
- en
tags:
- voice-conversion
- speech-anonymization
---

<div align="center">
<img src="figures/logo.png" width="300"/>
</div>

<p align="center"><strong style="font-size: 22px;">
Self-Supervised LM-Based Zero-Shot Voice Conversion
</strong>
</p>

<p align="center">
♠︎ <a href="https://huggingface.co/ZexinCai/GenVC">Model</a> | ♣︎ <a href="https://github.com/caizexin/GenVC">GitHub</a>
| ♥︎ <a href="https://arxiv.org/abs/2502.04519">Paper</a> | ♦︎ <a href="https://caizexin.github.io/GenVC/index.html">Demo</a>
</p>

GenVC is an open-source, language model-based zero-shot voice conversion system that leverages self-supervised training and supports streaming voice conversion. **This model card hosts the model checkpoints. For more details on inference and training, please refer to our [GitHub](https://github.com/caizexin/GenVC) repository.**
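
Since this card only hosts the checkpoints, a minimal sketch of fetching them locally may be useful. It assumes the third-party `huggingface_hub` package is installed (`pip install huggingface_hub`). The repository ID comes from the Model link above. The `download_checkpoints` helper and the `pretrained` folder name are hypothetical, chosen for illustration only, and are not part of GenVC.

```python
from pathlib import Path

# Repository ID taken from the Model link on this card.
REPO_ID = "ZexinCai/GenVC"


def download_checkpoints(local_dir: str = "pretrained") -> Path:
    """Download every file in the model repository and return the local folder.

    `local_dir` is a hypothetical folder name used for illustration.
    """
    # Imported lazily so this module loads even before the package is installed.
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=REPO_ID, local_dir=local_dir))
```

Calling `download_checkpoints()` requires network access. Which files the inference and training scripts expect, and where they should live, is described in the GitHub repository rather than here.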

<!-- --- -->

## Approach

<p align="center">
<img src="figures/genVC.png" width="100%"/>
</p>

## Features

✅ **Zero-shot Voice Conversion**

✅ **Streaming VC**

✅ **Self-supervised Training**