---
license: mit
language:
- en
- pt
- es
- zh
- nl
- fr
- de
- it
- ja
- pl
pipeline_tag: audio-to-audio
tags:
- audio
- voice
- voice conversion
- singing voice conversion
- vc
- svc
- multilingual
---
# FreeSVC: Zero-shot Multilingual Singing Voice Conversion
**FreeSVC** is a multilingual zero-shot singing voice conversion model. It converts singing voices across languages without extensive language-specific training. Code: [GitHub repository](https://github.com/freds0/free-svc). Paper: [arXiv pre-print](https://arxiv.org/abs/2501.05586).
## Supported Languages
| Language | ID | Status | Speech Data | Singing Data |
|------------|-----|--------------|-------------|--------------|
| Chinese | 0 | ✅ Full | 255h | 70h |
| Dutch | 1 | ✅ Full | Part of CML | - |
| English | 2 | ✅ Full | 921h | 47h |
| French | 3 | ✅ Full | Part of CML | - |
| German | 4 | ✅ Full | Part of CML | - |
| Italian | 5 | ✅ Full | Part of CML | - |
| Japanese | 6 | ✅ Full | 30h | - |
| Other* | 7 | ⚠️ Partial | - | 10h |
| Polish | 8 | ✅ Full | Part of CML | - |
| Portuguese | 9 | ✅ Full | Part of CML | - |
| Spanish | 10 | ✅ Full | Part of CML | - |
\* Note: the "Other" category is used for recordings of vocal techniques that carry no linguistic content.
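
The ID column above can be captured as a small lookup table. The following is a minimal sketch, assuming the IDs are used as the language index when preparing inputs; the names `LANGUAGE_IDS` and `language_id` are hypothetical and are not taken from the FreeSVC repository.

```python
# Language name -> ID, copied from the "Supported Languages" table.
# Hypothetical helper: these names are illustrative, not the repository's API.
LANGUAGE_IDS = {
    "chinese": 0,
    "dutch": 1,
    "english": 2,
    "french": 3,
    "german": 4,
    "italian": 5,
    "japanese": 6,
    "other": 7,  # vocal techniques without linguistic content
    "polish": 8,
    "portuguese": 9,
    "spanish": 10,
}


def language_id(name: str) -> int:
    """Return the table ID for a language name (case-insensitive)."""
    key = name.strip().lower()
    if key not in LANGUAGE_IDS:
        raise KeyError(f"Unsupported language: {name!r}")
    return LANGUAGE_IDS[key]
```

Raising on an unknown name, rather than silently falling back to "Other", keeps a typo from being mistaken for a supported language.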
## Model Overview
FreeSVC builds on an enhanced VITS architecture combined with Speaker-invariant Clustering (SPIN) and the ECAPA2 speaker encoder. This combination separates speaker characteristics from linguistic content, which supports natural-sounding voice conversion across multiple languages.
## Training Datasets
FreeSVC was trained on a diverse set of speech and singing datasets covering multiple languages:
| **Dataset** | **Hours** | **Language** | **Type** |
|----------------------|------------|--------------|--------------|
| AISHELL-1 | 170h | Chinese | Speech |
| AISHELL-3 | 85h | Chinese | Speech |
| CML-TTS | 3,100h | 7 Languages | Speech |
| HiFiTTS | 292h | English | Speech |
| JVS | 30h | Japanese | Speech |
| LibriTTS-R | 585h | English | Speech |
| NUS (NHSS) | 7h | English | Speech, Singing |
| OpenSinger | 50h | Chinese | Singing |
| Opencpop | 5h | Chinese | Singing |
| PopBuTFy | 10h, 40h | Chinese, English | Singing |
| POPCS | 5h | Chinese | Singing |
| VCTK | 44h | English | Speech |
| VocalSet | 10h | Other | Singing |
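
The per-language hours in the "Supported Languages" table can be reproduced from the dataset table above. The sketch below is only a consistency check under two assumptions: the 7h of NUS (NHSS) is attributed to singing, and PopBuTFy is split into 10h Chinese and 40h English following the column order. CML-TTS is left out because its per-language breakdown is not given here.

```python
from collections import defaultdict

# (dataset, hours, language, type) rows copied from the dataset table.
# Assumptions: NUS (NHSS) counted as singing; PopBuTFy split 10h/40h.
ROWS = [
    ("AISHELL-1", 170, "Chinese", "Speech"),
    ("AISHELL-3", 85, "Chinese", "Speech"),
    ("HiFiTTS", 292, "English", "Speech"),
    ("JVS", 30, "Japanese", "Speech"),
    ("LibriTTS-R", 585, "English", "Speech"),
    ("NUS (NHSS)", 7, "English", "Singing"),
    ("OpenSinger", 50, "Chinese", "Singing"),
    ("Opencpop", 5, "Chinese", "Singing"),
    ("PopBuTFy", 10, "Chinese", "Singing"),
    ("PopBuTFy", 40, "English", "Singing"),
    ("POPCS", 5, "Chinese", "Singing"),
    ("VCTK", 44, "English", "Speech"),
    ("VocalSet", 10, "Other", "Singing"),
]


def hours_by_language_and_type(rows):
    """Sum hours for each (language, type) pair."""
    totals = defaultdict(int)
    for _dataset, hours, language, kind in rows:
        totals[(language, kind)] += hours
    return dict(totals)


totals = hours_by_language_and_type(ROWS)
for (language, kind), hours in sorted(totals.items()):
    print(f"{language:<9} {kind:<8} {hours}h")
```

Under these assumptions the sums match the earlier table: 255h of speech and 70h of singing for Chinese, and 921h of speech and 47h of singing for English.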
## Citation
```bibtex
@misc{ferreira2025freesvczeroshotmultilingualsinging,
title={FreeSVC: Towards Zero-shot Multilingual Singing Voice Conversion},
author={Alef Iury Siqueira Ferreira and Lucas Rafael Gris and Augusto Seben da Rosa and Frederico Santos de Oliveira and Edresson Casanova and Rafael Teixeira Sousa and Arnaldo Candido Junior and Anderson da Silva Soares and Arlindo Galvão Filho},
year={2025},
eprint={2501.05586},
archivePrefix={arXiv},
primaryClass={cs.SD},
url={https://arxiv.org/abs/2501.05586},
}
```