---
license: mit
language:
- en
- pt
- es
- zh
- nl
- fr
- de
- it
- ja
- pl
pipeline_tag: audio-to-audio
tags:
- audio
- voice
- voice conversion
- singing voice conversion
- vc
- svc
- multilingual
---

# FreeSVC: Zero-shot Multilingual Singing Voice Conversion

**FreeSVC** is a state-of-the-art multilingual singing voice conversion model designed for zero-shot conversion. It converts singing voices across languages without extensive language-specific training. The code is available in the [GitHub repository](https://github.com/freds0/free-svc).

## Supported Languages

| Language   | ID | Status     | Speech Data     | Singing Data |
|------------|----|------------|-----------------|--------------|
| Chinese    | 0  | ✅ Full    | 255h            | 70h          |
| Dutch      | 1  | ✅ Full    | Part of CML-TTS | -            |
| English    | 2  | ✅ Full    | 921h            | 47h          |
| French     | 3  | ✅ Full    | Part of CML-TTS | -            |
| German     | 4  | ✅ Full    | Part of CML-TTS | -            |
| Italian    | 5  | ✅ Full    | Part of CML-TTS | -            |
| Japanese   | 6  | ✅ Full    | 30h             | -            |
| Other*     | 7  | ⚠️ Partial | -               | 10h          |
| Polish     | 8  | ✅ Full    | Part of CML-TTS | -            |
| Portuguese | 9  | ✅ Full    | Part of CML-TTS | -            |
| Spanish    | 10 | ✅ Full    | Part of CML-TTS | -            |

\* Note: The "Other" category (ID 7) covers vocal-technique recordings without linguistic content, such as VocalSet.
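
For scripting, the table can be mirrored as a lookup from a language name or ISO 639-1 code (the codes listed in this card's metadata) to the numeric language ID. This is a hypothetical helper written for illustration, not part of the FreeSVC codebase; check the repository for how language IDs are actually passed to the model.

```python
from typing import List, Optional, Tuple

# (ISO 639-1 code, language name, language ID), copied from the
# "Supported Languages" table. "Other" has no ISO code.
# Hypothetical helper: not taken from the FreeSVC repository.
LANGUAGES: List[Tuple[Optional[str], str, int]] = [
    ("zh", "Chinese", 0),
    ("nl", "Dutch", 1),
    ("en", "English", 2),
    ("fr", "French", 3),
    ("de", "German", 4),
    ("it", "Italian", 5),
    ("ja", "Japanese", 6),
    (None, "Other", 7),  # vocal techniques without linguistic content
    ("pl", "Polish", 8),
    ("pt", "Portuguese", 9),
    ("es", "Spanish", 10),
]

ID_BY_NAME = {name.lower(): lang_id for _, name, lang_id in LANGUAGES}
ID_BY_CODE = {code: lang_id for code, _, lang_id in LANGUAGES if code is not None}


def language_id(key: str) -> int:
    """Resolve a language name ("English") or ISO code ("en") to its ID."""
    k = key.strip().lower()
    if k in ID_BY_CODE:
        return ID_BY_CODE[k]
    if k in ID_BY_NAME:
        return ID_BY_NAME[k]
    raise ValueError(f"unsupported language: {key!r}")
```

For example, `language_id("pt")` and `language_id("Portuguese")` both resolve to 9.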

## Model Overview

FreeSVC builds on an enhanced VITS architecture combined with Speaker-invariant Clustering (SPIN) and the ECAPA2 speaker encoder. SPIN provides content representations with speaker information removed, and ECAPA2 captures the target speaker's characteristics. Keeping these two signals separate lets the model convert voices across multiple languages while producing high-quality, natural-sounding output.
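
The separation described above can be pictured as a conditioning interface: linguistic content comes frame by frame from the source singing, and speaker identity comes from a single utterance-level vector computed from a target reference clip. That is why an unseen target singer only needs a reference recording, not retraining. The sketch below is schematic. Both encoders are trivial, hypothetical stand-ins, not SPIN or ECAPA2, and joining the signals by concatenation is an assumption made for illustration. It does not describe how FreeSVC actually conditions its VITS decoder.

```python
from typing import List

Vector = List[float]
Frames = List[Vector]  # one feature vector per audio frame


def content_encoder(source: Frames) -> Frames:
    # Hypothetical stand-in for a SPIN-style content encoder. A real one would
    # remove speaker identity; this stub just copies the frames through.
    return [list(frame) for frame in source]


def speaker_encoder(reference: Frames) -> Vector:
    # Hypothetical stand-in for an ECAPA2-style speaker encoder: pool a
    # variable-length clip into one fixed-size vector (here, a plain mean).
    dim = len(reference[0])
    return [sum(frame[i] for frame in reference) / len(reference) for i in range(dim)]


def conditioning(source: Frames, reference: Frames, language_id: int) -> Frames:
    """Build per-frame decoder inputs: content from the source, timbre from the reference."""
    content = content_encoder(source)
    speaker = speaker_encoder(reference)
    # Repeat the single speaker vector and the language ID on every source frame,
    # so the output keeps the source's frame count (and therefore its timing).
    return [frame + speaker + [float(language_id)] for frame in content]
```

Note that the reference clip may have a different length from the source; only its pooled vector is used.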

## Training Datasets

FreeSVC was trained on a diverse set of speech and singing datasets covering multiple languages:

| **Dataset** | **Hours** | **Language**     | **Type**        |
|-------------|-----------|------------------|-----------------|
| AISHELL-1   | 170h      | Chinese          | Speech          |
| AISHELL-3   | 85h       | Chinese          | Speech          |
| CML-TTS     | 3.1k h    | 7 languages      | Speech          |
| HiFiTTS     | 292h      | English          | Speech          |
| JVS         | 30h       | Japanese         | Speech          |
| LibriTTS-R  | 585h      | English          | Speech          |
| NUS (NHSS)  | 7h        | English          | Speech, Singing |
| OpenSinger  | 50h       | Chinese          | Singing         |
| Opencpop    | 5h        | Chinese          | Singing         |
| PopBuTFy    | 10h, 40h  | Chinese, English | Singing         |
| POPCS       | 5h        | Chinese          | Singing         |
| VCTK        | 44h       | English          | Speech          |
| VocalSet    | 10h       | Other            | Singing         |
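
The per-language hours in the Supported Languages table follow from this dataset table. The short script below sums them as a consistency check. Two modelling choices are assumptions of this sketch. First, CML-TTS is left out because no per-language split is given. Second, the 7h of NUS (NHSS) is counted as singing, which is what reproduces the 47h English singing figure.

```python
from collections import defaultdict

# (dataset, hours, language, type) from the Training Datasets table.
# PopBuTFy is split into its Chinese (10h) and English (40h) parts.
# CML-TTS (3.1k h over 7 languages) is omitted: no per-language split is given.
# Assumption: NUS (NHSS) is counted as singing.
DATASETS = [
    ("AISHELL-1", 170, "Chinese", "speech"),
    ("AISHELL-3", 85, "Chinese", "speech"),
    ("HiFiTTS", 292, "English", "speech"),
    ("JVS", 30, "Japanese", "speech"),
    ("LibriTTS-R", 585, "English", "speech"),
    ("NUS (NHSS)", 7, "English", "singing"),
    ("OpenSinger", 50, "Chinese", "singing"),
    ("Opencpop", 5, "Chinese", "singing"),
    ("PopBuTFy", 10, "Chinese", "singing"),
    ("PopBuTFy", 40, "English", "singing"),
    ("POPCS", 5, "Chinese", "singing"),
    ("VCTK", 44, "English", "speech"),
    ("VocalSet", 10, "Other", "singing"),
]


def hours_by_language(datasets):
    """Sum hours per (language, type) pair."""
    totals = defaultdict(int)
    for _name, hours, language, kind in datasets:
        totals[(language, kind)] += hours
    return dict(totals)


totals = hours_by_language(DATASETS)
for (language, kind), hours in sorted(totals.items()):
    print(f"{language:<9} {kind:<8} {hours:>4}h")
```

The resulting sums match the Speech Data and Singing Data columns of the Supported Languages table, for example 255h of Chinese speech and 47h of English singing.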

## License

This code repository is licensed under [the MIT License](LICENSE-CODE).

## Citation

```
@misc{}
```