---
license: mit
language:
- en
- pt
- es
- zh
- nl
- fr
- de
- it
- ja
- pl
pipeline_tag: audio-to-audio
tags:
- audio
- voice
- voice conversion
- singing voice conversion
- vc
- svc
- multilingual
---
# FreeSVC: Zero-shot Multilingual Singing Voice Conversion
**FreeSVC** is a state-of-the-art multilingual singing voice conversion model designed for zero-shot conversion, meaning it can convert to target voices that were not seen during training. It converts singing voices across languages without extensive language-specific training. The code is available in the [GitHub repository](https://github.com/freds0/free-svc).
## Supported Languages
| Language | ID | Status | Speech Data | Singing Data |
|------------|-----|--------------|-------------|--------------|
| Chinese | 0 | ✅ Full | 255h | 70h |
| Dutch | 1 | ✅ Full | Part of CML | - |
| English | 2 | ✅ Full | 921h | 47h |
| French | 3 | ✅ Full | Part of CML | - |
| German | 4 | ✅ Full | Part of CML | - |
| Italian | 5 | ✅ Full | Part of CML | - |
| Japanese | 6 | ✅ Full | 30h | - |
| Other* | 7 | ⚠️ Partial | - | 10h |
| Polish | 8 | ✅ Full | Part of CML | - |
| Portuguese | 9 | ✅ Full | Part of CML | - |
| Spanish | 10 | ✅ Full | Part of CML | - |

*Note: The "Other" category covers recordings of vocal techniques without linguistic content (the VocalSet data in the training table below).
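If you script conversions over several languages, the IDs in the table can be kept in a small lookup table. This is a hypothetical sketch: `LANGUAGE_IDS`, `ISO_CODES` and `language_id` are illustrative names, not part of the FreeSVC code, and it assumes the integer IDs are the language labels the model is conditioned on. The ISO 639-1 codes are taken from this card's metadata.

```python
# Hypothetical helper, NOT part of the FreeSVC code base.
# Assumption: these integers are the language labels FreeSVC is conditioned on.
LANGUAGE_IDS: dict[str, int] = {
    "Chinese": 0,
    "Dutch": 1,
    "English": 2,
    "French": 3,
    "German": 4,
    "Italian": 5,
    "Japanese": 6,
    "Other": 7,  # vocal techniques without linguistic content (partial support)
    "Polish": 8,
    "Portuguese": 9,
    "Spanish": 10,
}

# ISO 639-1 codes from the model card metadata; "Other" has no code.
ISO_CODES: dict[str, str] = {
    "zh": "Chinese", "nl": "Dutch", "en": "English", "fr": "French",
    "de": "German", "it": "Italian", "ja": "Japanese", "pl": "Polish",
    "pt": "Portuguese", "es": "Spanish",
}


def language_id(name_or_code: str) -> int:
    """Map a language name (case-insensitive) or ISO 639-1 code to its ID."""
    key = name_or_code.strip().lower()
    if key in ISO_CODES:
        key = ISO_CODES[key].lower()
    for name, idx in LANGUAGE_IDS.items():
        if name.lower() == key:
            return idx
    raise ValueError(f"unsupported language: {name_or_code!r}")


print(language_id("English"), language_id("pt"))  # 2 9
```

Failing loudly on an unknown language is deliberate: silently falling back to a default ID would condition the model on the wrong language.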
## Model Overview
FreeSVC builds on an enhanced VITS architecture combined with Speaker-invariant Clustering (SPIN) and the ECAPA2 speaker encoder. SPIN provides content representations that carry little speaker information, while ECAPA2 provides an embedding of the target speaker. Keeping speaker characteristics separate from linguistic content in this way is intended to produce high-quality, natural-sounding conversions across multiple languages.
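The separation of content and speaker can be illustrated with a toy, stdlib-only sketch of the "encode content, encode speaker, recombine" data flow. Everything in it is hypothetical: FreeSVC itself works on audio with learned SPIN, ECAPA2 and VITS modules, whereas here an "utterance" is a list of numbers, the "speaker" is just a constant offset, and `toy_content_encoder`, `toy_speaker_encoder` and `toy_decoder` are simple stand-ins invented for illustration.

```python
from statistics import fmean
from typing import Callable, List, Sequence

Frames = List[float]  # toy "utterance": one scalar feature per frame


def toy_content_encoder(frames: Sequence[float]) -> Frames:
    """Stand-in for SPIN features: strip the per-utterance offset (the toy 'speaker')."""
    offset = fmean(frames)
    return [x - offset for x in frames]


def toy_speaker_encoder(frames: Sequence[float]) -> float:
    """Stand-in for ECAPA2: summarise a whole utterance as one fixed 'embedding'."""
    return fmean(frames)


def toy_decoder(content: Sequence[float], speaker: float) -> Frames:
    """Stand-in for the VITS decoder: render the content with the given speaker."""
    return [c + speaker for c in content]


def convert(
    source: Sequence[float],
    reference: Sequence[float],
    content_encoder: Callable[[Sequence[float]], Frames] = toy_content_encoder,
    speaker_encoder: Callable[[Sequence[float]], float] = toy_speaker_encoder,
    decoder: Callable[[Sequence[float], float], Frames] = toy_decoder,
) -> Frames:
    content = content_encoder(source)     # what is sung, with the source singer removed
    speaker = speaker_encoder(reference)  # who should sing it, taken from a reference clip
    return decoder(content, speaker)


# Source "singer" has offset 2.0; reference "singer" has offset 11.0.
print(convert([1.0, 3.0, 2.0], [10.0, 12.0]))  # [10.0, 12.0, 11.0]
```

Because the speaker representation is computed from a reference clip at inference time rather than learned per speaker, the same flow applies to a voice never seen in training, which is the idea behind zero-shot conversion.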
## Training Datasets
FreeSVC was trained on a diverse set of speech and singing datasets covering multiple languages:
| **Dataset** | **Hours** | **Language** | **Type** |
|----------------------|------------|--------------|--------------|
| AISHELL-1 | 170h | Chinese | Speech |
| AISHELL-3 | 85h | Chinese | Speech |
| CML-TTS | 3.1k h | 7 languages (nl, fr, de, it, pl, pt, es) | Speech |
| HiFiTTS | 292h | English | Speech |
| JVS | 30h | Japanese | Speech |
| LibriTTS-R | 585h | English | Speech |
| NUS (NHSS) | 7h | English | Speech, Singing |
| OpenSinger | 50h | Chinese | Singing |
| Opencpop | 5h | Chinese | Singing |
| PopBuTFy | 10h, 40h | Chinese, English | Singing |
| POPCS | 5h | Chinese | Singing |
| VCTK | 44h | English | Speech |
| VocalSet | 10h | Other | Singing |
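The per-language hours in the Supported Languages table can be re-derived from this dataset table. The sketch below does that tally under two assumptions stated in its comments: NUS (NHSS)'s 7h are counted as singing only (that is the split that reproduces the 921h/47h English figures), and CML-TTS is left out because no per-language breakdown of its hours is given here. `DATASETS` and `hours_by_language` are illustrative names.

```python
from collections import defaultdict

# (dataset, hours, language, type) rows from the "Training Datasets" table.
# Assumptions: PopBuTFy is split into its Chinese and English parts; NUS (NHSS)
# is counted as singing only; CML-TTS (~3.1k h over 7 languages) is omitted
# because no per-language split is listed.
DATASETS = [
    ("AISHELL-1", 170, "Chinese", "speech"),
    ("AISHELL-3", 85, "Chinese", "speech"),
    ("HiFiTTS", 292, "English", "speech"),
    ("JVS", 30, "Japanese", "speech"),
    ("LibriTTS-R", 585, "English", "speech"),
    ("NUS (NHSS)", 7, "English", "singing"),
    ("OpenSinger", 50, "Chinese", "singing"),
    ("Opencpop", 5, "Chinese", "singing"),
    ("PopBuTFy", 10, "Chinese", "singing"),
    ("PopBuTFy", 40, "English", "singing"),
    ("POPCS", 5, "Chinese", "singing"),
    ("VCTK", 44, "English", "speech"),
    ("VocalSet", 10, "Other", "singing"),
]


def hours_by_language(rows):
    """Sum speech and singing hours per language."""
    totals = defaultdict(lambda: {"speech": 0, "singing": 0})
    for _name, hours, language, kind in rows:
        totals[language][kind] += hours
    return dict(totals)


for lang, t in sorted(hours_by_language(DATASETS).items()):
    print(f"{lang:<9} speech={t['speech']:>4}h singing={t['singing']:>3}h")
```

The printed totals (Chinese 255h/70h, English 921h/47h, Japanese 30h speech, Other 10h singing) match the Speech Data and Singing Data columns above.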
## License
This code repository is licensed under [the MIT License](LICENSE-CODE).
## Citation
```bibtex
@misc{}
```