	---
	license: mit
	language:
	- en
	- pt
	- es
	- zh
	- nl
	- fr
	- de
	- it
	- ja
	- pl
	pipeline_tag: audio-to-audio
	tags:
	- audio
	- voice
	- voice conversion
	- singing voice conversion
	- vc
	- svc
	- multilingual
	---

	# FreeSVC: Zero-shot Multilingual Singing Voice Conversion

	FreeSVC is a state-of-the-art multilingual singing voice conversion model designed for zero-shot conversion. It converts singing voices across languages without extensive language-specific training. The source code is available in the [GitHub repository](https://github.com/freds0/free-svc).

	## Supported Languages

	| Language   | ID | Status     | Speech Data | Singing Data |
	|------------|----|------------|-------------|--------------|
	| Chinese    | 0  | ✅ Full    | 255h        | 70h          |
	| Dutch      | 1  | ✅ Full    | Part of CML | -            |
	| English    | 2  | ✅ Full    | 921h        | 47h          |
	| French     | 3  | ✅ Full    | Part of CML | -            |
	| German     | 4  | ✅ Full    | Part of CML | -            |
	| Italian    | 5  | ✅ Full    | Part of CML | -            |
	| Japanese   | 6  | ✅ Full    | 30h         | -            |
	| Other*     | 7  | ⚠️ Partial | -           | 10h          |
	| Polish     | 8  | ✅ Full    | Part of CML | -            |
	| Portuguese | 9  | ✅ Full    | Part of CML | -            |
	| Spanish    | 10 | ✅ Full    | Part of CML | -            |

	*Note: The "Other" category covers recordings of vocal techniques that carry no linguistic content.
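
The ID column above can be mirrored as a small lookup table when preparing inputs. This is a minimal sketch only: the names `LANGUAGE_IDS` and `language_id` are hypothetical and not taken from the FreeSVC code, and this README does not state how the repository consumes these IDs.

```python
# Hypothetical lookup mirroring the "Supported Languages" table.
# The names here are illustrative and are not part of the FreeSVC code base.
LANGUAGE_IDS = {
    "Chinese": 0,
    "Dutch": 1,
    "English": 2,
    "French": 3,
    "German": 4,
    "Italian": 5,
    "Japanese": 6,
    "Other": 7,  # vocal techniques without linguistic content
    "Polish": 8,
    "Portuguese": 9,
    "Spanish": 10,
}


def language_id(name: str) -> int:
    """Return the table ID for a language name (case-insensitive)."""
    normalized = name.strip().capitalize()
    if normalized not in LANGUAGE_IDS:
        supported = ", ".join(sorted(LANGUAGE_IDS))
        raise ValueError(f"Unsupported language {name!r}; expected one of: {supported}")
    return LANGUAGE_IDS[normalized]
```

The helper raises an error for unknown names rather than falling back to "Other", since that category is reserved for material without linguistic content.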

	## Model Overview
	FreeSVC builds on an enhanced VITS architecture that integrates Speaker-invariant Clustering (SPIN) and the ECAPA2 speaker encoder. This combination separates speaker characteristics from linguistic content, which supports high-quality, natural-sounding voice conversion across multiple languages.
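
The separation of content and speaker identity described above can be pictured as a simple data flow. The sketch below is conceptual and is not the FreeSVC implementation: `convert_voice` and its three callables are hypothetical placeholders standing in for SPIN-based content features, an ECAPA2-style speaker embedding, and a VITS-style synthesizer. It also leaves out any other conditioning the real model may use.

```python
from typing import Callable, Sequence

# Type aliases for this sketch only; a real model would operate on tensors.
Audio = Sequence[float]
Features = Sequence[float]


def convert_voice(
    source_audio: Audio,
    reference_audio: Audio,
    content_encoder: Callable[[Audio], Features],    # placeholder for SPIN-based content features
    speaker_encoder: Callable[[Audio], Features],    # placeholder for an ECAPA2-style embedding
    decoder: Callable[[Features, Features], Audio],  # placeholder for a VITS-style synthesizer
) -> Audio:
    """Re-synthesize the source content with the reference speaker's identity."""
    # What is sung comes from the source recording.
    content = content_encoder(source_audio)
    # Who is singing comes from the reference recording of the target speaker.
    speaker = speaker_encoder(reference_audio)
    # The decoder combines both to produce the converted audio.
    return decoder(content, speaker)
```

Because the target speaker enters only through the reference recording, the same flow can in principle be applied to speakers that were not seen during training, which is the zero-shot setting the model targets.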

	## Training Datasets

	FreeSVC was trained on a diverse set of speech and singing datasets covering multiple languages:

	| Dataset    | Hours    | Language         | Type            |
	|------------|----------|------------------|-----------------|
	| AISHELL-1  | 170h     | Chinese          | Speech          |
	| AISHELL-3  | 85h      | Chinese          | Speech          |
	| CML-TTS    | 3.1k h   | 7 Languages      | Speech          |
	| HiFiTTS    | 292h     | English          | Speech          |
	| JVS        | 30h      | Japanese         | Speech          |
	| LibriTTS-R | 585h     | English          | Speech          |
	| NUS (NHSS) | 7h       | English          | Speech, Singing |
	| OpenSinger | 50h      | Chinese          | Singing         |
	| Opencpop   | 5h       | Chinese          | Singing         |
	| PopBuTFy   | 10h, 40h | Chinese, English | Singing         |
	| POPCS      | 5h       | Chinese          | Singing         |
	| VCTK       | 44h      | English          | Speech          |
	| VocalSet   | 10h      | Other            | Singing         |
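
The per-language totals in the Supported Languages table can be cross-checked against the dataset hours listed here. The sketch below is an illustrative check, not part of the FreeSVC code, and rests on three assumptions: PopBuTFy's "10h, 40h" is read as 10h Chinese and 40h English, NUS (NHSS) is counted under singing, and CML-TTS is left out because no per-language breakdown is given.

```python
from collections import defaultdict

# (dataset, hours, language, type) rows copied from the table above.
# Assumptions made for this sketch:
#   - PopBuTFy "10h, 40h" is split as 10h Chinese and 40h English.
#   - NUS (NHSS), listed as "Speech, Singing", is counted as singing.
#   - CML-TTS is omitted because its hours are not broken down by language.
ROWS = [
    ("AISHELL-1", 170, "Chinese", "Speech"),
    ("AISHELL-3", 85, "Chinese", "Speech"),
    ("HiFiTTS", 292, "English", "Speech"),
    ("JVS", 30, "Japanese", "Speech"),
    ("LibriTTS-R", 585, "English", "Speech"),
    ("NUS (NHSS)", 7, "English", "Singing"),
    ("OpenSinger", 50, "Chinese", "Singing"),
    ("Opencpop", 5, "Chinese", "Singing"),
    ("PopBuTFy", 10, "Chinese", "Singing"),
    ("PopBuTFy", 40, "English", "Singing"),
    ("POPCS", 5, "Chinese", "Singing"),
    ("VCTK", 44, "English", "Speech"),
    ("VocalSet", 10, "Other", "Singing"),
]


def hours_by_language_and_type(rows):
    """Sum hours for each (language, type) pair."""
    totals = defaultdict(int)
    for _dataset, hours, language, kind in rows:
        totals[(language, kind)] += hours
    return dict(totals)


totals = hours_by_language_and_type(ROWS)
for (language, kind), hours in sorted(totals.items()):
    print(f"{language:<9} {kind:<8} {hours}h")
```

Under these assumptions the sums come to 255h speech and 70h singing for Chinese, 921h speech and 47h singing for English, 30h speech for Japanese, and 10h singing for "Other", which agrees with the Supported Languages table.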

	## License
	This code repository is licensed under [the MIT License](LICENSE-CODE).

	## Citation
	```
	@misc{}
	```