microsoft/VibeVoice-1.5B


hululuhu committed 9 days ago

Commit 4f04137 · 1 parent: 142f4a5

update README

Files changed (1)

README.md +3 -2


@@ -10,10 +10,11 @@ A core innovation of VibeVoice is its use of continuous speech tokenizers (Acous
 The model can synthesize speech up to **90 minutes** long with up to **4 distinct speakers**, surpassing the typical 1-2 speaker limits of many prior models.
-➡️ **Project Demo:** [microsoft/VibeVoice-Demo](https://microsoft.github.io/VibeVoice)
-➡️ **Github Code:** [microsoft/VibeVoice-Code](https://github.com/microsoft/VibeVoice)
+➡️ **Technical Report:** [VibeVoice Technical Report](https://github.com/microsoft/VibeVoice/blob/main/report/TechnicalReport.pdf)
+➡️ **Project Page:** [microsoft/VibeVoice](https://microsoft.github.io/VibeVoice)
+➡️ **Code:** [microsoft/VibeVoice-Code](https://github.com/microsoft/VibeVoice)
 ## Training details
 Transformer-based Large Language Model (LLM) integrated with specialized acoustic and semantic tokenizers and a diffusion-based decoding head.