microsoft/VibeVoice-1.5B

text-generation


YaoyaoChang committed 6 days ago

Commit 217a7fe (parent: 0220f13): update

Files changed (1)

README.md +12 -1

README.md CHANGED

@@ -39,7 +39,18 @@ Transformer-based Large Language Model (LLM) integrated with specialized acousti
     - VibeVoice Training: Pre-trained tokenizers are frozen; only the LLM and diffusion head parameters are trained. A curriculum learning strategy is used for input sequence length (4k -> 16K -> 32K -> 64K). Text tokenizer not explicitly specified, but the LLM (Qwen2.5) typically uses its own. Audio is "tokenized" via the acoustic and semantic tokenizers.
-## Uses
+## Models
+| Model | Context Length | Generation Length |  Weight |
+|-------|----------------|----------|----------|
+| VibeVoice-0.5B-Streaming | - | - | On the way |
+| VibeVoice-1.5B | 64K | ~90 min | You are here. |
+| VibeVoice-7B-Preview| 32K | ~45 min | [HF link](https://huggingface.co/WestZhang/VibeVoice-Large-pt) |
+## Installation and Usage
+Please refer to [GitHub README](https://github.com/microsoft/VibeVoice?tab=readme-ov-file#installation)
+## Responsible Usage
 ### Direct intended uses
 The VibeVoice model is limited to research purpose use exploring highly realistic audio dialogue generation detailed in the [tech report](https://github.com/microsoft/VibeVoice/blob/main/report/TechnicalReport.pdf).
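The Models table added in this commit pairs each context length with an approximate generation length. The quick calculation below is a sketch of the token budget those two columns imply per second of generated audio. It assumes that "K" means 1024 tokens and that a single generation can spend the whole context window on text and audio tokens together; `MODELS` and `tokens_per_second` are hypothetical names introduced here, not part of the VibeVoice codebase.

```python
# Hypothetical sketch: implied context tokens per second of generated audio,
# computed only from the Models table above.
# Assumptions (not stated in the README): "K" = 1024 tokens, and one generation
# may use the entire context window (text + audio tokens combined).

MODELS = {
    # name: (context length in tokens, approximate generation length in minutes)
    "VibeVoice-1.5B": (64 * 1024, 90),
    "VibeVoice-7B-Preview": (32 * 1024, 45),
}


def tokens_per_second(context_tokens: int, minutes: float) -> float:
    """Return how many context tokens are available per second of audio."""
    return context_tokens / (minutes * 60)


if __name__ == "__main__":
    for name, (ctx, minutes) in MODELS.items():
        print(f"{name}: {tokens_per_second(ctx, minutes):.2f} tokens/s")
```

Both rows come out to roughly 12.1 tokens per second, so in this table the generation length scales linearly with the context length. Because the listed lengths are approximate ("~90 min", "~45 min"), treat the figure as a rough budget rather than the tokenizer's actual frame rate.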