Commit 217a7fe by YaoyaoChang
1 parent: 0220f13
Files changed (1)
1. README.md +12 -1

README.md CHANGED
@@ -39,7 +39,18 @@ Transformer-based Large Language Model (LLM) integrated with specialized acousti
 - VibeVoice Training: Pre-trained tokenizers are frozen; only the LLM and diffusion head parameters are trained. A curriculum learning strategy is used for input sequence length (4k -> 16K -> 32K -> 64K). Text tokenizer not explicitly specified, but the LLM (Qwen2.5) typically uses its own. Audio is "tokenized" via the acoustic and semantic tokenizers.
 
 
-## Uses
+## Models
+| Model | Context Length | Generation Length | Weight |
+|-------|----------------|----------|----------|
+| VibeVoice-0.5B-Streaming | - | - | On the way |
+| VibeVoice-1.5B | 64K | ~90 min | You are here. |
+| VibeVoice-7B-Preview| 32K | ~45 min | [HF link](https://huggingface.co/WestZhang/VibeVoice-Large-pt) |
+
+## Installation and Usage
+
+Please refer to [GitHub README](https://github.com/microsoft/VibeVoice?tab=readme-ov-file#installation)
+
+## Responsible Usage
 ### Direct intended uses
 The VibeVoice model is limited to research purpose use exploring highly realistic audio dialogue generation detailed in the [tech report](https://github.com/microsoft/VibeVoice/blob/main/report/TechnicalReport.pdf).
 
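The "VibeVoice Training" context line in this hunk describes two mechanisms: the pre-trained tokenizers are frozen while only the LLM and diffusion head are trained, and the maximum input length grows over a curriculum (4k -> 16K -> 32K -> 64K). The minimal Python sketch below illustrates both under stated assumptions. It assumes "4k", "16K", etc. denote 4,096, 16,384, 32,768 and 65,536 tokens. The per-stage step counts, the function names `max_seq_len` and `is_trainable`, and the parameter-name prefixes `llm.` and `diffusion_head.` are all hypothetical and are not taken from the VibeVoice code.

```python
# Hypothetical sketch of the training setup described in the README:
# frozen pre-trained tokenizers, trainable LLM + diffusion head, and a
# sequence-length curriculum 4k -> 16K -> 32K -> 64K.

# Assumption: "4k", "16K", ... are powers of two (4,096 tokens, etc.).
CURRICULUM_LENGTHS = [4_096, 16_384, 32_768, 65_536]

# Hypothetical module-name prefixes; real parameter names may differ.
TRAINABLE_PREFIXES = ("llm.", "diffusion_head.")


def max_seq_len(step: int, stage_steps: list[int]) -> int:
    """Return the maximum input length allowed at a given training step.

    stage_steps[i] is the (hypothetical) number of steps spent in stage i.
    Steps past the last stage boundary stay at the final length.
    """
    if len(stage_steps) != len(CURRICULUM_LENGTHS):
        raise ValueError("need one step count per curriculum stage")
    if step < 0:
        raise ValueError("step must be non-negative")
    boundary = 0
    for length, n_steps in zip(CURRICULUM_LENGTHS, stage_steps):
        boundary += n_steps
        if step < boundary:
            return length
    return CURRICULUM_LENGTHS[-1]


def is_trainable(param_name: str) -> bool:
    """True for LLM / diffusion-head parameters; tokenizer params stay frozen."""
    return param_name.startswith(TRAINABLE_PREFIXES)
```

In a PyTorch training loop, the freeze rule would typically be applied by setting `param.requires_grad = is_trainable(name)` for each `name, param` in `model.named_parameters()`, and `max_seq_len` would cap the length of the batches drawn at each step.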