Commit 217a7fe by YaoyaoChang
1 Parent(s): 0220f13

update

README.md CHANGED
@@ -39,7 +39,18 @@ Transformer-based Large Language Model (LLM) integrated with specialized acousti
 - VibeVoice Training: Pre-trained tokenizers are frozen; only the LLM and diffusion head parameters are trained. A curriculum learning strategy is used for input sequence length (4k -> 16K -> 32K -> 64K). Text tokenizer not explicitly specified, but the LLM (Qwen2.5) typically uses its own. Audio is "tokenized" via the acoustic and semantic tokenizers.
 
 
-##
+## Models
+| Model | Context Length | Generation Length | Weight |
+|-------|----------------|----------|----------|
+| VibeVoice-0.5B-Streaming | - | - | On the way |
+| VibeVoice-1.5B | 64K | ~90 min | You are here. |
+| VibeVoice-7B-Preview| 32K | ~45 min | [HF link](https://huggingface.co/WestZhang/VibeVoice-Large-pt) |
+
+## Installation and Usage
+
+Please refer to [GitHub README](https://github.com/microsoft/VibeVoice?tab=readme-ov-file#installation)
+
+## Responsible Usage
 ### Direct intended uses
 The VibeVoice model is limited to research purpose use exploring highly realistic audio dialogue generation detailed in the [tech report](https://github.com/microsoft/VibeVoice/blob/main/report/TechnicalReport.pdf).
 