Files changed (1)
  1. README.md +4 -5
README.md CHANGED
@@ -1,12 +1,11 @@
 ---
+license: mit
 language:
 - en
 - zh
-license: mit
 pipeline_tag: text-to-speech
 tags:
 - Podcast
-library_name: transformers
 ---
 
 ## VibeVoice: A Frontier Open-Source Text-to-Speech Model
@@ -27,7 +26,7 @@ The model can synthesize speech up to **90 minutes** long with up to **4 distinc
 <img src="figures/Fig1.png" alt="VibeVoice Overview" height="250px">
 </p>
 
-## Training Details
+## Training details
 Transformer-based Large Language Model (LLM) integrated with specialized acoustic and semantic tokenizers and a diffusion-based decoding head.
 - LLM: [Qwen2.5-1.5B](https://huggingface.co/Qwen/Qwen2.5-1.5B) for this release.
 - Tokenizers:
@@ -45,7 +44,7 @@ Transformer-based Large Language Model (LLM) integrated with specialized acousti
 |-------|----------------|----------|----------|
 | VibeVoice-0.5B-Streaming | - | - | On the way |
 | VibeVoice-1.5B | 64K | ~90 min | You are here. |
-| VibeVoice-Large| 32K | ~45 min | [HF link](https://huggingface.co/microsoft/VibeVoice-Large) |
+| VibeVoice-7B-Preview| 32K | ~45 min | [HF link](https://huggingface.co/WestZhang/VibeVoice-Large-pt) |
 
 ## Installation and Usage
 
@@ -53,7 +52,7 @@ Please refer to [GitHub README](https://github.com/microsoft/VibeVoice?tab=readm
 
 ## Responsible Usage
 ### Direct intended uses
-The VibeVoice model is limited to research purpose use exploring highly realistic audio dialogue generation detailed in the [tech report](https://arxiv.org/pdf/2508.19205).
+The VibeVoice model is limited to research purpose use exploring highly realistic audio dialogue generation detailed in the [tech report](https://github.com/microsoft/VibeVoice/blob/main/report/TechnicalReport.pdf).
 
 ### Out-of-scope uses
 Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by MIT License. Use to generate any text transcript. Furthermore, this release is not intended or licensed for any of the following scenarios:
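The first hunk moves `license: mit` to the top of the YAML front matter and drops `library_name: transformers`. YAML mapping keys are unordered, so the move should not change the parsed metadata. Only the removed key should. The sketch below checks this on the two versions of the front matter shown in the diff. It assumes the front matter uses only the flat `key: value` and `- item` forms seen here. `parse_front_matter` is a hypothetical helper, not a Hugging Face API, and a real model card should go through a full YAML parser.

```python
# Minimal sketch (assumption): parse only the flat "key: value" / "- item"
# YAML subset used in this model card's front matter. Hypothetical helper,
# not part of any Hugging Face library.
def parse_front_matter(text: str) -> dict:
    lines = text.strip().splitlines()
    # Front matter is delimited by '---' lines on both ends.
    if not lines or lines[0] != "---" or lines[-1] != "---":
        raise ValueError("front matter must be wrapped in '---' lines")
    data: dict = {}
    key = None
    for line in lines[1:-1]:
        if line.startswith("- "):
            # A list item belongs to the most recently seen key.
            data[key].append(line[2:].strip())
        else:
            key, _, value = line.partition(":")
            key, value = key.strip(), value.strip()
            # An empty value starts a list; otherwise store the scalar string.
            data[key] = value if value else []
    return data

# Front matter before the change (left side of the diff).
OLD = """---
language:
- en
- zh
license: mit
pipeline_tag: text-to-speech
tags:
- Podcast
library_name: transformers
---"""

# Front matter after the change (right side of the diff).
NEW = """---
license: mit
language:
- en
- zh
pipeline_tag: text-to-speech
tags:
- Podcast
---"""

old, new = parse_front_matter(OLD), parse_front_matter(NEW)
# Keys present before but not after the change.
removed = {k: old[k] for k in old.keys() - new.keys()}
# Every key that survived keeps the same value despite the reordering.
unchanged = all(old[k] == new[k] for k in new)
print(removed)    # {'library_name': 'transformers'}
print(unchanged)  # True
```

Comparing parsed dictionaries rather than raw text makes the reorder invisible. Only the dropped `library_name` key shows up as a difference.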