Add `library_name: transformers` to model card metadata (#1)
- Add `library_name: transformers` to model card metadata (5bbf78e32a451cc51c9f88204e294da1b0847ee4)
Co-authored-by: Niels Rogge <[email protected]>
README.md CHANGED
```diff
@@ -1,5 +1,4 @@
 ---
-license: apache-2.0
 language:
 - en
 - zh
@@ -7,11 +6,14 @@ language:
 - fr
 - de
 - ko
+license: apache-2.0
 pipeline_tag: text-to-speech
 tags:
 - Speech-Tokenizer
 - Text-to-Speech
+library_name: transformers
 ---
+
 # 🚀 TaDiCodec
 
 We introduce the **T**ext-**a**ware **Di**ffusion Transformer Speech **Codec** (TaDiCodec), a novel approach to speech tokenization that employs end-to-end optimization for quantization and reconstruction through a **diffusion autoencoder**, while integrating **text guidance** into the diffusion decoder to enhance reconstruction quality and achieve **optimal compression**. TaDiCodec achieves an extremely low frame rate of **6.25 Hz** and a corresponding bitrate of **0.0875 kbps** with a single-layer codebook for **24 kHz speech**, while maintaining superior performance on critical speech generation evaluation metrics such as Word Error Rate (WER), speaker similarity (SIM), and speech quality (UTMOS).
@@ -187,4 +189,4 @@ MaskGCT:
 
 - **(Binary Spherical Quantization) BSQ** is built upon [vector-quantize-pytorch](https://github.com/lucidrains/vector-quantize-pytorch) and [bsq-vit](https://github.com/zhaoyue-zephyrus/bsq-vit).
 
-- **Training codebase** is built upon [Amphion](https://github.com/open-mmlab/Amphion) and [accelerate](https://github.com/huggingface/accelerate).
+- **Training codebase** is built upon [Amphion](https://github.com/open-mmlab/Amphion) and [accelerate](https://github.com/huggingface/accelerate).
```
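The compression figures quoted in the card's description are internally consistent and can be checked with a few lines of arithmetic. A minimal sketch (the implied 14-bit tokens and 2^14-entry codebook are our inference from the quoted numbers, not something the PR states):

```python
# Sanity-check the quoted TaDiCodec compression figures:
# 6.25 tokens/s at 0.0875 kbps with a single-layer codebook.

frame_rate_hz = 6.25        # quoted token rate (tokens per second)
bitrate_kbps = 0.0875       # quoted bitrate

# bits carried by each token = total bits per second / tokens per second
bits_per_token = bitrate_kbps * 1000 / frame_rate_hz

# A single-layer codebook delivering n bits per token has 2**n entries
# (an inference from the numbers above, not stated in the model card).
implied_codebook_size = 2 ** round(bits_per_token)

print(bits_per_token, implied_codebook_size)
```

So each 6.25 Hz token carries 14 bits, which is what makes the 0.0875 kbps figure possible with a single quantizer layer.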