Hecheng0625 and nielsr (HF Staff) committed
Commit 5955b78 · verified · 1 parent: 7ccc57a

Add `library_name: transformers` to model card metadata (#1)

- Add `library_name: transformers` to model card metadata (5bbf78e32a451cc51c9f88204e294da1b0847ee4)


Co-authored-by: Niels Rogge <[email protected]>

Files changed (1): README.md (+4 −2)
README.md CHANGED

```diff
@@ -1,5 +1,4 @@
 ---
-license: apache-2.0
 language:
 - en
 - zh
@@ -7,11 +6,14 @@ language:
 - fr
 - de
 - ko
+license: apache-2.0
 pipeline_tag: text-to-speech
 tags:
 - Speech-Tokenizer
 - Text-to-Speech
+library_name: transformers
 ---
+
 # 🚀 TaDiCodec
 
 We introduce the **T**ext-**a**ware **Di**ffusion Transformer Speech **Codec** (TaDiCodec), a novel approach to speech tokenization that employs end-to-end optimization for quantization and reconstruction through a **diffusion autoencoder**, while integrating **text guidance** into the diffusion decoder to enhance reconstruction quality and achieve **optimal compression**. TaDiCodec achieves an extremely low frame rate of **6.25 Hz** and a corresponding bitrate of **0.0875 kbps** with a single-layer codebook for **24 kHz speech**, while maintaining superior performance on critical speech generation evaluation metrics such as Word Error Rate (WER), speaker similarity (SIM), and speech quality (UTMOS).
@@ -187,4 +189,4 @@ MaskGCT:
 
 - **(Binary Spherical Quantization) BSQ** is built upon [vector-quantize-pytorch](https://github.com/lucidrains/vector-quantize-pytorch) and [bsq-vit](https://github.com/zhaoyue-zephyrus/bsq-vit).
 
-- **Training codebase** is built upon [Amphion](https://github.com/open-mmlab/Amphion) and [accelerate](https://github.com/huggingface/accelerate).
+- **Training codebase** is built upon [Amphion](https://github.com/open-mmlab/Amphion) and [accelerate](https://github.com/huggingface/accelerate).
```
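The effect of this change is that the model card's YAML frontmatter now carries `library_name: transformers`, which the Hub uses to associate the repository with a loading library. A minimal, stdlib-only sketch of reading such frontmatter (the `readme` string below is an illustrative excerpt, not the full card, and only flat `key: value` pairs are handled):

```python
# Model card metadata lives in a YAML block fenced by "---" at the top
# of README.md. This sketch extracts flat "key: value" pairs without a
# YAML parser, using only the standard library.
readme = """---
license: apache-2.0
pipeline_tag: text-to-speech
library_name: transformers
---

# 🚀 TaDiCodec
"""

frontmatter = readme.split("---")[1]  # text between the two "---" fences
meta = dict(
    line.split(": ", 1)               # "key: value" -> (key, value)
    for line in frontmatter.strip().splitlines()
    if ": " in line                   # skip list items and bare keys
)
print(meta["library_name"])  # transformers
```

In practice one would use a real YAML parser (or `huggingface_hub`'s model card utilities) rather than string splitting; the sketch only shows where the added key lives.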
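The headline numbers in the card's description are internally consistent: at 6.25 Hz with a single-layer codebook, 0.0875 kbps works out to 14 bits per frame, implying a 2^14 = 16384-entry codebook (the codebook size is inferred from these figures, not stated in this diff). A quick back-of-envelope check:

```python
# Back-of-envelope check of the bitrate figures quoted in the model card.
# Assumption: one token per frame from a single-layer codebook, so
# bitrate = frame_rate * log2(codebook_size).
frame_rate_hz = 6.25   # tokens per second (from the card)
bitrate_kbps = 0.0875  # from the card

bits_per_frame = bitrate_kbps * 1000 / frame_rate_hz
implied_codebook_size = 2 ** bits_per_frame  # 14 bits -> 16384 entries

print(bits_per_frame)              # 14.0
print(int(implied_codebook_size))  # 16384
```

For reference, raw 16-bit PCM at 24 kHz is 384 kbps, so the quoted bitrate corresponds to a compression ratio on the order of 4000×.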