Hecheng0625 and nielsr (HF Staff) committed
Commit 5955b78 · verified · 1 parent: 7ccc57a

Add `library_name: transformers` to model card metadata (#1)

- Add `library_name: transformers` to model card metadata (5bbf78e32a451cc51c9f88204e294da1b0847ee4)


Co-authored-by: Niels Rogge <[email protected]>

Files changed (1): README.md (+4 −2)
README.md CHANGED

```diff
@@ -1,5 +1,4 @@
 ---
-license: apache-2.0
 language:
 - en
 - zh
@@ -7,11 +6,14 @@ language:
 - fr
 - de
 - ko
+license: apache-2.0
 pipeline_tag: text-to-speech
 tags:
 - Speech-Tokenizer
 - Text-to-Speech
+library_name: transformers
 ---
+
 # 🚀 TaDiCodec
 
 We introduce the **T**ext-**a**ware **Di**ffusion Transformer Speech **Codec** (TaDiCodec), a novel approach to speech tokenization that employs end-to-end optimization for quantization and reconstruction through a **diffusion autoencoder**, while integrating **text guidance** into the diffusion decoder to enhance reconstruction quality and achieve **optimal compression**. TaDiCodec achieves an extremely low frame rate of **6.25 Hz** and a corresponding bitrate of **0.0875 kbps** with a single-layer codebook for **24 kHz speech**, while maintaining superior performance on critical speech generation evaluation metrics such as Word Error Rate (WER), speaker similarity (SIM), and speech quality (UTMOS).
@@ -187,4 +189,4 @@ MaskGCT:
 
 - **(Binary Spherical Quantization) BSQ** is built upon [vector-quantize-pytorch](https://github.com/lucidrains/vector-quantize-pytorch) and [bsq-vit](https://github.com/zhaoyue-zephyrus/bsq-vit).
 
-- **Training codebase** is built upon [Amphion](https://github.com/open-mmlab/Amphion) and [accelerate](https://github.com/huggingface/accelerate).
+- **Training codebase** is built upon [Amphion](https://github.com/open-mmlab/Amphion) and [accelerate](https://github.com/huggingface/accelerate).
```
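The effect of this change is that the model card's YAML frontmatter now carries `library_name: transformers`, which the Hub uses to associate the repository with a loading library. A minimal, stdlib-only sketch of reading such frontmatter (the `readme` string below is an illustrative excerpt, not the full card, and only flat `key: value` pairs are handled):

```python
# Model card metadata lives in a YAML block fenced by "---" at the top
# of README.md. This sketch extracts flat "key: value" pairs without a
# YAML parser, using only the standard library.
readme = """---
license: apache-2.0
pipeline_tag: text-to-speech
library_name: transformers
---

# 🚀 TaDiCodec
"""

frontmatter = readme.split("---")[1]  # text between the two "---" fences
meta = dict(
    line.split(": ", 1)               # "key: value" -> (key, value)
    for line in frontmatter.strip().splitlines()
    if ": " in line                   # skip list items and bare keys
)
print(meta["library_name"])  # transformers
```

In practice one would use a real YAML parser (or `huggingface_hub`'s model card utilities) rather than string splitting; the sketch only shows where the added key lives.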
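The headline numbers in the card's description are internally consistent: at 6.25 Hz with a single-layer codebook, 0.0875 kbps works out to 14 bits per frame, implying a 2^14 = 16384-entry codebook (the codebook size is inferred from these figures, not stated in this diff). A quick back-of-envelope check:

```python
# Back-of-envelope check of the bitrate figures quoted in the model card.
# Assumption: one token per frame from a single-layer codebook, so
# bitrate = frame_rate * log2(codebook_size).
frame_rate_hz = 6.25   # tokens per second (from the card)
bitrate_kbps = 0.0875  # from the card

bits_per_frame = bitrate_kbps * 1000 / frame_rate_hz
implied_codebook_size = 2 ** bits_per_frame  # 14 bits -> 16384 entries

print(bits_per_frame)              # 14.0
print(int(implied_codebook_size))  # 16384
```

For reference, raw 16-bit PCM at 24 kHz is 384 kbps, so the quoted bitrate corresponds to a compression ratio on the order of 4000×.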