Add `library_name: transformers` to model card metadata
This PR enhances the model card by adding the `library_name: transformers` metadata tag.
The addition is justified by:
- The `config.json` indicating `Qwen2ForCausalLM` as the architecture and `transformers_version: "4.40.1"` (a quick check is sketched below this list).
- The "Acknowledgments" section in the README explicitly stating that "NAR Llama-style transformers is built upon [transformers](https://github.com/huggingface/transformers)".
Adding this metadata will enable the automated, pre-defined code snippet on the model page, showing how to load and use the model with the 🤗 Transformers library.
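
For illustration, that snippet would be along these lines. This is a hedged sketch rather than the Hub's exact output: `org/TaDiCodec` is again a placeholder repo id, and `trust_remote_code=True` may additionally be required if the checkpoint ships custom code.

```python
from transformers import AutoConfig, AutoModelForCausalLM

REPO_ID = "org/TaDiCodec"  # placeholder: replace with the actual Hub repo id

# With library_name: transformers set, the Hub surfaces a snippet like this.
# config.json declares Qwen2ForCausalLM, so the Auto classes can resolve it.
config = AutoConfig.from_pretrained(REPO_ID)
model = AutoModelForCausalLM.from_pretrained(REPO_ID)
```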
README.md (CHANGED)

```diff
@@ -1,5 +1,4 @@
 ---
-license: apache-2.0
 language:
 - en
 - zh
@@ -7,11 +6,14 @@ language:
 - fr
 - de
 - ko
+license: apache-2.0
 pipeline_tag: text-to-speech
 tags:
 - Speech-Tokenizer
 - Text-to-Speech
+library_name: transformers
 ---
+
 # 🚀 TaDiCodec
 
 We introduce the **T**ext-**a**ware **Di**ffusion Transformer Speech **Codec** (TaDiCodec), a novel approach to speech tokenization that employs end-to-end optimization for quantization and reconstruction through a **diffusion autoencoder**, while integrating **text guidance** into the diffusion decoder to enhance reconstruction quality and achieve **optimal compression**. TaDiCodec achieves an extremely low frame rate of **6.25 Hz** and a corresponding bitrate of **0.0875 kbps** with a single-layer codebook for **24 kHz speech**, while maintaining superior performance on critical speech generation evaluation metrics such as Word Error Rate (WER), speaker similarity (SIM), and speech quality (UTMOS).
@@ -187,4 +189,4 @@ MaskGCT:
 
 - **(Binary Spherical Quantization) BSQ** is built upon [vector-quantize-pytorch](https://github.com/lucidrains/vector-quantize-pytorch) and [bsq-vit](https://github.com/zhaoyue-zephyrus/bsq-vit).
 
-- **Training codebase** is built upon [Amphion](https://github.com/open-mmlab/Amphion) and [accelerate](https://github.com/huggingface/accelerate).
\ No newline at end of file
+- **Training codebase** is built upon [Amphion](https://github.com/open-mmlab/Amphion) and [accelerate](https://github.com/huggingface/accelerate).
```