Add `library_name: transformers` to model card metadata
This PR enhances the model card by adding the `library_name: transformers` metadata tag.
The addition is justified by:
- The `config.json` indicating `Qwen2ForCausalLM` as the architecture and `transformers_version: "4.40.1"` (a quick check is sketched below this list).
- The "Acknowledgments" section in the README explicitly stating that "NAR Llama-style transformers is built upon [transformers](https://github.com/huggingface/transformers)".
Adding this metadata will enable the automated, pre-defined code snippet on the model page, showing how to load and use the model with the 🤗 Transformers library.
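
For illustration, that snippet would be along these lines. This is a hedged sketch rather than the Hub's exact output: `org/TaDiCodec` is again a placeholder repo id, and `trust_remote_code=True` may additionally be required if the checkpoint ships custom code.

```python
from transformers import AutoConfig, AutoModelForCausalLM

REPO_ID = "org/TaDiCodec"  # placeholder: replace with the actual Hub repo id

# With library_name: transformers set, the Hub surfaces a snippet like this.
# config.json declares Qwen2ForCausalLM, so the Auto classes can resolve it.
config = AutoConfig.from_pretrained(REPO_ID)
model = AutoModelForCausalLM.from_pretrained(REPO_ID)
```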
README.md (CHANGED)

```diff
@@ -1,5 +1,4 @@
 ---
-license: apache-2.0
 language:
 - en
 - zh
@@ -7,11 +6,14 @@ language:
 - fr
 - de
 - ko
+license: apache-2.0
 pipeline_tag: text-to-speech
 tags:
 - Speech-Tokenizer
 - Text-to-Speech
+library_name: transformers
 ---
+
 # 🚀 TaDiCodec
 
 We introduce the **T**ext-**a**ware **Di**ffusion Transformer Speech **Codec** (TaDiCodec), a novel approach to speech tokenization that employs end-to-end optimization for quantization and reconstruction through a **diffusion autoencoder**, while integrating **text guidance** into the diffusion decoder to enhance reconstruction quality and achieve **optimal compression**. TaDiCodec achieves an extremely low frame rate of **6.25 Hz** and a corresponding bitrate of **0.0875 kbps** with a single-layer codebook for **24 kHz speech**, while maintaining superior performance on critical speech generation evaluation metrics such as Word Error Rate (WER), speaker similarity (SIM), and speech quality (UTMOS).
@@ -187,4 +189,4 @@ MaskGCT:
 
 - **(Binary Spherical Quantization) BSQ** is built upon [vector-quantize-pytorch](https://github.com/lucidrains/vector-quantize-pytorch) and [bsq-vit](https://github.com/zhaoyue-zephyrus/bsq-vit).
 
-- **Training codebase** is built upon [Amphion](https://github.com/open-mmlab/Amphion) and [accelerate](https://github.com/huggingface/accelerate).
\ No newline at end of file
+- **Training codebase** is built upon [Amphion](https://github.com/open-mmlab/Amphion) and [accelerate](https://github.com/huggingface/accelerate).
```