---

language: en
datasets:
- librispeech
metrics:
- wer
pipeline_tag: automatic-speech-recognition
tags:
- transcription
- audio
- speech
- chunkformer
- asr
- automatic-speech-recognition
- long-form transcription
- librispeech
license: cc-by-nc-4.0
model-index:
- name: ChunkFormer-Large-En-Libri-960h
  results:
  - task: 
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: test-clean
      type: librispeech
      args: en
    metrics:
       - name: Test WER
         type: wer
         value: 2.69
  - task: 
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: test-other
      type: librispeech
      args: en
    metrics:
       - name: Test WER
         type: wer
         value: 6.89
---


# **ChunkFormer-Large-En-Libri-960h: Pretrained ChunkFormer-Large on 960 hours of LibriSpeech dataset**

[![License: CC BY-NC 4.0](https://img.shields.io/badge/License-CC%20BY--NC%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc/4.0/)
[![GitHub](https://img.shields.io/badge/GitHub-ChunkFormer-blue)](https://github.com/khanld/chunkformer)
[![Paper](https://img.shields.io/badge/Paper-ICASSP%202025-green)](paper.pdf)

---
## Table of contents
1. [Model Description](#description)
2. [Documentation and Implementation](#implementation)
3. [Benchmark Results](#benchmark)
4. [Usage](#usage)
5. [Citation](#citation)
6. [Contact](#contact)

---
<a name = "description" ></a>
## Model Description
**ChunkFormer-Large-En-Libri-960h** is an English Automatic Speech Recognition (ASR) model based on the **ChunkFormer** architecture, introduced at **ICASSP 2025**. The model was trained on the full 960 hours of LibriSpeech, a widely used benchmark corpus for ASR research.

---
<a name = "implementation" ></a>
## Documentation and Implementation
The [Documentation]() and [Implementation](https://github.com/khanld/chunkformer) of ChunkFormer are publicly available.

---
<a name = "benchmark" ></a>
## Benchmark Results
We evaluate the models using **Word Error Rate (WER)**. To ensure a fair comparison, all models are trained exclusively with the [**WENET**](https://github.com/wenet-e2e/wenet) framework.

| No. | Model                   | Test-Clean | Test-Other | Avg. |
|-----|-------------------------|------------|------------|------|
|  1  | **ChunkFormer**         | 2.69       | 6.89       | 4.79 |
|  2  | **Efficient Conformer** | 2.71       | 6.95       | 4.83 |
|  3  | **Conformer**           | 2.77       | 6.93       | 4.85 |
|  4  | **Squeezeformer**       | 2.87       | 7.16       | 5.02 |
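
For reference, WER is the word-level edit distance (substitutions + insertions + deletions) divided by the number of words in the reference. The sketch below is a minimal illustration of that definition, not the WENET scoring script used for the table above; the function name `wer` is our own.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference length.

    Minimal illustration only; assumes a non-empty reference and
    whitespace tokenization (no text normalization).
    """
    r, h = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(h) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match/substitution
    return d[len(r)][len(h)] / len(r)
```

For example, one substitution in a four-word reference gives a WER of 0.25 (i.e., 25%).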



---
<a name = "usage" ></a>
## Quick Usage
To use the ChunkFormer model for English Automatic Speech Recognition, follow these steps:

1. **Download the ChunkFormer Repository**
```bash
git clone https://github.com/khanld/chunkformer.git
cd chunkformer
pip install -r requirements.txt
```
2. **Download the Model Checkpoint from Hugging Face**
```bash
pip install huggingface_hub
huggingface-cli download khanhld/chunkformer-large-en-libri-960h --local-dir "./chunkformer-large-en-libri-960h"
```
or
```bash
git lfs install
git clone https://huggingface.co/khanhld/chunkformer-large-en-libri-960h
```
Either command downloads the model checkpoint into a `chunkformer-large-en-libri-960h` folder inside your `chunkformer` directory.

3. **Run the model**
```bash
# --max_duration is given in seconds (default: 1800)
python decode.py \
    --model_checkpoint path/to/local/chunkformer-large-en-libri-960h \
    --long_form_audio path/to/audio.wav \
    --max_duration 14400 \
    --chunk_size 64 \
    --left_context_size 128 \
    --right_context_size 128
```
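To build intuition for the `--chunk_size`, `--left_context_size`, and `--right_context_size` flags: ChunkFormer processes long audio in fixed-size chunks, each attending to a bounded window of left and right context. The sketch below only illustrates how such chunk boundaries could be laid out over a frame sequence; it is our own simplified illustration (`chunk_spans` is a hypothetical helper), not the model's actual masking implementation.

```python
def chunk_spans(num_frames: int, chunk_size: int,
                left_context: int, right_context: int):
    """Yield (ctx_start, start, end, ctx_end) for each chunk.

    [start, end) is the span the chunk is responsible for;
    [ctx_start, ctx_end) is the wider window it may attend to,
    clipped at the sequence boundaries. Illustrative sketch only.
    """
    for start in range(0, num_frames, chunk_size):
        end = min(start + chunk_size, num_frames)
        ctx_start = max(0, start - left_context)      # left context, clipped
        ctx_end = min(num_frames, end + right_context)  # right context, clipped
        yield (ctx_start, start, end, ctx_end)
```

Larger context sizes widen each chunk's attention window (better accuracy, more compute); the chunk size bounds memory use independently of total audio length, which is what enables long-form transcription.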
**Advanced Usage** can be found [HERE](https://github.com/khanld/chunkformer/tree/main?tab=readme-ov-file#usage)


---
<a name = "citation" ></a>
## Citation
If you use this work in your research, please cite:

```bibtex
@inproceedings{chunkformer,
  title={ChunkFormer: Masked Chunking Conformer For Long-Form Speech Transcription},
  author={Khanh Le and Tuan Vu Ho and Dung Tran and Duc Thanh Chau},
  booktitle={ICASSP},
  year={2025}
}
```

---
<a name = "contact"></a>
## Contact
- [email protected]
- [![GitHub](https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white)](https://github.com/khanld)
- [![LinkedIn](https://img.shields.io/badge/linkedin-%230077B5.svg?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/khanhld257/)