metadata

title: CTC-Baseline XLSR-based ASR model - set 1-extra
language: multilingual
tags:
  - asr
  - ctc-dro
  - XLSR
license: cc-by-nc-4.0

CTC-Baseline XLSR-based ASR model - set 1-extra

This repository contains a CTC-Baseline XLSR-based automatic speech recognition (ASR) model trained with ESPnet.
The model was trained on unbalanced training data from set 1-extra.

Intended Use

This model is intended for ASR. Users can run inference using the provided checkpoint (valid.loss.best.pth) and configuration file (config.yaml):

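import soundfile as sf
from espnet2.bin.asr_inference import Speech2Text

# Paths to the training configuration and checkpoint shipped in this repository.
asr_train_config = "ctc-baseline_xlsr_set_1-extra/config.yaml"
asr_model_file = "ctc-baseline_xlsr_set_1-extra/valid.loss.best.pth"

# Build the inference wrapper from the local config and checkpoint.
model = Speech2Text.from_pretrained(
    asr_train_config=asr_train_config,
    asr_model_file=asr_model_file
)

# sf.read returns the waveform and its sample rate. XLSR-based models generally
# expect 16 kHz mono audio; check config.yaml for the rate used in training.
speech, sample_rate = sf.read("input.wav")

# The model returns an n-best list; take the text of the top hypothesis.
text, *_ = model(speech)[0]

print("Recognized text:", text)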

How to Use

Clone this repository.
Use ESPnet’s inference scripts with the provided config.yaml and checkpoint file.
Ensure any external resources referenced in config.yaml are available at the indicated relative paths.
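
The last step above can be checked before loading the model. The following is a minimal sketch, not part of ESPnet: it walks a parsed config and lists relative, file-like string values that do not exist on disk. The helper names (iter_strings, find_missing_paths, check_config_file), the suffix list, and the choice of base directory are all assumptions made for illustration; adjust them to the entries that actually appear in config.yaml.

```python
from pathlib import Path

# Assumed heuristic: string values ending in one of these suffixes are file references.
PATH_SUFFIXES = (".pth", ".pt", ".ckpt", ".yaml", ".yml", ".txt", ".json", ".model", ".npz")


def iter_strings(node):
    """Yield every string value found in a nested dict/list structure."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, (list, tuple)):
        for item in node:
            yield from iter_strings(item)


def find_missing_paths(config, base_dir="."):
    """Return relative, file-like strings in config that do not exist under base_dir."""
    base = Path(base_dir)
    missing = []
    for value in iter_strings(config):
        # Skip values that do not look like files, and absolute paths.
        if not value.endswith(PATH_SUFFIXES) or Path(value).is_absolute():
            continue
        if not (base / value).exists() and value not in missing:
            missing.append(value)
    return missing


def check_config_file(config_path, base_dir="."):
    """Load a YAML config (requires PyYAML) and report missing relative paths."""
    import yaml  # imported lazily so the helpers above work without PyYAML

    with open(config_path, encoding="utf-8") as handle:
        config = yaml.safe_load(handle)
    return find_missing_paths(config, base_dir)
```

For example, check_config_file("ctc-baseline_xlsr_set_1-extra/config.yaml", base_dir=".") returns a list of referenced files that could not be found relative to the current directory; an empty list means every file-like entry matched by the heuristic was found.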