metadata

license: other
license_name: test
license_link: LICENSE
language:
  - en
  - fr
  - de
  - es
  - pt
metrics:
  - accuracy
  - cer
pipeline_tag: automatic-speech-recognition

Model Card for Model ID

Preview release for FOSDEM 2025, based on the training epochs completed so far (training is still ongoing).

Overview

This is a family of low-latency streaming models designed for use on edge devices.
Goal: Run faster or transcribe more accurately than similarly sized Whisper models and other comparable models.

Languages: English, French, German (Spanish and Portuguese planned for release by Feb 14).

Demos

Browser Demo (CPU)
(Runs entirely in the browser using CPU.)
Gradio / Python Demo

License

The license is still under consideration (likely Coqui). The model is intended to be dual-licensed:

Free for non-commercial use.
Affordable license for commercial use.

Training

The models are trained with a modified k2/Icefall pipeline.
Inference runs on the standard, unmodified Sherpa project.
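
As a rough illustration of what streaming inference through Sherpa might look like, here is a minimal sketch using the sherpa-onnx Python package. Several things in it are assumptions and are not confirmed by this card: that the models are exported as an ONNX transducer (encoder, decoder, joiner), that the files are named as shown, that audio is 16 kHz mono, and that 80-dimensional features are used. The helper names `iter_chunks` and `transcribe_streaming` are hypothetical.

```python
from typing import Iterator, Sequence


def iter_chunks(
    samples: Sequence[float], sample_rate: int, chunk_seconds: float = 0.1
) -> Iterator[Sequence[float]]:
    """Yield consecutive slices of `samples`, each about `chunk_seconds` long.

    This imitates audio arriving in small blocks, e.g. from a microphone.
    """
    chunk_size = max(1, int(sample_rate * chunk_seconds))
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]


def transcribe_streaming(samples: Sequence[float], sample_rate: int, model_dir: str) -> str:
    """Feed audio chunk by chunk to a sherpa-onnx streaming recognizer.

    `samples` are mono float samples in [-1, 1]. The file names under
    `model_dir` are placeholders (assumption), as is the transducer layout.
    """
    # Imported lazily so the chunking helper works without these packages.
    import numpy as np
    import sherpa_onnx  # pip install sherpa-onnx

    recognizer = sherpa_onnx.OnlineRecognizer.from_transducer(
        tokens=f"{model_dir}/tokens.txt",     # assumed file name
        encoder=f"{model_dir}/encoder.onnx",  # assumed file name
        decoder=f"{model_dir}/decoder.onnx",  # assumed file name
        joiner=f"{model_dir}/joiner.onnx",    # assumed file name
        num_threads=1,
        sample_rate=16000,                    # assumed model sample rate
        feature_dim=80,                       # assumed feature dimension
        decoding_method="greedy_search",
    )

    stream = recognizer.create_stream()
    for chunk in iter_chunks(samples, sample_rate):
        stream.accept_waveform(sample_rate, np.asarray(chunk, dtype=np.float32))
        # Decode as soon as enough frames are buffered; this is what keeps latency low.
        while recognizer.is_ready(stream):
            recognizer.decode_stream(stream)

    # Append a short stretch of silence so the final frames get flushed, then finish.
    tail = np.zeros(int(0.5 * sample_rate), dtype=np.float32)
    stream.accept_waveform(sample_rate, tail)
    stream.input_finished()
    while recognizer.is_ready(stream):
        recognizer.decode_stream(stream)

    return recognizer.get_result(stream)
```

In a real application the chunks would come from a live audio source rather than a preloaded array, and partial results could be read with `get_result` inside the loop to display text as it is recognized.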

Acknowledgements

Special thanks to the Lhotse, Sherpa, k2, and Icefall teams for their support and tools.