OWSM V4 Demo
This is a demo of the OWSM v4 CTC and medium models. OWSM is a series of fully open Whisper-style speech foundation models developed by CMU WAVLab: https://www.wavlab.org/activities/2024/owsm/
The demo covers the following models and data:

- The filtered YODAS subset used to train the OWSM v4 series.
- OWSM-CTC v4, OWSM v4 (1B), OWSM v4 (370M), and OWSM v4 (102M): each trained on a newly curated dataset from YODAS along with previous OWSM data, which significantly enhances multilingual performance.
- OWSM-CTC v3.1 further fine-tuned on v3.2 data to improve long-form robustness.
- (ACL'24) OWSM-CTC: a CTC-based non-autoregressive speech foundation model for multilingual ASR, ST, and LID.
- (INTERSPEECH'24) OWSM v3.1 medium with 1.02B parameters.
- (INTERSPEECH'24) OWSM v3.1 small with 367M parameters.
- (INTERSPEECH'24) OWSM v3.1 base with 101M parameters.
- (INTERSPEECH'24) OWSM v3.1 small trained on a subset of data with low-restriction licenses.
- (INTERSPEECH'24) OWSM small with data cleaning.
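Outside the demo UI, these checkpoints can also be run locally through ESPnet. Below is a minimal sketch of short-form ASR with an OWSM-CTC checkpoint using ESPnet's `Speech2TextGreedySearch` interface; the repo id `espnet/owsm_ctc_v3.1_1B`, the input file name, and the CUDA device choice are assumptions, so substitute the checkpoint and hardware you actually use.

```python
# Minimal sketch: short-form ASR with an OWSM-CTC checkpoint via ESPnet.
# Assumes espnet and espnet_model_zoo are installed; the repo id and
# audio file below are placeholders.
import librosa
import soundfile as sf
from espnet2.bin.s2t_inference_ctc import Speech2TextGreedySearch

s2t = Speech2TextGreedySearch.from_pretrained(
    "espnet/owsm_ctc_v3.1_1B",   # swap in the desired (e.g., v4) checkpoint id
    device="cuda",               # or "cpu"
    generate_interctc_outputs=False,
    lang_sym="<eng>",            # language token
    task_sym="<asr>",            # <asr> for transcription; ST uses a target-language token
)

# OWSM-CTC is trained on 16 kHz audio with a fixed 30-second context,
# so pad or trim the waveform accordingly.
speech, rate = sf.read("speech.wav")
speech = librosa.util.fix_length(speech, size=16000 * 30)

result = s2t(speech)[0]  # best hypothesis; the first field is the decoded text
print(result[0])
```

The encoder-decoder checkpoints (e.g., the OWSM v3.1 and v4 medium models) are served by the analogous `espnet2.bin.s2t_inference.Speech2Text` class, whose `from_pretrained` additionally accepts decoding options such as `beam_size` and `ctc_weight`.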