OWSM V4 Demo
This is a demo of the OWSM v4 CTC and medium models. OWSM is a series of fully open Whisper-style speech foundation models developed by CMU WAVLab: https://www.wavlab.org/activities/2024/owsm/
The demo covers the following models and data:

- The filtered YODAS subset used to train the OWSM v4 series.
- OWSM-CTC v4, OWSM v4 (1B), OWSM v4 (370M), and OWSM v4 (102M): each trained on a newly curated dataset from YODAS along with previous OWSM data, which significantly enhances multilingual performance.
- OWSM-CTC v3.1 further fine-tuned on v3.2 data to improve long-form robustness.
- (ACL'24) OWSM-CTC: a CTC-based non-autoregressive speech foundation model for multilingual ASR, ST, and LID.
- (INTERSPEECH'24) OWSM v3.1 medium with 1.02B parameters.
- (INTERSPEECH'24) OWSM v3.1 small with 367M parameters.
- (INTERSPEECH'24) OWSM v3.1 base with 101M parameters.
- (INTERSPEECH'24) OWSM v3.1 small trained on a subset of data with low-restriction licenses.
- (INTERSPEECH'24) OWSM small with data cleaning.
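Outside the demo UI, these checkpoints can also be run locally through ESPnet. Below is a minimal sketch of short-form ASR with an OWSM-CTC checkpoint using ESPnet's `Speech2TextGreedySearch` interface; the repo id `espnet/owsm_ctc_v3.1_1B`, the input file name, and the CUDA device choice are assumptions, so substitute the checkpoint and hardware you actually use.

```python
# Minimal sketch: short-form ASR with an OWSM-CTC checkpoint via ESPnet.
# Assumes espnet and espnet_model_zoo are installed; the repo id and
# audio file below are placeholders.
import librosa
import soundfile as sf
from espnet2.bin.s2t_inference_ctc import Speech2TextGreedySearch

s2t = Speech2TextGreedySearch.from_pretrained(
    "espnet/owsm_ctc_v3.1_1B",   # swap in the desired (e.g., v4) checkpoint id
    device="cuda",               # or "cpu"
    generate_interctc_outputs=False,
    lang_sym="<eng>",            # language token
    task_sym="<asr>",            # <asr> for transcription; ST uses a target-language token
)

# OWSM-CTC is trained on 16 kHz audio with a fixed 30-second context,
# so pad or trim the waveform accordingly.
speech, rate = sf.read("speech.wav")
speech = librosa.util.fix_length(speech, size=16000 * 30)

result = s2t(speech)[0]  # best hypothesis; the first field is the decoded text
print(result[0])
```

The encoder-decoder checkpoints (e.g., the OWSM v3.1 and v4 medium models) are served by the analogous `espnet2.bin.s2t_inference.Speech2Text` class, whose `from_pretrained` additionally accepts decoding options such as `beam_size` and `ctc_weight`.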