Create README.md
README.md
ADDED
@@ -0,0 +1,58 @@
---
tags:
- espnet
- audio
- automatic-speech-recognition
- speech-translation
language: multilingual
datasets:
- owsm_v3.1
license: cc-by-4.0
---

## OWLS: Open Whisper-style Large-scale neural model Suite

OWLS is a suite of Whisper-style models, designed to help researchers understand the scaling properties of speech models.
OWLS models range from 0.25B to 18B parameters and are trained on up to 360K hours of data.

OWLS models are developed using [ESPnet](https://github.com/espnet/espnet) and support multilingual speech recognition and translation.

OWLS is part of the [OWSM](https://www.wavlab.org/activities/2024/owsm/) project, which aims to develop fully open speech foundation models using publicly available data and open-source toolkits.

The model in this repo has 17.64B parameters in total and is trained on 180K hours of public speech data.
Specifically, it supports the following speech-to-text tasks:
- Speech recognition
- Any-to-any-language speech translation
- Utterance-level alignment
- Long-form transcription
- Language identification
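
The tasks listed above are typically selected with special language and task tokens. The sketch below assumes the token convention used by earlier OWSM model cards (`<eng>`-style language tokens, `<asr>` for recognition, `<st_xxx>` for translation into language `xxx`) and the `lang_sym` / `task_sym` keyword arguments of ESPnet's `Speech2Text`. The helper names `task_symbols` and `load_model` are hypothetical, and the tokens this checkpoint actually supports should be checked against its own token list.

```python
def task_symbols(src_lang, tgt_lang=None):
    """Return (lang_sym, task_sym) under the assumed OWSM token convention."""
    lang_sym = f"<{src_lang}>"
    if tgt_lang is None or tgt_lang == src_lang:
        # Same-language output is treated as speech recognition.
        task_sym = "<asr>"
    else:
        # Different target language is treated as speech translation.
        task_sym = f"<st_{tgt_lang}>"
    return lang_sym, task_sym


def load_model(src_lang, tgt_lang=None, model_tag="espnet/owls_18B_180K"):
    """Build a Speech2Text object for the chosen task (assumed keyword names)."""
    # Imported lazily so task_symbols() works without ESPnet installed.
    from espnet2.bin.s2t_inference import Speech2Text

    lang_sym, task_sym = task_symbols(src_lang, tgt_lang)
    return Speech2Text.from_pretrained(
        model_tag, lang_sym=lang_sym, task_sym=task_sym
    )


print(task_symbols("eng"))         # ('<eng>', '<asr>')
print(task_symbols("eng", "deu"))  # ('<eng>', '<st_deu>')
```

Keeping the token construction separate from model loading makes it possible to check the chosen symbols before downloading a checkpoint of this size.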

## Use this model

You can use this model in your projects with the following code:

```python
# make sure espnet is installed: pip install espnet
import soundfile
from espnet2.bin.s2t_inference import Speech2Text

# Load the pretrained checkpoint by its model tag
model = Speech2Text.from_pretrained(
    "espnet/owls_18B_180K"
)

# Read the waveform and its sampling rate from disk
speech, rate = soundfile.read("speech.wav")
# Take the text of the first returned hypothesis
text, *_ = model(speech)[0]
```
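
Before decoding, it can help to confirm that the input file matches what the model expects. The sketch below assumes 16 kHz mono input, which is conventional for OWSM-style models but is not stated in this README. `EXPECTED_RATE`, `describe_wav`, and `needs_conversion` are hypothetical names, and only Python's standard `wave` module is used.

```python
import wave

EXPECTED_RATE = 16000  # assumed sampling rate in Hz; not confirmed by this README


def describe_wav(path):
    """Return (sample_rate, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return rate, channels, duration


def needs_conversion(path, expected_rate=EXPECTED_RATE):
    """Return True if the file should be resampled or downmixed before decoding."""
    rate, channels, _ = describe_wav(path)
    return rate != expected_rate or channels != 1
```

If `needs_conversion("speech.wav")` returns `True`, resample or downmix the audio with a tool of your choice before passing it to the model.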

## Citations

```
@article{chen2025owls,
  title={OWLS: Scaling Laws for Multilingual Speech Recognition and Translation Models},
  author={Chen, William and Tian, Jinchuan and Peng, Yifan and Yan, Brian and Yang, Chao-Han Huck and Watanabe, Shinji},
  journal={arXiv preprint arXiv:2502.10373},
  year={2025}
}
```