---
tags:
- espnet
- audio
- automatic-speech-recognition
- speech-translation
language: multilingual
datasets:
- owsm_v3.1
license: cc-by-4.0
---

## OWLS: Open Whisper-style Large-scale neural model Suite

OWLS is a suite of Whisper-style models designed to help researchers understand the scaling properties of speech models. OWLS models range from 0.25B to 18B parameters and are trained on up to 360K hours of data.

OWLS models are developed with [ESPnet](https://github.com/espnet/espnet) and support multilingual speech recognition and translation.

OWLS is part of the [OWSM](https://www.wavlab.org/activities/2024/owsm/) project, which aims to develop fully open speech foundation models using publicly available data and open-source toolkits.

The model in this repo has 17.64B parameters in total and is trained on 180K hours of public speech data. Specifically, it supports the following speech-to-text tasks:

- Speech recognition
- Any-to-any-language speech translation
- Utterance-level alignment
- Long-form transcription
- Language identification

## Use this model

You can use this model in your projects with the following code:

```python
# Make sure ESPnet is installed: pip install espnet
import soundfile

from espnet2.bin.s2t_inference import Speech2Text

# Download the pretrained model from Hugging Face and build the inference wrapper
model = Speech2Text.from_pretrained("espnet/owls_18B_180K")

# Load an input waveform (16 kHz mono) and transcribe it
speech, rate = soundfile.read("speech.wav")
text, *_ = model(speech)[0]
```
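
The decoding language and task are selected via special tokens. Below is a minimal sketch of running speech translation instead of recognition, assuming the OWSM-style token convention (e.g. `<eng>` for English input audio, `<st_deu>` for translation into German) and the usual `Speech2Text` decoding options; verify the exact symbols against this model's token list.

```python
import soundfile

from espnet2.bin.s2t_inference import Speech2Text

# The lang/task symbols below follow the OWSM token convention and are
# assumptions to check against this model's vocabulary before use.
model = Speech2Text.from_pretrained(
    "espnet/owls_18B_180K",
    lang_sym="<eng>",     # language spoken in the input audio
    task_sym="<st_deu>",  # translate the speech into German text
    beam_size=5,
    ctc_weight=0.0,       # attention-only decoding, as in OWSM demo recipes
)

speech, rate = soundfile.read("speech.wav")
translation, *_ = model(speech)[0]
```

Passing `task_sym="<asr>"` with the same `lang_sym` recovers plain speech recognition.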

## Citations

```
@article{chen2025owls,
  title={OWLS: Scaling Laws for Multilingual Speech Recognition and Translation Models},
  author={Chen, William and Tian, Jinchuan and Peng, Yifan and Yan, Brian and Yang, Chao-Han Huck and Watanabe, Shinji},
  journal={arXiv preprint arXiv:2502.10373},
  year={2025}
}
```