wanchichen committed
Commit b56b68f · verified · 1 Parent(s): 928d809

Create README.md

Files changed (1)
  1. README.md +51 -0
README.md ADDED
@@ -0,0 +1,51 @@
+ ---
+ tags:
+ - espnet
+ - audio
+ - automatic-speech-recognition
+ - speech-translation
+ language: multilingual
+ datasets:
+ - owsm_v3.1
+ license: cc-by-4.0
+ ---
+
+ ## OWLS: Open Whisper-style Large-scale neural model Suite
+
+ OWLS is a suite of Whisper-style models designed to help researchers understand the scaling properties of speech models.
+ OWLS models range from 0.25B to 18B parameters and are trained on up to 360K hours of data.
+
+ OWLS models are developed using [ESPnet](https://github.com/espnet/espnet) and support multilingual speech recognition and translation.
+
+ OWLS is part of the [OWSM](https://www.wavlab.org/activities/2024/owsm/) project, which aims to develop fully open speech foundation models using publicly available data and open-source toolkits.
+
+ The model in this repo has 0.5B parameters in total and is trained on 180K hours of public speech data.
+ It supports the following speech-to-text tasks:
+ - Speech recognition
+ - Any-to-any-language speech translation
+ - Utterance-level alignment
+ - Long-form transcription
+ - Language identification
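
As a rough illustration of how the recognition and translation tasks might be selected: ESPnet's `Speech2Text` takes `lang_sym` and `task_sym` keyword arguments, and OWSM checkpoints use tokens such as `<eng>`, `<asr>`, and `<st_zho>`. This README does not confirm that the checkpoint here follows the same convention, so treat the token format below as an assumption. `task_symbols` is a hypothetical helper, not part of ESPnet.

```python
from typing import Dict, Optional


def task_symbols(src_lang: str, tgt_lang: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical helper: build OWSM-style language and task tokens.

    Assumes ISO 639-3 codes wrapped in angle brackets (e.g. "<eng>"),
    "<asr>" for recognition, and "<st_xxx>" for translation into xxx.
    """
    lang_sym = f"<{src_lang}>"
    if tgt_lang is None or tgt_lang == src_lang:
        # Same-language output is plain speech recognition.
        task_sym = "<asr>"
    else:
        # Different target language is speech translation.
        task_sym = f"<st_{tgt_lang}>"
    return {"lang_sym": lang_sym, "task_sym": task_sym}
```

If the assumption holds, the returned dict could be passed as keyword arguments when loading the model, for example `Speech2Text.from_pretrained("espnet/owls_05B_180K", **task_symbols("eng", "zho"))`.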
+
+ ## Use this model
+
+ You can use this model in your projects with the following code:
+
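+ ```python
+ # make sure espnet is installed: pip install espnet
+ import soundfile
+ from espnet2.bin.s2t_inference import Speech2Text
+
+ # download the checkpoint and build the inference wrapper
+ model = Speech2Text.from_pretrained(
+     "espnet/owls_05B_180K"
+ )
+
+ # load the waveform and its sample rate
+ speech, rate = soundfile.read("speech.wav")
+ # take the best hypothesis; its first item is the decoded text
+ text, *_ = model(speech)[0]
+ ```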
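
Before running the snippet above, it can help to check the input file. Whisper-style models commonly expect 16 kHz mono audio, but this README does not state what this checkpoint requires, so the 16 kHz value below is an assumption. The sketch uses only the standard-library `wave` module and reads PCM WAV files only. `describe_wav` and `EXPECTED_RATE` are hypothetical names.

```python
import wave

# Assumption: 16 kHz is typical for Whisper-style models; not stated in this README.
EXPECTED_RATE = 16000


def describe_wav(path: str) -> dict:
    """Return sample rate, channel count, and duration of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.getnframes()
    return {
        "rate": rate,
        "channels": channels,
        "seconds": frames / rate,
        # True only for mono audio at the assumed sample rate.
        "matches_expected": rate == EXPECTED_RATE and channels == 1,
    }
```

If `matches_expected` is false, the audio would likely need resampling or downmixing before being passed to the model.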
+
+
+ ## Citations
+
+ TBA
+
+