guanwenhao committed on
Commit 50a5ae5 (verified) · 1 parent: 0cc29d3

Update README.md

Files changed (1)
  1. README.md +37 -3
README.md CHANGED
@@ -1,3 +1,37 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ datasets:
+ - openslr/librispeech_asr
+ language:
+ - en
+ base_model:
+ - HuggingFaceTB/SmolLM2-360M-Instruct
+ tags:
+ - audio
+ - speech
+ - tts
+ - asr
+ - unified_model
+ ---
+
+ ## 1. Introduction
+
+ This work introduces MonoSpeech, a novel approach that integrates autoregression and flow matching
+ within a transformer-based framework for unified speech understanding and generation.
+ MonoSpeech is designed to provide both speech comprehension and speech generation with a single unified model trained in a single stage.
+ Our experiments show that MonoSpeech delivers strong performance on both automatic speech recognition (ASR) and zero-shot speech synthesis (TTS).
+ By combining autoregression and flow matching, MonoSpeech lays a foundation for extending this paradigm to additional audio understanding and generation tasks in the future.
+
+ [**GitHub Repository**](https://github.com/gwh22/MonoSpeech)
+
+ <div align="center">
+ <img alt="image" src="monospeech.pdf" style="width:90%;">
+ </div>
+
+
+
+ ## 2. Quick Start
+
+ Please refer to the [**GitHub Repository**](https://github.com/gwh22/MonoSpeech).
+
+