---
license: mit
language:
- en
base_model: Qwen/Qwen2-0.5B
---


<p align="center"><strong style="font-size: 18px;">
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming
</strong>
</p>

<p align="center">
🤗 <a href="">Hugging Face</a> | 📖 <a href="https://github.com/gpt-omni/mini-omni">Github</a>
| 📑 <a href="https://arxiv.org/abs/2408.16725">Technical report</a>
</p>

Mini-Omni is an open-source multimodal large language model that can **hear and talk while thinking**, featuring real-time end-to-end speech input and **streaming audio output** conversational capabilities.

<p align="center">
    <img src="frameworkv3.jpg" width="100%"/>
</p>


## Features

✅ **Real-time speech-to-speech** conversational capabilities, with no extra ASR or TTS models required.

✅ **Talking while thinking**: the model can generate text and audio at the same time.

✅ **Streaming audio output** capabilities.

✅ "Audio-to-Text" and "Audio-to-Audio" **batch inference** to further boost performance.

**NOTE**: please refer to https://github.com/gpt-omni/mini-omni for more details.