DavidBrowne17 committed on
Commit c6f1bdc · verified · 1 Parent(s): 559a416

Update README.md

Files changed (1)
  1. README.md +11 -10
README.md CHANGED
@@ -5,15 +5,15 @@ base_model:
5
  pipeline_tag: audio-to-audio
6
  ---
7
 
8
- Mochi is a finetuned speech-text foundation model and full-duplex spoken dialogue framework, based on the original Moshi model.
9
 
10
- How to use:
11
 
12
- You can use the original moshi ui to try out this model, just start the server pointed to this model
13
 
14
  https://github.com/kyutai-labs/moshi
15
 
16
- python -m moshi.server [--gradio-tunnel] [--hf-repo DavidBrowne17/Mochi]
17
 
18
  Model Details
19
 
@@ -27,11 +27,11 @@ License: apache 2.0
27
 
28
  Model Description
29
 
30
- Mochi is a refined version of the Moshi model, designed for smoother, more adaptable dialogue generation. Building upon Moshi’s speech-to-speech generation foundation, Mochi enhances conversational coherence and reduces latency. Like Moshi, it uses a residual quantizer from a neural audio codec to generate speech tokens and models its own and user speech into parallel streams. This framework supports dynamic conversational flow without rigid speaker turns.
31
 
32
- Mochi also implements the "Inner Monologue" method, predicting time-aligned text tokens before generating speech tokens. This approach enhances linguistic quality, supports streaming speech recognition, and improves text-to-speech output. Mochi achieves a practical latency of approximately 200ms, ensuring near real-time interaction.
33
 
34
- Key Enhancements in Mochi:
35
 
36
  Reduced latency and smoother conversational flow.
37
 
@@ -43,7 +43,7 @@ Uses
43
 
44
  Direct Use
45
 
46
- Mochi can be deployed as a conversational agent for:
47
 
48
  Casual conversation.
49
 
@@ -61,7 +61,7 @@ The finetuned architecture allows for domain-specific adaptations with additiona
61
 
62
  Out-of-Scope Use
63
 
64
- Mochi is not intended for:
65
 
66
  Impersonating individuals.
67
 
@@ -71,4 +71,5 @@ Professional advice or critical decision-making.
71
 
72
  Bias, Risks, and Limitations
73
 
74
- Mochi inherits safeguards from Moshi but may still exhibit biases due to the nature of its training data. While toxicity has been minimized, there are risks of over-representation from certain data domains. The model is trained to produce a consistent voice and is not designed for impersonation. Further testing is necessary to evaluate long-term sociotechnical impacts
 
 
5
  pipeline_tag: audio-to-audio
6
  ---
7
 
8
+ Muchi is a finetuned speech-text foundation model and full-duplex spoken dialogue framework, based on the original Moshi model.
9
 
10
+ How to use:
11
 
12
+ You can use the original moshi ui to try out this model, just start the server pointed to this model
13
 
14
  https://github.com/kyutai-labs/moshi
15
 
16
+ python -m moshi.server [--gradio-tunnel] [--hf-repo DavidBrowne17/Muchi]
17
 
18
  Model Details
19
 
 
27
 
28
  Model Description
29
 
30
+ Muchi is a refined version of the Moshi model, designed for smoother, more adaptable dialogue generation. Building upon Moshi’s speech-to-speech generation foundation, Muchi enhances conversational coherence and reduces latency. Like Moshi, it uses a residual quantizer from a neural audio codec to generate speech tokens and models its own and user speech into parallel streams. This framework supports dynamic conversational flow without rigid speaker turns.
31
 
32
+ Muchi also implements the "Inner Monologue" method, predicting time-aligned text tokens before generating speech tokens. This approach enhances linguistic quality, supports streaming speech recognition, and improves text-to-speech output. Muchi achieves a practical latency of approximately 200ms, ensuring near real-time interaction.
33
 
34
+ Key Enhancements in Muchi:
35
 
36
  Reduced latency and smoother conversational flow.
37
 
 
43
 
44
  Direct Use
45
 
46
+ Muchi can be deployed as a conversational agent for:
47
 
48
  Casual conversation.
49
 
 
61
 
62
  Out-of-Scope Use
63
 
64
+ Muchi is not intended for:
65
 
66
  Impersonating individuals.
67
 
 
71
 
72
  Bias, Risks, and Limitations
73
 
74
+ Muchi inherits safeguards from Moshi but may still exhibit biases due to the nature of its training data. While toxicity has been minimized, there are risks of over-representation from certain data domains. The model is trained to produce a consistent voice and is not designed for impersonation. Further testing is necessary to evaluate long-term sociotechnical impacts.
75
+
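
The "How to use" step in the README above amounts to a single launch command. As a minimal sketch, assuming the square brackets in that command mark optional flags and that the `moshi` package from https://github.com/kyutai-labs/moshi is installed in the active environment, the command can be assembled in Python. The helper name `build_server_command` is hypothetical and is not part of the moshi package; only the module name and the two flags are taken from the README.

```python
import shlex
import sys


def build_server_command(hf_repo="DavidBrowne17/Muchi", gradio_tunnel=False):
    """Assemble the launch command quoted in the README as an argument list.

    Hypothetical helper: only `moshi.server`, `--hf-repo`, and
    `--gradio-tunnel` come from the README; everything else is illustrative.
    """
    # Use the current interpreter so the command runs in the same environment.
    cmd = [sys.executable, "-m", "moshi.server", "--hf-repo", hf_repo]
    if gradio_tunnel:
        # Optional flag shown in brackets in the README command.
        cmd.append("--gradio-tunnel")
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(build_server_command()))
```

Passing the resulting list to `subprocess.run` would start the server. Printing it first keeps the sketch free of side effects and makes it easy to confirm which repository the server will load.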