
title: Sesame CSM Space
emoji: 🚀
colorFrom: blue
colorTo: blue
sdk: gradio
sdk_version: 5.20.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: Generation using Sesame's Conversational Speech Model

CSM 1B

2025/03/13 - We are releasing the 1B CSM variant. The code is available on GitHub at SesameAILabs/csm, and the checkpoint is hosted on Hugging Face.

Try out the interactive demo of our fine-tuned version at sesame.com/voicedemo.

Generate speech from the open-source base model hosted on Hugging Face.

CSM (Conversational Speech Model) is a speech generation model from Sesame that generates RVQ audio codes from text and audio inputs. A fine-tuned version of this model powers the interactive demo in our technical blog post.

The model architecture employs a Llama backbone and a smaller audio decoder that produces Mimi audio codes.
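As a rough illustration of what "RVQ audio codes" means, the sketch below implements residual vector quantization on a toy 2-D vector: each stage picks the nearest codebook entry and passes the leftover residual to the next stage, so one input becomes a short list of integer codes. This is not the Mimi codec or CSM's own code. The codebooks, the function names (`rvq_encode`, `rvq_decode`, `nearest_index`), and the vector size are all hypothetical and chosen only to keep the example small.

```python
from typing import List, Sequence

Vector = List[float]
Codebook = List[Vector]


def nearest_index(vec: Sequence[float], codebook: Codebook) -> int:
    """Return the index of the codebook entry closest to vec (squared L2)."""
    best_index = 0
    best_dist = float("inf")
    for i, entry in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(vec, entry))
        if dist < best_dist:
            best_dist = dist
            best_index = i
    return best_index


def rvq_encode(vec: Sequence[float], codebooks: List[Codebook]) -> List[int]:
    """Quantize vec stage by stage; each stage encodes the previous residual."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest_index(residual, codebook)
        codes.append(idx)
        # Subtract the chosen entry so the next stage sees only what is left.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codes: Sequence[int], codebooks: List[Codebook]) -> Vector:
    """Reconstruct an approximation by summing the chosen entry of each stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical toy codebooks: a coarse first stage and a finer second stage.
TOY_CODEBOOKS: List[Codebook] = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [-0.25, -0.25]],
]

if __name__ == "__main__":
    codes = rvq_encode([1.2, 1.0], TOY_CODEBOOKS)
    print(codes, rvq_decode(codes, TOY_CODEBOOKS))
```

In a real neural codec the codebooks are learned and operate on per-frame latent vectors rather than hand-written 2-D points, but the encode and decode loops have the same shape: one integer code per codebook, summed back together at decode time.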

Misuse and abuse ⚠️

This project provides a high-quality speech generation model for research and educational purposes. While we encourage responsible and ethical use, we explicitly prohibit the following:

Impersonation or Fraud: Do not use this model to generate speech that mimics real individuals without their explicit consent.
Misinformation or Deception: Do not use this model to create deceptive or misleading content, such as fake news or fraudulent calls.
Illegal or Harmful Activities: Do not use this model for any illegal, harmful, or malicious purposes.

By using this model, you agree to comply with all applicable laws and ethical guidelines. We are not responsible for any misuse, and we strongly condemn unethical applications of this technology.

Prompts

Conversational prompts are from the EdAcc dataset.
Read speech prompts are from the LibriTTS-R dataset.

Authors

Johan Schalkwyk, Ankit Kumar, Dan Lyth, Sefik Emre Eskimez, Zack Hodari, Cinjon Resnick, Ramon Sanabria, Raven Jiang, and the Sesame team.