---
license: agpl-3.0
datasets:
- MrDragonFox/Elise
language:
- en
base_model:
- sesame/csm-1b
pipeline_tag: text-to-speech
library_name: transformers
tags:
- generative-ai
new_version: keanteng/sesame-csm-elise-lora
---

# CSM Elise Voice Model
This model is a fine-tuned version of sesame/csm-1b, trained on the MrDragonFox/Elise dataset for voice cloning. Sample output files are included in the repository.
## Model Details
- Base Model: sesame/csm-1b
- Training Data: MrDragonFox/Elise dataset
- Fine-tuning Approach: Voice cloning through conditional speech generation
- Voice Characteristics: [Describe voice qualities]
- Training Parameters (mapped to a `TrainingArguments` sketch below):
  - Learning Rate: 2e-5
  - Epochs: 3
  - Batch Size: 1, with gradient accumulation steps of 4 (effective batch size of 4)
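For reference, these hyperparameters correspond roughly to the following `transformers` `TrainingArguments`. This is a minimal sketch for orientation, not the exact training script; the output directory name is illustrative.

```python
from transformers import TrainingArguments

# Sketch of the fine-tuning configuration described above.
# output_dir is illustrative; the actual training script may differ.
training_args = TrainingArguments(
    output_dir="csm-1b-elise",
    learning_rate=2e-5,
    num_train_epochs=3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,  # effective batch size of 4
)
```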
## Quick Start
```python
from transformers import CsmForConditionalGeneration, AutoProcessor
import torch
import soundfile as sf

# Load the processor and the fine-tuned model
model_id = "keanteng/sesame-csm-elise"  # this repository
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained(model_id)
model = CsmForConditionalGeneration.from_pretrained(model_id, device_map=device)
```
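On a CUDA GPU you can optionally load the model in half precision to reduce memory use (assumption: your GPU supports fp16):

```python
# Optional: load in fp16 on GPU (assumption: fp16-capable hardware)
model = CsmForConditionalGeneration.from_pretrained(
    model_id, device_map=device, torch_dtype=torch.float16
)
```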
## Basic Text-to-Speech
```python
# Build a single-speaker conversation; speaker roles are string ids ("0", "1", ...)
conversation = [
    {"role": "0", "content": [{"type": "text", "text": "Hello, this is a test!"}]}
]

inputs = processor.apply_chat_template(
    conversation,
    tokenize=True,
    return_dict=True,
).to(device)

# Generate audio tokens and decode them to a waveform
audio = model.generate(**inputs, output_audio=True)
audio_cpu = audio[0].to(torch.float32).cpu().numpy()

# Save to file (CSM generates audio at 24 kHz)
sf.write("output.wav", audio_cpu, 24000)
```
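Because CSM is a conditional speech model, you can also prime generation with reference audio so the output follows the cloned voice more closely. Below is a minimal sketch: `reference.wav`, its transcript, and the sample texts are illustrative placeholders, and the audio context is passed via the `"path"` key of the transformers multimodal chat template.

```python
# Voice cloning with audio context: condition generation on a reference clip.
# "reference.wav" and the transcript text are illustrative placeholders.
conversation = [
    {
        "role": "0",
        "content": [
            {"type": "text", "text": "Transcript of the reference clip."},
            {"type": "audio", "path": "reference.wav"},
        ],
    },
    # The final text-only turn is what the model actually synthesizes.
    {"role": "0", "content": [{"type": "text", "text": "This is generated in the cloned voice."}]},
]

inputs = processor.apply_chat_template(
    conversation,
    tokenize=True,
    return_dict=True,
).to(device)

audio = model.generate(**inputs, output_audio=True)
sf.write("cloned_output.wav", audio[0].to(torch.float32).cpu().numpy(), 24000)
```

The context turn pairs the transcript with its waveform under the same speaker id, which conditions the model on that voice before it speaks the trailing text-only turn.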