⚠️ Initial Checkpoint
This is a Piper TTS model fine-tuned from the Kristin medium voice.
This checkpoint comes after just 5 epochs of training on roughly 30% of the total data I curated (a mix of synthetic and natural speech).
Currently, I'm refining the synthetic dataset because I'm not satisfied with its quality; I will resume fine-tuning once that is done.
I'm also running ablations to find the best ratio of synthetic to natural data.
From initial observations, it seems better to use a large majority of one kind (roughly a 90%/10% split).
I'm trying to push the boundaries of the audio a mere 63 MB model can generate.
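For anyone curious what a 90%/10% split looks like in practice, here is a minimal, hypothetical sketch of sampling such a mix from two utterance lists. The helper, file names, and counts are illustrative only; they are not part of this repo or of the Piper training pipeline.

```python
import random


def sample_mix(synthetic, natural, majority_ratio=0.9, total=1000, seed=0):
    """Draw a training subset where `majority_ratio` of items come from the
    synthetic pool and the remainder from the natural pool (toy helper)."""
    rng = random.Random(seed)
    n_syn = int(total * majority_ratio)
    n_nat = total - n_syn
    mix = rng.sample(synthetic, min(n_syn, len(synthetic))) + \
          rng.sample(natural, min(n_nat, len(natural)))
    rng.shuffle(mix)
    return mix


# Hypothetical file lists; real training data would be (audio, transcript) pairs.
synthetic_utts = [f"synthetic_{i:04d}.wav" for i in range(5000)]
natural_utts = [f"natural_{i:04d}.wav" for i in range(600)]

train_set = sample_mix(synthetic_utts, natural_utts, majority_ratio=0.9, total=1000)
print(len(train_set), "utterances,", sum("synthetic" in u for u in train_set), "synthetic")
```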
Inference
```python
import wave

from src.python_run.piper import PiperVoice  # Or import from the installed package if you used pip

model = PiperVoice.load("en_US-ceylia-medium.onnx")
text = "I have a big plan for today. It involves fine-tuning you."

with wave.open("output.wav", "wb") as output_file:
    output_file.setnchannels(1)      # mono
    output_file.setsampwidth(2)      # 16-bit PCM samples
    output_file.setframerate(22050)  # sample rate of the medium voice
    model.synthesize(text=text, wav_file=output_file, sentence_silence=0.25)
```
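If you want to synthesize several prompts, the same loaded voice can be reused. Below is a small sketch built only from the calls shown above; the texts and output file names are placeholders.

```python
import wave

from src.python_run.piper import PiperVoice

model = PiperVoice.load("en_US-ceylia-medium.onnx")

# Placeholder prompts; one output file is written per line.
lines = [
    "First test sentence.",
    "Second test sentence, a little longer this time.",
]

for i, line in enumerate(lines):
    with wave.open(f"output_{i:02d}.wav", "wb") as output_file:
        output_file.setnchannels(1)      # mono, matching the snippet above
        output_file.setsampwidth(2)      # 16-bit PCM samples
        output_file.setframerate(22050)  # sample rate of the medium voice
        model.synthesize(text=line, wav_file=output_file, sentence_silence=0.25)
```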
🙏 Acknowledgements
Bryce Beattie for training the Kristin model.
Reference audio from datasets by @Jinsaryko.