# Spark-TTS 0.5B Fine-Tuned Model (16-bit Merged)
This repository hosts a Spark-TTS 0.5B model fine-tuned for speech synthesis with the Unsloth and TRL libraries. The LoRA adapters have been merged into the base weights and the result saved in 16-bit precision, which halves storage compared with 32-bit weights and removes the separate adapter computation at inference time, while maintaining high-quality outputs.
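Merging folds each LoRA update back into the frozen weight matrix, W' = W + (alpha / r) · B · A, so inference needs only one matrix per layer. The pure-Python sketch below uses the standard LoRA formulation, not Unsloth's actual merge code, with made-up 2×2 values, rank r = 1 and alpha = 2 chosen purely for illustration. It checks that the merged weight gives the same output as applying the base weight and the adapter separately. The subsequent cast to 16-bit is not shown.

```python
# Minimal sketch of a LoRA merge (standard formulation, illustrative values only).

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def add(a, b, scale=1.0):
    """Return a + scale * b, element-wise."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def merge_lora(w, a, b, alpha, r):
    """Fold the low-rank update into the base weight: W' = W + (alpha / r) * B @ A."""
    return add(w, matmul(b, a), scale=alpha / r)

# Hypothetical 2x2 base weight with a rank-1 adapter (B is 2x1, A is 1x2).
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[0.5], [0.25]]
A = [[2.0, -1.0]]
alpha, r = 2, 1

W_merged = merge_lora(W, A, B, alpha, r)

# Column-vector input x: base output plus scaled adapter output ...
x = [[1.0], [3.0]]
separate = add(matmul(W, x), matmul(B, matmul(A, x)), scale=alpha / r)
# ... matches the output of the single merged matrix.
merged = matmul(W_merged, x)

print(W_merged)  # [[3.0, -1.0], [1.0, 0.5]]
print(merged, separate)
```

Because the merged matrix reproduces the adapter's effect exactly, the model can be loaded as a single checkpoint with no adapter files.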
# Model Details
- Architecture: Transformer-based Text-to-Speech (Spark-TTS)
- Model Size: 0.5 billion parameters
- Precision: 16-bit merged weights (optimized for inference)
- Fine-tuning: LoRA adapters (parameter-efficient fine-tuning, not full fine-tuning) trained in bfloat16 precision and merged into the base weights
- Training Framework: Unsloth & TRL (Supervised Fine-Tuning)
- Tokenizer: Compatible tokenizer included
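As a rough, weights-only estimate, 16-bit storage uses 2 bytes per parameter versus 4 bytes for 32-bit. This ignores the tokenizer, any audio-codec files, and file-format overhead. The sketch below applies that arithmetic to the nominal 0.5 billion parameter count, so actual checkpoint sizes will differ.

```python
# Rough weights-only size estimate; the parameter count is the nominal 0.5B.
PARAMS = 0.5e9

def weight_size_gb(params, bytes_per_param):
    """Return the raw weight size in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9

fp32_gb = weight_size_gb(PARAMS, 4)  # 32-bit floats: 4 bytes each
bf16_gb = weight_size_gb(PARAMS, 2)  # 16-bit floats (fp16/bf16): 2 bytes each
print(f"fp32: {fp32_gb:.1f} GB, 16-bit: {bf16_gb:.1f} GB")  # fp32: 2.0 GB, 16-bit: 1.0 GB
```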
# Intended Use
This model is intended for research and development in text-to-speech synthesis tasks, especially where GPU memory efficiency and long context handling are priorities.
# Usage
```python
from unsloth import FastModel
import torch

# Load the fine-tuned Spark-TTS model and tokenizer from the Hugging Face Hub
model, tokenizer = FastModel.from_pretrained(
    "sureshbeekhani/spark-tts-0.5b-finetune-16bit",
    max_seq_length=2048,    # Adjust based on your needs
    dtype=torch.bfloat16,   # Match the bfloat16 precision used for LoRA fine-tuning
    full_finetuning=False,  # No full fine-tuning: load for inference (or LoRA training)
)

# Example text input for speech synthesis
text = "Hello, welcome to the Spark-TTS fine-tuned model demo!"

# Tokenize the input text and move the tensors to the model's device
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Generate output tokens; set max_new_tokens explicitly so long
# speech-token sequences are not cut short.
# Note: adjust the prompt format and generation arguments to your pipeline
outputs = model.generate(**inputs, max_new_tokens=2048)

# The outputs are token IDs, not audio: they must be decoded by the Spark-TTS
# audio codec (BiCodec) to obtain a waveform. This depends on your synthesis pipeline.
print("Inference completed successfully.")
```
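The base Spark-TTS model represents speech as discrete BiCodec tokens that appear in the generated text as `<|bicodec_semantic_N|>` and `<|bicodec_global_N|>`. This sketch assumes the fine-tune keeps that token format and the `<|task_tts|>` prompt layout used in Unsloth's Spark-TTS examples, so verify both against your training data. Under those assumptions, the helpers below build a prompt and pull out the numeric codec IDs, which would then be passed to the BiCodec decoder to produce a waveform (not shown). `build_tts_prompt` and `extract_codec_ids` are hypothetical names, not part of any library.

```python
import re

# Assumption: prompt layout follows Unsloth's Spark-TTS examples; check your training format.
def build_tts_prompt(text):
    """Wrap input text in the (assumed) Spark-TTS task and content markers."""
    return "".join([
        "<|task_tts|>",
        "<|start_content|>",
        text,
        "<|end_content|>",
        "<|start_global_token|>",
    ])

# Assumption: generated speech tokens look like <|bicodec_semantic_N|> / <|bicodec_global_N|>.
SEMANTIC_RE = re.compile(r"<\|bicodec_semantic_(\d+)\|>")
GLOBAL_RE = re.compile(r"<\|bicodec_global_(\d+)\|>")

def extract_codec_ids(decoded):
    """Return (global_ids, semantic_ids) parsed from decoded model output text."""
    global_ids = [int(m) for m in GLOBAL_RE.findall(decoded)]
    semantic_ids = [int(m) for m in SEMANTIC_RE.findall(decoded)]
    return global_ids, semantic_ids

# Usage with a hand-written sample string (not real model output):
sample = ("<|bicodec_global_12|><|bicodec_global_7|><|end_global_token|>"
          "<|start_semantic_token|><|bicodec_semantic_301|><|bicodec_semantic_5|>")
print(build_tts_prompt("Hello"))
print(extract_codec_ids(sample))  # ([12, 7], [301, 5])
```

In the loading example, `build_tts_prompt(text)` would replace the raw `text` passed to the tokenizer. `tokenizer.batch_decode(outputs, skip_special_tokens=False)[0]` would then give the string to parse, with special tokens kept so the codec markers survive decoding.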
# Limitations
- LoRA fine-tuning is supported only with bfloat16 precision.
- Designed primarily for speech synthesis; it may not perform well on unrelated NLP tasks.
- Test latency and audio quality carefully before using the model in production.
# License
This model is licensed under the MIT License.
# Base Model
SURESHBEEKHANI/spark-tts-0.5b-finetune-16bit is fine-tuned from SparkAudio/Spark-TTS-0.5B.