Spark-TTS 0.5B Fine-Tuned Model (16-bit Merged)

This repository hosts a fine-tuned Spark-TTS 0.5B model optimized for speech synthesis using the Unsloth and TRL libraries. The model is saved and shared in a merged 16-bit format for efficient storage and faster inference while maintaining high-quality outputs.

Model Details

  • Architecture: Transformer-based Text-to-Speech (Spark-TTS)
  • Model Size: 0.5 Billion parameters
  • Precision: 16-bit merged weights, optimized for inference (see the loading sketch after this list)
  • Fine-tuning: LoRA adapters trained in bfloat16 precision, then merged into the released weights
  • Training Framework: Unsloth & TRL (Supervised Fine-Tuning)
  • Tokenizer: Compatible tokenizer included
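
Because the LoRA adapters are already merged, the checkpoint is a standard safetensors model and should also load with plain transformers. The snippet below is a minimal sketch under that assumption (Spark-TTS uses a Qwen2.5-based causal-LM backbone); the Unsloth loader shown in the Usage section remains the primary path.

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Assumes the merged checkpoint exposes a standard causal-LM config;
# this is not confirmed by the card, so fall back to the Unsloth loader if it fails.
model_id = "sureshbeekhani/spark-tts-0.5b-finetune-16bit"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # load the merged weights in 16-bit
    device_map="auto",
)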

Intended Use

This model is intended for research and development in text-to-speech synthesis tasks, especially where GPU memory efficiency and long context handling are priorities.

Usage

from unsloth import FastModel
import torch

# Load the fine-tuned Spark-TTS model and tokenizer from Hugging Face Hub
model, tokenizer = FastModel.from_pretrained(
    "sureshbeekhani/spark-tts-0.5b-finetune-16bit",
    max_seq_length=2048,   # Adjust based on your needs
    dtype=torch.bfloat16,  # Use bfloat16 for LoRA compatibility and efficiency
    full_finetuning=False, # Keep False for inference; set True only to continue full fine-tuning
)

# Example text input for speech synthesis
text = "Hello, welcome to the Spark-TTS fine-tuned model demo!"

# Tokenize the input text
# Tokenize the input text and move it to the model's device
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Generate output tokens from the model
# Note: adjust max_new_tokens and any sampling settings to your synthesis pipeline
outputs = model.generate(**inputs, max_new_tokens=1024)

# Process or save outputs as needed (e.g., convert to audio waveform)
# This part depends on your model's output format and synthesis pipeline (see the sketch below)

print("Inference completed successfully.")
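
The generate call returns token ids that encode audio codec codes, not an audible waveform, so a decoding step is required. The post-processing sketch below rests on assumptions not stated in this card: codes_to_waveform is a hypothetical placeholder for the codec decoder in your Spark-TTS pipeline, and the 16 kHz sample rate is assumed.

import numpy as np
import soundfile as sf

# Keep only the newly generated tokens (drop the prompt portion)
generated_ids = outputs[0, inputs["input_ids"].shape[1]:]

# Inspect the raw output; the exact token markers depend on the Spark-TTS tokenizer
print(tokenizer.decode(generated_ids, skip_special_tokens=False)[:200])

# codes_to_waveform is a hypothetical helper standing in for the Spark-TTS
# codec decoder; it is assumed to return a 1-D float NumPy waveform
waveform = codes_to_waveform(generated_ids)
sf.write("output.wav", np.asarray(waveform, dtype=np.float32), 16000)  # assumed sample rate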

Limitations

  • LoRA fine-tuning is supported only in bfloat16 precision (a minimal fine-tuning sketch follows this list).
  • The model is designed primarily for speech synthesis and may not perform well on unrelated NLP tasks.
  • Production use should be tested carefully for latency and quality trade-offs.
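
For further training, the merged model can be re-wrapped with LoRA adapters in bfloat16. The sketch below assumes Unsloth's standard get_peft_model call; the rank, alpha, and target modules are illustrative values, not the settings used for this checkpoint.

from unsloth import FastModel
import torch

model, tokenizer = FastModel.from_pretrained(
    "sureshbeekhani/spark-tts-0.5b-finetune-16bit",
    max_seq_length=2048,
    dtype=torch.bfloat16,   # LoRA fine-tuning requires bfloat16 here
    full_finetuning=False,
)

# Attach fresh LoRA adapters; r, lora_alpha, and target_modules are illustrative
model = FastModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)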

License

This model is licensed under the MIT License.