metadata

language: en
license: apache-2.0
tags:
  - t5
  - music
  - spotify
  - text2json
  - audio-features
base_model: t5-base
datasets:
  - custom
library_name: transformers
pipeline_tag: text2text-generation

T5-Base Fine-tuned for Spotify Features Prediction

A T5-Base model fine-tuned to convert natural language music prompts into Spotify audio feature JSON.

Model Details

Base Model: t5-base
Model Type: Text-to-JSON generation
Language: English
Task: Convert natural language music preferences into Spotify audio feature JSON objects
Fine-tuning Dataset: Custom dataset pairing natural language prompts with Spotify audio features

Known Issues

IMPORTANT: This model version can emit output without the enclosing curly braces, so the raw text is not always valid JSON. Post-process the output before parsing it.

Usage

from transformers import T5ForConditionalGeneration, T5Tokenizer
import json

# Load model and tokenizer
model = T5ForConditionalGeneration.from_pretrained("synyyy/t5-spotify-features-v2")
tokenizer = T5Tokenizer.from_pretrained("synyyy/t5-spotify-features-v2")

def generate_spotify_features(prompt):
    # Format input
    input_text = f"prompt: {prompt}"
    
    # Tokenize and generate
    input_ids = tokenizer(input_text, return_tensors="pt", max_length=256, truncation=True).input_ids
    outputs = model.generate(
        input_ids, 
        max_length=256, 
        num_beams=4, 
        early_stopping=True,
        do_sample=False
    )
    
    # Decode result
    result = tokenizer.decode(outputs[0], skip_special_tokens=True)
    
    # Post-process if needed: add whichever curly brace is missing
    result = result.strip()
    if not result.startswith('{'):
        result = "{" + result
    if not result.endswith('}'):
        result = result + "}"
    
    try:
        return json.loads(result)
    except json.JSONDecodeError as e:
        print(f"JSON parsing failed: {e}")
        print(f"Raw output: {result}")
        return None

# Example usage
prompt = "I want energetic dance music for a party"
features = generate_spotify_features(prompt)
print(features)
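
Adding braces only repairs output whose sole defect is a missing outer brace. The following is a minimal sketch of a more tolerant fallback, assuming the raw output still contains recognizable name/number pairs. The helper extract_features and the pattern PAIR_RE are hypothetical names for illustration and are not part of this repository.

```python
import json
import re

# Matches pairs such as "energy": 0.9 or energy: 0.9 (quotes optional).
PAIR_RE = re.compile(r'"?([A-Za-z_]+)"?\s*:\s*(-?\d+(?:\.\d+)?)')

def extract_features(raw):
    """Parse model output as JSON, falling back to regex extraction."""
    text = raw.strip()
    # Add each brace independently if it is missing.
    if not text.startswith("{"):
        text = "{" + text
    if not text.endswith("}"):
        text = text + "}"
    try:
        parsed = json.loads(text)
        if isinstance(parsed, dict):
            return parsed
    except json.JSONDecodeError:
        pass
    # Fallback: collect every name/number pair found in the raw text.
    pairs = {name: float(value) for name, value in PAIR_RE.findall(raw)}
    return pairs or None
```

The function returns None when nothing usable is found, which matches the behavior of generate_spotify_features on a parsing failure.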

Expected Output Format

{
  "danceability": 0.85,
  "energy": 0.90,
  "valence": 0.75,
  "acousticness": 0.15,
  "instrumentalness": 0.05,
  "speechiness": 0.08
}
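
Parsed output can be checked against this format before it is used. The following is a minimal sketch, assuming the model is expected to emit exactly the six keys shown above and that each value lies between 0.0 and 1.0, which is the range Spotify documents for these features. The names EXPECTED_KEYS and validate_features are hypothetical and not part of this repository.

```python
EXPECTED_KEYS = (
    "danceability", "energy", "valence",
    "acousticness", "instrumentalness", "speechiness",
)

def validate_features(features):
    """Return a list of problems; an empty list means the dict looks valid."""
    problems = []
    for key in EXPECTED_KEYS:
        if key not in features:
            problems.append(f"missing key: {key}")
            continue
        value = features[key]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"non-numeric value for {key}: {value!r}")
        elif not 0.0 <= value <= 1.0:
            problems.append(f"out-of-range value for {key}: {value}")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to log everything that is wrong with a single generation.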

Training Configuration

Epochs: 7
Learning Rate: 1e-4
Batch Size: 8 (per device)
Gradient Accumulation Steps: 2
Scheduler: Cosine with warmup
Max Length: 256 tokens
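
To make these settings concrete, the sketch below works out the effective batch size and shows the general shape of a cosine schedule with linear warmup. It is an illustration only, not the training code: the number of warmup steps and the total number of steps are not stated in this card, so they are left as function arguments, and the function name cosine_lr_with_warmup is hypothetical.

```python
import math

PER_DEVICE_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 2
PEAK_LR = 1e-4

# Samples contributing to each optimizer step on one device: 8 * 2 = 16.
effective_batch_size = PER_DEVICE_BATCH_SIZE * GRAD_ACCUM_STEPS

def cosine_lr_with_warmup(step, total_steps, warmup_steps, peak_lr=PEAK_LR):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With gradient accumulation, the optimizer updates once every 2 forward/backward passes, so each update on a single device sees 16 samples.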

Limitations

May generate incomplete JSON (for example, missing curly braces) that requires post-processing
Output quality depends on how closely a prompt resembles the training data
Trained on a specific input format: every input must start with "prompt: "

Model Files

This repository contains:

config.json: Model configuration
pytorch_model.bin: Model weights
tokenizer.json: Tokenizer vocabulary
tokenizer_config.json: Tokenizer configuration
special_tokens_map.json: Special token mappings

Citation

If you use this model, please cite:

@misc{t5-spotify-features-v2,
  author = {afsagag and synyyy},
  title = {T5-Base Fine-tuned for Spotify Features Prediction},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/synyyy/t5-spotify-features-v2}
}