{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "Qpw04rkbynx0" }, "source": [ "To run this, press \"*Runtime*\" and press \"*Run all*\" on a **free** Tesla T4 Google Colab instance!\n", "\n", "To install Unsloth on your own computer, follow the installation instructions in our docs [here](https://docs.unsloth.ai/get-started/installing-+-updating).\n", "\n", "You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), and [how to save it](#Save).\n" ] }, { "cell_type": "markdown", "metadata": { "id": "5fs-yYEaynx1" }, "source": [ "### News" ] }, { "cell_type": "markdown", "metadata": { "id": "pyJK0UZaynx2" }, "source": [ "Unsloth now supports Text-to-Speech (TTS) models. Read our [guide here](https://docs.unsloth.ai/basics/text-to-speech-tts-fine-tuning).\n", "\n", "Read our **[Gemma 3N Guide](https://docs.unsloth.ai/basics/gemma-3n-how-to-run-and-fine-tune)** and check out our new **[Dynamic 2.0](https://docs.unsloth.ai/basics/unsloth-dynamic-2.0-ggufs)** quants, which outperform other quantization methods!\n", "\n", "Visit our docs for all our [model uploads](https://docs.unsloth.ai/get-started/all-our-models) and [notebooks](https://docs.unsloth.ai/get-started/unsloth-notebooks).\n" ] }, { "cell_type": "markdown", "metadata": { "id": "SDUHv0mwynx3" }, "source": [ "### Installation" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "id": "MY4G3EIbynx3" }, "outputs": [], "source": [ "%%capture\n", "import os\n", "if \"COLAB_\" not in \"\".join(os.environ.keys()):\n", "    %pip install unsloth\n", "else:\n", "    # Do this only in Colab notebooks! Otherwise use pip install unsloth\n", "    %pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl triton cut_cross_entropy unsloth_zoo\n", "    %pip install sentencepiece protobuf \"datasets>=3.4.1,<4.0.0\" \"huggingface_hub>=0.34.0\" hf_transfer\n", "    %pip install --no-deps unsloth\n", "!git clone https://github.com/SparkAudio/Spark-TTS\n", "%pip install omegaconf einx" ] },
{ "cell_type": "markdown", "metadata": { "id": "AkWYsztAs9Ky" }, "source": [ "### Unsloth\n", "\n", "`FastModel` supports loading nearly any model now! This includes Vision and Text models!\n", "\n", "Thank you to [Etherll](https://huggingface.co/Etherll) for creating this notebook!" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2025-03-22T00:48:54.511089Z", "iopub.status.busy": "2025-03-22T00:48:54.510770Z", "iopub.status.idle": "2025-03-22T00:51:37.363415Z", "shell.execute_reply": "2025-03-22T00:51:37.362696Z", "shell.execute_reply.started": "2025-03-22T00:48:54.511053Z" }, "id": "QmUBVEnvCDJv", "outputId": "42083a68-d3cc-48c9-d852-b60796377434" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.\n", "🦥 Unsloth Zoo will now patch everything to make training faster!\n", "==((====))== Unsloth 2025.8.1: Fast Qwen2 patching. Transformers: 4.54.1.\n", " \\\\ /| Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.\n", "O^O/ \\_/ \\ Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0\n", "\\ / Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]\n", " \"-____-\" Free license: http://github.com/unslothai/unsloth\n", "Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!\n", "Unsloth: Float16 full finetuning uses more memory since we upcast weights to float32.\n" ] } ],
"source": [ "from unsloth import FastModel\n", "import torch\n", "from huggingface_hub import snapshot_download\n", "\n", "max_seq_length = 2048 # Choose any for long context!\n", "\n", "fourbit_models = [\n", "    # 4bit dynamic quants for superior accuracy and low memory use\n", "    \"unsloth/gemma-3-4b-it-unsloth-bnb-4bit\",\n", "    \"unsloth/gemma-3-12b-it-unsloth-bnb-4bit\",\n", "    \"unsloth/gemma-3-27b-it-unsloth-bnb-4bit\",\n", "    # Qwen3 new models\n", "    \"unsloth/Qwen3-4B-unsloth-bnb-4bit\",\n", "    \"unsloth/Qwen3-8B-unsloth-bnb-4bit\",\n", "    # Other very popular models!\n", "    \"unsloth/Llama-3.1-8B\",\n", "    \"unsloth/Llama-3.2-3B\",\n", "    \"unsloth/Llama-3.3-70B\",\n", "    \"unsloth/mistral-7b-instruct-v0.3\",\n", "    \"unsloth/Phi-4\",\n", "] # More models at https://huggingface.co/unsloth\n", "\n", "# Download model and code\n", "snapshot_download(\"unsloth/Spark-TTS-0.5B\", local_dir = \"Spark-TTS-0.5B\")\n", "\n", "model, tokenizer = FastModel.from_pretrained(\n", "    model_name = \"Spark-TTS-0.5B/LLM\",\n", "    max_seq_length = max_seq_length,\n", "    dtype = torch.float32, # Spark seems to only work on float32 for now\n", "    full_finetuning = True, # We support full finetuning now!\n", "    load_in_4bit = False,\n", "    #token = \"hf_...\", # use one if using gated models like meta-llama/Llama-2-7b-hf\n", ")" ] },
{ "cell_type": "markdown", "metadata": { "id": "SXd9bTZd1aaL" }, "source": [ "We would normally add LoRA adapters here so that only 1 to 10% of all parameters need updating. Because the model was loaded with `full_finetuning = True`, Unsloth skips this step (`get_peft_model` has no effect) and all weights are trained."
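, "\n",
"\n",
"A minimal sketch of where the 1 to 10% figure comes from, assuming standard LoRA: a rank-`r` adapter on a `d_out x d_in` weight trains `r * (d_in + d_out)` new parameters while the original weight stays frozen. `lora_fraction` is a hypothetical helper, and the 4096 x 4096 layer shape is illustrative, not Spark-TTS's real dimensions.\n",
"\n",
"```python\n",
"# Hypothetical helper (not part of Unsloth): LoRA parameters added to one\n",
"# d_out x d_in linear layer at rank r, as a fraction of that layer's weights.\n",
"def lora_fraction(d_in: int, d_out: int, r: int) -> float:\n",
"    full = d_in * d_out        # frozen weight W\n",
"    lora = r * (d_in + d_out)  # A is r x d_in, B is d_out x r\n",
"    return lora / full\n",
"\n",
"# Illustrative 4096 x 4096 layer (assumed shape):\n",
"print(lora_fraction(4096, 4096, 16))   # 0.0078125, under 1%\n",
"print(lora_fraction(4096, 4096, 128))  # 0.0625, about 6%\n",
"```\n",
"\n",
"The fraction equals `r * (1/d_in + 1/d_out)`, so it grows with `r` and shrinks as layers get wider: the same rank costs proportionally more on a small model."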
] }, { "cell_type": "code", "execution_count": 3, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2025-03-22T00:51:37.365079Z", "iopub.status.busy": "2025-03-22T00:51:37.364731Z", "iopub.status.idle": "2025-03-22T00:51:44.221612Z", "shell.execute_reply": "2025-03-22T00:51:44.220949Z", "shell.execute_reply.started": "2025-03-22T00:51:37.365045Z" }, "id": "6bZsfBuZDeCL", "outputId": "292447b8-fd80-4b8b-ba3f-4637a1045166" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Unsloth: Full finetuning is enabled, so .get_peft_model has no effect\n" ] } ], "source": [ "# Note: LoRA does not work with float32 here, only with bfloat16.\n", "# With full_finetuning = True above, this call has no effect.\n", "model = FastModel.get_peft_model(\n", "    model,\n", "    r = 128, # Choose any number > 0! Suggested 8, 16, 32, 64, 128\n", "    target_modules = [\"q_proj\", \"k_proj\", \"v_proj\", \"o_proj\",\n", "                      \"gate_proj\", \"up_proj\", \"down_proj\",],\n", "    lora_alpha = 128,\n", "    lora_dropout = 0, # Supports any, but = 0 is optimized\n", "    bias = \"none\",    # Supports any, but = \"none\" is optimized\n", "    # [NEW] \"unsloth\" uses 30% less VRAM, fits 2x larger batch sizes!\n", "    use_gradient_checkpointing = \"unsloth\", # True or \"unsloth\" for very long context\n", "    random_state = 3407,\n", "    use_rslora = False,  # We support rank stabilized LoRA\n", "    loftq_config = None, # And LoftQ\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "vITh0KVJ10qX" }, "source": [ "\n", "### Data Prep\n", "\n", "The cell below loads the `Balaji-1904/TTS_KN_DS_V1.1` dataset; the `MrDragonFox/Elise` dataset, which is designed for training TTS models, can be used in the same way. Ensure that your dataset follows the required format: **text, audio** columns for single-speaker models or **source, text, audio** columns for multi-speaker models. You can modify this section to accommodate your own dataset, but it must keep this column structure for training to work correctly."
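, "\n",
"\n",
"A minimal pre-flight check, assuming the column names are exactly `text`, `audio` and, for multi-speaker data, `source`, as listed above. `tts_layout` is a hypothetical helper, not part of Unsloth or `datasets`; it only inspects column names, not the audio itself.\n",
"\n",
"```python\n",
"# Hypothetical helper: infer the speaker setup from a dataset's column names,\n",
"# following the text/audio and source/text/audio formats described above.\n",
"def tts_layout(column_names):\n",
"    cols = set(column_names)\n",
"    if {'source', 'text', 'audio'} <= cols:\n",
"        return 'multi-speaker'\n",
"    if {'text', 'audio'} <= cols:\n",
"        return 'single-speaker'\n",
"    missing = sorted({'text', 'audio'} - cols)\n",
"    raise ValueError(f'dataset is missing required columns: {missing}')\n",
"\n",
"print(tts_layout(['text', 'audio']))            # single-speaker\n",
"print(tts_layout(['source', 'text', 'audio']))  # multi-speaker\n",
"# After loading a Hugging Face dataset: tts_layout(dataset.column_names)\n",
"```"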
] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2025-03-22T00:51:44.222880Z", "iopub.status.busy": "2025-03-22T00:51:44.222617Z", "iopub.status.idle": "2025-03-22T00:52:16.516878Z", "shell.execute_reply": "2025-03-22T00:52:16.516033Z", "shell.execute_reply.started": "2025-03-22T00:51:44.222848Z" }, "id": "LjY75GoYUCB8" }, "outputs": [], "source": [ "from datasets import load_dataset\n", "dataset = load_dataset(\"Balaji-1904/TTS_KN_DS_V1.1\", split = \"train\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 173, "referenced_widgets": [ "a3b0c0581f1f4c428baaadd8e9a39b6f", "2315228ff2b141afabe1263471f5364b", "0474debc340943bd85f3daf92aebf7aa", "cff1b0fa2ea24f45aab26685353eefdd", "b7e20be79df246f19b35114a690e44f0", "426eb100a94642f79e6b99777406a265", "a36b5cf197dd4bd9a7f70aa6671b804c", "0de4d0f282404edfbc191dca73f15f35", "e58b5ad2f781475d8af2ddb38009baa6", "33fbacbb2aa146cd90586357eec1dc3e", "930b4d1d5f4b494b830df4d4c398e67c" ] }, "execution": { "iopub.execute_input": "2025-03-22T00:52:16.518175Z", "iopub.status.busy": "2025-03-22T00:52:16.517841Z", "iopub.status.idle": "2025-03-22T00:52:35.039329Z", "shell.execute_reply": "2025-03-22T00:52:35.038356Z", "shell.execute_reply.started": "2025-03-22T00:52:16.518146Z" }, "id": "zK94B-Pfioto", "outputId": "3f11cf35-c173-410d-f709-43552323f26f" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/usr/local/lib/python3.11/dist-packages/torch/nn/utils/weight_norm.py:143: FutureWarning: `torch.nn.utils.weight_norm` is deprecated in favor of `torch.nn.utils.parametrizations.weight_norm`.\n", " WeightNorm.apply(module, name, dim)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Missing tensor: mel_transformer.spectrogram.window\n", "Missing tensor: mel_transformer.mel_scale.fb\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Parameter 'function'=