{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "Qpw04rkbynx0" }, "source": [ "To run this, press \"*Runtime*\" and press \"*Run all*\" on a **free** Tesla T4 Google Colab instance!\n", "
\n", "\n", "\n", " Join Discord if you need help + ⭐ Star us on Github ⭐\n", "
\n", "\n", "To install Unsloth on your own computer, follow the installation instructions on our Github page [here](https://docs.unsloth.ai/get-started/installing-+-updating).\n", "\n", "You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & [how to save it](#Save)\n" ] }, { "cell_type": "markdown", "metadata": { "id": "5fs-yYEaynx1" }, "source": [ "### News" ] }, { "cell_type": "markdown", "metadata": { "id": "pyJK0UZaynx2" }, "source": [ "Unsloth now supports Text-to-Speech (TTS) models. Read our [guide here](https://docs.unsloth.ai/basics/text-to-speech-tts-fine-tuning).\n", "\n", "Read our **[Gemma 3N Guide](https://docs.unsloth.ai/basics/gemma-3n-how-to-run-and-fine-tune)** and check out our new **[Dynamic 2.0](https://docs.unsloth.ai/basics/unsloth-dynamic-2.0-ggufs)** quants which outperforms other quantization methods!\n", "\n", "Visit our docs for all our [model uploads](https://docs.unsloth.ai/get-started/all-our-models) and [notebooks](https://docs.unsloth.ai/get-started/unsloth-notebooks).\n" ] }, { "cell_type": "markdown", "metadata": { "id": "SDUHv0mwynx3" }, "source": [ "### Installation" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "id": "MY4G3EIbynx3" }, "outputs": [], "source": [ "%%capture\n", "import os\n", "if \"COLAB_\" not in \"\".join(os.environ.keys()):\n", " %pip install unsloth\n", "else:\n", " # Do this only in Colab notebooks! Otherwise use pip install unsloth\n", " %pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl triton cut_cross_entropy unsloth_zoo\n", " %pip install sentencepiece protobuf \"datasets>=3.4.1,<4.0.0\" \"huggingface_hub>=0.34.0\" hf_transfer\n", " %pip install --no-deps unsloth\n", "%git clone https://github.com/SparkAudio/Spark-TTS\n", "%pip install omegaconf einx" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "QmUBVEnvCDJv", "outputId": "42083a68-d3cc-48c9-d852-b60796377434" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "πŸ¦₯ Unsloth: Will patch your computer to enable 2x faster free finetuning.\n", "πŸ¦₯ Unsloth Zoo will now patch everything to make training faster!\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "9ad0d25a6f8549d1ac79addbe171b758", "version_major": 2, "version_minor": 0 }, "text/plain": [ ".gitattributes: 0.00B [00:00, ?B/s]" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "7e83dd9464b64a6d963c349d1660a28c", "version_major": 2, "version_minor": 0 }, "text/plain": [ "config.yaml: 0.00B [00:00, ?B/s]" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "332e86b12a4c45a89a95f1f265ca0f12", "version_major": 2, "version_minor": 0 }, "text/plain": [ "BiCodec/model.safetensors: 0%| | 0.00/626M [00:00 0 ! 
Suggested 8, 16, 32, 64, 128\n", " target_modules = [\"q_proj\", \"k_proj\", \"v_proj\", \"o_proj\",\n", " \"gate_proj\", \"up_proj\", \"down_proj\",],\n", " lora_alpha = 128,\n", " lora_dropout = 0, # Supports any, but = 0 is optimized\n", " bias = \"none\", # Supports any, but = \"none\" is optimized\n", " # [NEW] \"unsloth\" uses 30% less VRAM, fits 2x larger batch sizes!\n", " use_gradient_checkpointing = \"unsloth\", # True or \"unsloth\" for very long context\n", " random_state = 3407,\n", " use_rslora = False, # We support rank stabilized LoRA\n", " loftq_config = None, # And LoftQ\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "vITh0KVJ10qX" }, "source": [ "\n", "### Data Prep \n", "\n", "We will use the `Balaji-1904/TTS_KN_DS_V1.1`, which is designed for training TTS models. Ensure that your dataset follows the required format: **text, audio** for single-speaker models or **source, text, audio** for multi-speaker models. You can modify this section to accommodate your own dataset, but maintaining the correct structure is essential for optimal training." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "id": "LjY75GoYUCB8" }, "outputs": [], "source": [ "from datasets import load_dataset\n", "dataset = load_dataset(\"Balaji-1904/TTS_KN_DS_V1.1\", split = \"train\")" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 173, "referenced_widgets": [ "a3b0c0581f1f4c428baaadd8e9a39b6f", "2315228ff2b141afabe1263471f5364b", "0474debc340943bd85f3daf92aebf7aa", "cff1b0fa2ea24f45aab26685353eefdd", "b7e20be79df246f19b35114a690e44f0", "426eb100a94642f79e6b99777406a265", "a36b5cf197dd4bd9a7f70aa6671b804c", "0de4d0f282404edfbc191dca73f15f35", "e58b5ad2f781475d8af2ddb38009baa6", "33fbacbb2aa146cd90586357eec1dc3e", "930b4d1d5f4b494b830df4d4c398e67c" ] }, "id": "zK94B-Pfioto", "outputId": "3f11cf35-c173-410d-f709-43552323f26f" }, "outputs": [ { "ename": "ModuleNotFoundError", "evalue": "No module named 'torchaudio'", "output_type": "error", "traceback": [ "\u001b[31m---------------------------------------------------------------------------\u001b[39m", "\u001b[31mModuleNotFoundError\u001b[39m Traceback (most recent call last)", "\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[5]\u001b[39m\u001b[32m, line 4\u001b[39m\n\u001b[32m 1\u001b[39m \u001b[38;5;66;03m#@title Tokenization Function\u001b[39;00m\n\u001b[32m 3\u001b[39m \u001b[38;5;28;01mimport\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01mlocale\u001b[39;00m\n\u001b[32m----> \u001b[39m\u001b[32m4\u001b[39m \u001b[38;5;28;01mimport\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01mtorchaudio\u001b[39;00m\u001b[34;01m.\u001b[39;00m\u001b[34;01mtransforms\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;28;01mas\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01mT\u001b[39;00m\n\u001b[32m 5\u001b[39m \u001b[38;5;28;01mimport\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01mos\u001b[39;00m\n\u001b[32m 6\u001b[39m \u001b[38;5;28;01mimport\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01mtorch\u001b[39;00m\n", "\u001b[31mModuleNotFoundError\u001b[39m: No module named 'torchaudio'" ] } ], "source": [ "#@title Tokenization Function\n", "\n", "import locale\n", "import torchaudio.transforms as T\n", "import os\n", "import torch\n", "import sys\n", "import numpy as np\n", "sys.path.append('Spark-TTS')\n", "from sparktts.models.audio_tokenizer import BiCodecTokenizer\n", "from sparktts.utils.audio import audio_volume_normalize\n", 
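"\n",
"# The helpers below use Spark-TTS's BiCodec tokenizer to turn each audio clip into\n",
"# global + semantic token IDs, then format them into the <|task_tts|> prompt string used for training.\n",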
"\n", "audio_tokenizer = BiCodecTokenizer(\"Spark-TTS-0.5B\", \"cuda\")\n", "def extract_wav2vec2_features( wavs: torch.Tensor) -> torch.Tensor:\n", " \"\"\"extract wav2vec2 features\"\"\"\n", "\n", " if wavs.shape[0] != 1:\n", "\n", " raise ValueError(f\"Expected batch size 1, but got shape {wavs.shape}\")\n", " wav_np = wavs.squeeze(0).cpu().numpy()\n", "\n", " processed = audio_tokenizer.processor(\n", " wav_np,\n", " sampling_rate=16000,\n", " return_tensors=\"pt\",\n", " padding=True,\n", " )\n", " input_values = processed.input_values\n", "\n", " input_values = input_values.to(audio_tokenizer.feature_extractor.device)\n", "\n", " model_output = audio_tokenizer.feature_extractor(\n", " input_values,\n", " )\n", "\n", "\n", " if model_output.hidden_states is None:\n", " raise ValueError(\"Wav2Vec2Model did not return hidden states. Ensure config `output_hidden_states=True`.\")\n", "\n", " num_layers = len(model_output.hidden_states)\n", " required_layers = [11, 14, 16]\n", " if any(l >= num_layers for l in required_layers):\n", " raise IndexError(f\"Requested hidden state indices {required_layers} out of range for model with {num_layers} layers.\")\n", "\n", " feats_mix = (\n", " model_output.hidden_states[11] + model_output.hidden_states[14] + model_output.hidden_states[16]\n", " ) / 3\n", "\n", " return feats_mix\n", "def formatting_audio_func(example):\n", " text = f\"{example['source']}: {example['text']}\" if \"source\" in example else example[\"text\"]\n", " audio_array = example[\"audio\"][\"array\"]\n", " sampling_rate = example[\"audio\"][\"sampling_rate\"]\n", "\n", " target_sr = audio_tokenizer.config['sample_rate']\n", "\n", " if sampling_rate != target_sr:\n", " resampler = T.Resample(orig_freq=sampling_rate, new_freq=target_sr)\n", " audio_tensor_temp = torch.from_numpy(audio_array).float()\n", " audio_array = resampler(audio_tensor_temp).numpy()\n", "\n", " if audio_tokenizer.config[\"volume_normalize\"]:\n", " audio_array = audio_volume_normalize(audio_array)\n", "\n", " ref_wav_np = audio_tokenizer.get_ref_clip(audio_array)\n", "\n", " audio_tensor = torch.from_numpy(audio_array).unsqueeze(0).float().to(audio_tokenizer.device)\n", " ref_wav_tensor = torch.from_numpy(ref_wav_np).unsqueeze(0).float().to(audio_tokenizer.device)\n", "\n", "\n", " feat = extract_wav2vec2_features(audio_tensor)\n", "\n", " batch = {\n", "\n", " \"wav\": audio_tensor,\n", " \"ref_wav\": ref_wav_tensor,\n", " \"feat\": feat.to(audio_tokenizer.device),\n", " }\n", "\n", "\n", " semantic_token_ids, global_token_ids = audio_tokenizer.model.tokenize(batch)\n", "\n", " global_tokens = \"\".join(\n", " [f\"<|bicodec_global_{i}|>\" for i in global_token_ids.squeeze().cpu().numpy()] # Squeeze batch dim\n", " )\n", " semantic_tokens = \"\".join(\n", " [f\"<|bicodec_semantic_{i}|>\" for i in semantic_token_ids.squeeze().cpu().numpy()] # Squeeze batch dim\n", " )\n", "\n", " inputs = [\n", " \"<|task_tts|>\",\n", " \"<|start_content|>\",\n", " text,\n", " \"<|end_content|>\",\n", " \"<|start_global_token|>\",\n", " global_tokens,\n", " \"<|end_global_token|>\",\n", " \"<|start_semantic_token|>\",\n", " semantic_tokens,\n", " \"<|end_semantic_token|>\",\n", " \"<|im_end|>\"\n", " ]\n", " inputs = \"\".join(inputs)\n", " return {\"text\": inputs}\n", "\n", "\n", "dataset = dataset.map(formatting_audio_func, remove_columns=[\"audio\"])\n", "print(\"Moving Bicodec model and Wav2Vec2Model to cpu.\")\n", "audio_tokenizer.model.cpu()\n", "audio_tokenizer.feature_extractor.cpu()\n", "torch.cuda.empty_cache()" ] 
}, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Collecting torchaudio\n", " Downloading torchaudio-2.8.0-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (7.2 kB)\n", "Collecting torch==2.8.0 (from torchaudio)\n", " Using cached torch-2.8.0-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (30 kB)\n", "Requirement already satisfied: filelock in /datadrive/jupyter/devbase/Balaji/TTS_ft/lib/python3.12/site-packages (from torch==2.8.0->torchaudio) (3.18.0)\n", "Requirement already satisfied: typing-extensions>=4.10.0 in /datadrive/jupyter/devbase/Balaji/TTS_ft/lib/python3.12/site-packages (from torch==2.8.0->torchaudio) (4.14.1)\n", "Requirement already satisfied: setuptools in /datadrive/jupyter/devbase/Balaji/TTS_ft/lib/python3.12/site-packages (from torch==2.8.0->torchaudio) (80.9.0)\n", "Requirement already satisfied: sympy>=1.13.3 in /datadrive/jupyter/devbase/Balaji/TTS_ft/lib/python3.12/site-packages (from torch==2.8.0->torchaudio) (1.14.0)\n", "Requirement already satisfied: networkx in /datadrive/jupyter/devbase/Balaji/TTS_ft/lib/python3.12/site-packages (from torch==2.8.0->torchaudio) (3.5)\n", "Requirement already satisfied: jinja2 in /datadrive/jupyter/devbase/Balaji/TTS_ft/lib/python3.12/site-packages (from torch==2.8.0->torchaudio) (3.1.6)\n", "Requirement already satisfied: fsspec in /datadrive/jupyter/devbase/Balaji/TTS_ft/lib/python3.12/site-packages (from torch==2.8.0->torchaudio) (2025.3.0)\n", "Collecting nvidia-cuda-nvrtc-cu12==12.8.93 (from torch==2.8.0->torchaudio)\n", " Using cached nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl.metadata (1.7 kB)\n", "Collecting nvidia-cuda-runtime-cu12==12.8.90 (from torch==2.8.0->torchaudio)\n", " Using cached nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB)\n", "Collecting nvidia-cuda-cupti-cu12==12.8.90 (from torch==2.8.0->torchaudio)\n", " Using cached nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB)\n", "Collecting nvidia-cudnn-cu12==9.10.2.21 (from torch==2.8.0->torchaudio)\n", " Using cached nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl.metadata (1.8 kB)\n", "Collecting nvidia-cublas-cu12==12.8.4.1 (from torch==2.8.0->torchaudio)\n", " Using cached nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl.metadata (1.7 kB)\n", "Collecting nvidia-cufft-cu12==11.3.3.83 (from torch==2.8.0->torchaudio)\n", " Using cached nvidia_cufft_cu12-11.3.3.83-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB)\n", "Collecting nvidia-curand-cu12==10.3.9.90 (from torch==2.8.0->torchaudio)\n", " Using cached nvidia_curand_cu12-10.3.9.90-py3-none-manylinux_2_27_x86_64.whl.metadata (1.7 kB)\n", "Collecting nvidia-cusolver-cu12==11.7.3.90 (from torch==2.8.0->torchaudio)\n", " Using cached nvidia_cusolver_cu12-11.7.3.90-py3-none-manylinux_2_27_x86_64.whl.metadata (1.8 kB)\n", "Collecting nvidia-cusparse-cu12==12.5.8.93 (from torch==2.8.0->torchaudio)\n", " Using cached nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.8 kB)\n", "Collecting nvidia-cusparselt-cu12==0.7.1 (from torch==2.8.0->torchaudio)\n", " Using cached nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl.metadata (7.0 kB)\n", "Collecting nvidia-nccl-cu12==2.27.3 (from torch==2.8.0->torchaudio)\n", " Using cached 
nvidia_nccl_cu12-2.27.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (2.0 kB)\n", "Collecting nvidia-nvtx-cu12==12.8.90 (from torch==2.8.0->torchaudio)\n", " Using cached nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.8 kB)\n", "Collecting nvidia-nvjitlink-cu12==12.8.93 (from torch==2.8.0->torchaudio)\n", " Using cached nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl.metadata (1.7 kB)\n", "Collecting nvidia-cufile-cu12==1.13.1.3 (from torch==2.8.0->torchaudio)\n", " Using cached nvidia_cufile_cu12-1.13.1.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB)\n", "Collecting triton==3.4.0 (from torch==2.8.0->torchaudio)\n", " Using cached triton-3.4.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (1.7 kB)\n", "Requirement already satisfied: mpmath<1.4,>=1.1.0 in /datadrive/jupyter/devbase/Balaji/TTS_ft/lib/python3.12/site-packages (from sympy>=1.13.3->torch==2.8.0->torchaudio) (1.3.0)\n", "Requirement already satisfied: MarkupSafe>=2.0 in /datadrive/jupyter/devbase/Balaji/TTS_ft/lib/python3.12/site-packages (from jinja2->torch==2.8.0->torchaudio) (3.0.2)\n", "Downloading torchaudio-2.8.0-cp312-cp312-manylinux_2_28_x86_64.whl (4.0 MB)\n", "\u001b[2K \u001b[38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m4.0/4.0 MB\u001b[0m \u001b[31m1.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0mm eta \u001b[36m0:00:01\u001b[0m[36m0:00:01\u001b[0m\n", "\u001b[?25hDownloading torch-2.8.0-cp312-cp312-manylinux_2_28_x86_64.whl (887.9 MB)\n", "\u001b[2K \u001b[38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m887.9/887.9 MB\u001b[0m \u001b[31m979.7 kB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m eta \u001b[36m0:00:01\u001b[0m[36m0:00:19\u001b[0mm\n", "\u001b[?25hDownloading nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl (594.3 MB)\n", "\u001b[2K \u001b[38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m594.3/594.3 MB\u001b[0m \u001b[31m1.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0mm eta \u001b[36m0:00:01\u001b[0m[36m0:00:13\u001b[0m\n", "\u001b[?25hDownloading nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (10.2 MB)\n", "\u001b[2K \u001b[38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m10.2/10.2 MB\u001b[0m \u001b[31m1.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0mm eta \u001b[36m0:00:01\u001b[0m[36m0:00:01\u001b[0m\n", "\u001b[?25hDownloading nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (88.0 MB)\n", "\u001b[2K \u001b[38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m88.0/88.0 MB\u001b[0m \u001b[31m1.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0mm eta \u001b[36m0:00:01\u001b[0m[36m0:00:02\u001b[0m\n", "\u001b[?25hDownloading nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (954 kB)\n", "\u001b[2K \u001b[38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m954.8/954.8 kB\u001b[0m \u001b[31m1.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m[36m0:00:01\u001b[0m[36m0:00:01\u001b[0m:01\u001b[0m\n", "\u001b[?25hDownloading nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl (706.8 MB)\n", "\u001b[2K \u001b[38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m706.8/706.8 MB\u001b[0m \u001b[31m1.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0mm eta 
\u001b[36m0:00:01\u001b[0m[36m0:00:15\u001b[0mm\n", "\u001b[?25hDownloading nvidia_cufft_cu12-11.3.3.83-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (193.1 MB)\n", "\u001b[2K \u001b[38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m193.1/193.1 MB\u001b[0m \u001b[31m1.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0mm eta \u001b[36m0:00:01\u001b[0m[36m0:00:05\u001b[0m\n", "\u001b[?25hDownloading nvidia_cufile_cu12-1.13.1.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (1.2 MB)\n", "\u001b[2K \u001b[38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.2/1.2 MB\u001b[0m \u001b[31m1.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m[31m1.2 MB/s\u001b[0m eta \u001b[36m0:00:01\u001b[0m\n", "\u001b[?25hDownloading nvidia_curand_cu12-10.3.9.90-py3-none-manylinux_2_27_x86_64.whl (63.6 MB)\n", "\u001b[2K \u001b[38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m63.6/63.6 MB\u001b[0m \u001b[31m1.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0mm eta \u001b[36m0:00:01\u001b[0m[36m0:00:02\u001b[0m\n", "\u001b[?25hDownloading nvidia_cusolver_cu12-11.7.3.90-py3-none-manylinux_2_27_x86_64.whl (267.5 MB)\n", "\u001b[2K \u001b[38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m267.5/267.5 MB\u001b[0m \u001b[31m1.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0mm eta \u001b[36m0:00:01\u001b[0m[36m0:00:06\u001b[0m\n", "\u001b[?25hDownloading nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (288.2 MB)\n", "\u001b[2K \u001b[38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m288.2/288.2 MB\u001b[0m \u001b[31m1.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0mm eta \u001b[36m0:00:01\u001b[0m[36m0:00:07\u001b[0m\n", "\u001b[?25hDownloading nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl (287.2 MB)\n", "\u001b[2K \u001b[38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m287.2/287.2 MB\u001b[0m \u001b[31m1.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0mm eta \u001b[36m0:00:01\u001b[0m[36m0:00:06\u001b[0m\n", "\u001b[?25hDownloading nvidia_nccl_cu12-2.27.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (322.4 MB)\n", "\u001b[2K \u001b[38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m322.4/322.4 MB\u001b[0m \u001b[31m1.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0mm eta \u001b[36m0:00:01\u001b[0m[36m0:00:07\u001b[0m\n", "\u001b[?25hDownloading nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (39.3 MB)\n", "\u001b[2K \u001b[38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m39.3/39.3 MB\u001b[0m \u001b[31m1.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0mm eta \u001b[36m0:00:01\u001b[0m[36m0:00:01\u001b[0m\n", "\u001b[?25hDownloading nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (89 kB)\n", "\u001b[2K \u001b[38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m90.0/90.0 kB\u001b[0m \u001b[31m1.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m MB/s\u001b[0m eta \u001b[36m0:00:01\u001b[0m:01\u001b[0m\n", "\u001b[?25hDownloading triton-3.4.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (155.6 MB)\n", "\u001b[2K \u001b[38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m155.6/155.6 MB\u001b[0m \u001b[31m1.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0mm eta \u001b[36m0:00:01\u001b[0m[36m0:00:04\u001b[0m\n", "\u001b[?25hInstalling 
collected packages: nvidia-cusparselt-cu12, triton, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufile-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, nvidia-cusparse-cu12, nvidia-cufft-cu12, nvidia-cudnn-cu12, nvidia-cusolver-cu12, torch, torchaudio\n", " Attempting uninstall: nvidia-cusparselt-cu12\n", " Found existing installation: nvidia-cusparselt-cu12 0.6.3\n", " Uninstalling nvidia-cusparselt-cu12-0.6.3:\n", " Successfully uninstalled nvidia-cusparselt-cu12-0.6.3\n", " Attempting uninstall: triton\n", " Found existing installation: triton 3.3.1\n", " Uninstalling triton-3.3.1:\n", " Successfully uninstalled triton-3.3.1\n", " Attempting uninstall: nvidia-nvtx-cu12\n", " Found existing installation: nvidia-nvtx-cu12 12.6.77\n", " Uninstalling nvidia-nvtx-cu12-12.6.77:\n", " Successfully uninstalled nvidia-nvtx-cu12-12.6.77\n", " Attempting uninstall: nvidia-nvjitlink-cu12\n", " Found existing installation: nvidia-nvjitlink-cu12 12.6.85\n", " Uninstalling nvidia-nvjitlink-cu12-12.6.85:\n", " Successfully uninstalled nvidia-nvjitlink-cu12-12.6.85\n", " Attempting uninstall: nvidia-nccl-cu12\n", " Found existing installation: nvidia-nccl-cu12 2.26.2\n", " Uninstalling nvidia-nccl-cu12-2.26.2:\n", " Successfully uninstalled nvidia-nccl-cu12-2.26.2\n", " Attempting uninstall: nvidia-curand-cu12\n", " Found existing installation: nvidia-curand-cu12 10.3.7.77\n", " Uninstalling nvidia-curand-cu12-10.3.7.77:\n", " Successfully uninstalled nvidia-curand-cu12-10.3.7.77\n", " Attempting uninstall: nvidia-cufile-cu12\n", " Found existing installation: nvidia-cufile-cu12 1.11.1.6\n", " Uninstalling nvidia-cufile-cu12-1.11.1.6:\n", " Successfully uninstalled nvidia-cufile-cu12-1.11.1.6\n", " Attempting uninstall: nvidia-cuda-runtime-cu12\n", " Found existing installation: nvidia-cuda-runtime-cu12 12.6.77\n", " Uninstalling nvidia-cuda-runtime-cu12-12.6.77:\n", " Successfully uninstalled nvidia-cuda-runtime-cu12-12.6.77\n", " Attempting uninstall: nvidia-cuda-nvrtc-cu12\n", " Found existing installation: nvidia-cuda-nvrtc-cu12 12.6.77\n", " Uninstalling nvidia-cuda-nvrtc-cu12-12.6.77:\n", " Successfully uninstalled nvidia-cuda-nvrtc-cu12-12.6.77\n", " Attempting uninstall: nvidia-cuda-cupti-cu12\n", " Found existing installation: nvidia-cuda-cupti-cu12 12.6.80\n", " Uninstalling nvidia-cuda-cupti-cu12-12.6.80:\n", " Successfully uninstalled nvidia-cuda-cupti-cu12-12.6.80\n", " Attempting uninstall: nvidia-cublas-cu12\n", " Found existing installation: nvidia-cublas-cu12 12.6.4.1\n", " Uninstalling nvidia-cublas-cu12-12.6.4.1:\n", " Successfully uninstalled nvidia-cublas-cu12-12.6.4.1\n", " Attempting uninstall: nvidia-cusparse-cu12\n", " Found existing installation: nvidia-cusparse-cu12 12.5.4.2\n", " Uninstalling nvidia-cusparse-cu12-12.5.4.2:\n", " Successfully uninstalled nvidia-cusparse-cu12-12.5.4.2\n", " Attempting uninstall: nvidia-cufft-cu12\n", " Found existing installation: nvidia-cufft-cu12 11.3.0.4\n", " Uninstalling nvidia-cufft-cu12-11.3.0.4:\n", " Successfully uninstalled nvidia-cufft-cu12-11.3.0.4\n", " Attempting uninstall: nvidia-cudnn-cu12\n", " Found existing installation: nvidia-cudnn-cu12 9.5.1.17\n", " Uninstalling nvidia-cudnn-cu12-9.5.1.17:\n", " Successfully uninstalled nvidia-cudnn-cu12-9.5.1.17\n", " Attempting uninstall: nvidia-cusolver-cu12\n", " Found existing installation: nvidia-cusolver-cu12 11.7.1.2\n", " Uninstalling nvidia-cusolver-cu12-11.7.1.2:\n", " Successfully 
uninstalled nvidia-cusolver-cu12-11.7.1.2\n",
"  Attempting uninstall: torch\n",
"    Found existing installation: torch 2.7.1\n",
"    Uninstalling torch-2.7.1:\n",
"      Successfully uninstalled torch-2.7.1\n",
"\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
"xformers 0.0.31.post1 requires torch==2.7.1, but you have torch 2.8.0 which is incompatible.\n",
"torchvision 0.22.1 requires torch==2.7.1, but you have torch 2.8.0 which is incompatible.\u001b[0m\u001b[31m\n",
"\u001b[0mSuccessfully installed nvidia-cublas-cu12-12.8.4.1 nvidia-cuda-cupti-cu12-12.8.90 nvidia-cuda-nvrtc-cu12-12.8.93 nvidia-cuda-runtime-cu12-12.8.90 nvidia-cudnn-cu12-9.10.2.21 nvidia-cufft-cu12-11.3.3.83 nvidia-cufile-cu12-1.13.1.3 nvidia-curand-cu12-10.3.9.90 nvidia-cusolver-cu12-11.7.3.90 nvidia-cusparse-cu12-12.5.8.93 nvidia-cusparselt-cu12-0.7.1 nvidia-nccl-cu12-2.27.3 nvidia-nvjitlink-cu12-12.8.93 nvidia-nvtx-cu12-12.8.90 torch-2.8.0 torchaudio-2.8.0 triton-3.4.0\n",
"Note: you may need to restart the kernel to use updated packages.\n" ] } ], "source": [ "%pip install torchaudio" ] },
{ "cell_type": "markdown", "metadata": { "id": "idAEIeSQ3xdS" }, "source": [ "<a name=\"Train\"></a>\n",
"### Train the model\n",
"Now let's use Huggingface TRL's `SFTTrainer`! More docs here: [TRL SFT docs](https://huggingface.co/docs/trl/sft_trainer). This notebook trains for `num_train_epochs = 5` (full passes over the dataset); for a quick test run, comment out `num_train_epochs` and set `max_steps = 60` instead. We also support TRL's `DPOTrainer`!" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "id": "95_Nn-89DhsL" }, "outputs": [], "source": [ "from trl import SFTConfig, SFTTrainer\n",
"trainer = SFTTrainer(\n",
"    model = model,\n",
"    tokenizer = tokenizer,\n",
"    train_dataset = dataset,\n",
"    dataset_text_field = \"text\",\n",
"    max_seq_length = max_seq_length,\n",
"    packing = False, # Can make training 5x faster for short sequences.\n",
"    args = SFTConfig(\n",
"        per_device_train_batch_size = 2,\n",
"        gradient_accumulation_steps = 4,\n",
"        warmup_steps = 5,\n",
"        num_train_epochs = 5, # Number of full passes over the dataset.\n",
"        #max_steps = 60,\n",
"        learning_rate = 1e-5,\n",
"        fp16 = False, # We're doing full float32, so disable mixed precision\n",
"        bf16 = False, # We're doing full float32, so disable mixed precision\n",
"        logging_steps = 1,\n",
"        optim = \"adamw_8bit\",\n",
"        weight_decay = 0.01,\n",
"        lr_scheduler_type = \"linear\",\n",
"        seed = 3407,\n",
"        output_dir = \"outputs\",\n",
"        report_to = \"tensorboard\", # Use this for WandB etc\n",
"    ),\n",
")" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "id": "2ejIt2xSNKKp" }, "outputs": [], "source": [ "# @title Show current memory stats\n",
"gpu_stats = torch.cuda.get_device_properties(0)\n",
"start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)\n",
"max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)\n",
"print(f\"GPU = {gpu_stats.name}. 
Max memory = {max_memory} GB.\")\n", "print(f\"{start_gpu_memory} GB of memory reserved.\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "yqxqAZ7KJ4oL" }, "outputs": [], "source": [ "trainer_stats = trainer.train()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "id": "pCqnaKmlO1U9" }, "outputs": [], "source": [ "# @title Show final memory and time stats\n", "used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)\n", "used_memory_for_lora = round(used_memory - start_gpu_memory, 3)\n", "used_percentage = round(used_memory / max_memory * 100, 3)\n", "lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)\n", "print(f\"{trainer_stats.metrics['train_runtime']} seconds used for training.\")\n", "print(\n", " f\"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.\"\n", ")\n", "print(f\"Peak reserved memory = {used_memory} GB.\")\n", "print(f\"Peak reserved memory for training = {used_memory_for_lora} GB.\")\n", "print(f\"Peak reserved memory % of max memory = {used_percentage} %.\")\n", "print(f\"Peak reserved memory for training % of max memory = {lora_percentage} %.\")" ] }, { "cell_type": "markdown", "metadata": { "id": "ekOmTR1hSNcr" }, "source": [ "\n", "### Inference\n", "Let's run the model! You can change the prompts\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "apUdB40Ep6Ki" }, "outputs": [], "source": [ "input_text = \"Hey there my name is Elise, and I'm a speech generation model that can sound like a person.\"\n", "\n", "chosen_voice = None # None for single-speaker" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": { "iopub.execute_input": "2025-03-22T00:52:35.040842Z", "iopub.status.busy": "2025-03-22T00:52:35.040125Z", "iopub.status.idle": "2025-03-22T00:52:35.050560Z", "shell.execute_reply": "2025-03-22T00:52:35.049663Z", "shell.execute_reply.started": "2025-03-22T00:52:35.040818Z" }, "id": "krYI8PrRJ6MX" }, "outputs": [], "source": [ "#@title Run Inference\n", "\n", "import torch\n", "import re\n", "import numpy as np\n", "from typing import Dict, Any\n", "import torchaudio.transforms as T\n", "\n", "FastModel.for_inference(model) # Enable native 2x faster inference\n", "\n", "@torch.inference_mode()\n", "def generate_speech_from_text(\n", " text: str,\n", " temperature: float = 0.8, # Generation temperature\n", " top_k: int = 50, # Generation top_k\n", " top_p: float = 1, # Generation top_p\n", " max_new_audio_tokens: int = 2048, # Max tokens for audio part\n", " device: torch.device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n", ") -> np.ndarray:\n", " \"\"\"\n", " Generates speech audio from text using default voice control parameters.\n", "\n", " Args:\n", " text (str): The text input to be converted to speech.\n", " temperature (float): Sampling temperature for generation.\n", " top_k (int): Top-k sampling parameter.\n", " top_p (float): Top-p (nucleus) sampling parameter.\n", " max_new_audio_tokens (int): Max number of new tokens to generate (limits audio length).\n", " device (torch.device): Device to run inference on.\n", "\n", " Returns:\n", " np.ndarray: Generated waveform as a NumPy array.\n", " \"\"\"\n", "\n", " torch.compiler.reset()\n", "\n", " prompt = \"\".join([\n", " \"<|task_tts|>\",\n", " \"<|start_content|>\",\n", " text,\n", " \"<|end_content|>\",\n", " \"<|start_global_token|>\"\n", " ])\n", "\n", " model_inputs = 
tokenizer([prompt], return_tensors=\"pt\").to(device)\n", "\n", " print(\"Generating token sequence...\")\n", " generated_ids = model.generate(\n", " **model_inputs,\n", " max_new_tokens=max_new_audio_tokens, # Limit generation length\n", " do_sample=True,\n", " temperature=temperature,\n", " top_k=top_k,\n", " top_p=top_p,\n", " eos_token_id=tokenizer.eos_token_id, # Stop token\n", " pad_token_id=tokenizer.pad_token_id # Use models pad token id\n", " )\n", " print(\"Token sequence generated.\")\n", "\n", "\n", " generated_ids_trimmed = generated_ids[:, model_inputs.input_ids.shape[1]:]\n", "\n", "\n", " predicts_text = tokenizer.batch_decode(generated_ids_trimmed, skip_special_tokens=False)[0]\n", " # print(f\"\\nGenerated Text (for parsing):\\n{predicts_text}\\n\") # Debugging\n", "\n", " # Extract semantic token IDs using regex\n", " semantic_matches = re.findall(r\"<\\|bicodec_semantic_(\\d+)\\|>\", predicts_text)\n", " if not semantic_matches:\n", " print(\"Warning: No semantic tokens found in the generated output.\")\n", " # Handle appropriately - perhaps return silence or raise error\n", " return np.array([], dtype=np.float32)\n", "\n", " pred_semantic_ids = torch.tensor([int(token) for token in semantic_matches]).long().unsqueeze(0) # Add batch dim\n", "\n", " # Extract global token IDs using regex (assuming controllable mode also generates these)\n", " global_matches = re.findall(r\"<\\|bicodec_global_(\\d+)\\|>\", predicts_text)\n", " if not global_matches:\n", " print(\"Warning: No global tokens found in the generated output (controllable mode). Might use defaults or fail.\")\n", " pred_global_ids = torch.zeros((1, 1), dtype=torch.long)\n", " else:\n", " pred_global_ids = torch.tensor([int(token) for token in global_matches]).long().unsqueeze(0) # Add batch dim\n", "\n", " pred_global_ids = pred_global_ids.unsqueeze(0) # Shape becomes (1, 1, N_global)\n", "\n", " print(f\"Found {pred_semantic_ids.shape[1]} semantic tokens.\")\n", " print(f\"Found {pred_global_ids.shape[2]} global tokens.\")\n", "\n", "\n", " # 5. 
Detokenize using BiCodecTokenizer\n", " print(\"Detokenizing audio tokens...\")\n", " # Ensure audio_tokenizer and its internal model are on the correct device\n", " audio_tokenizer.device = device\n", " audio_tokenizer.model.to(device)\n", " # Squeeze the extra dimension from global tokens as seen in SparkTTS example\n", " wav_np = audio_tokenizer.detokenize(\n", " pred_global_ids.to(device).squeeze(0), # Shape (1, N_global)\n", " pred_semantic_ids.to(device) # Shape (1, N_semantic)\n", " )\n", " print(\"Detokenization complete.\")\n", "\n", " return wav_np\n", "\n", "if __name__ == \"__main__\":\n", " print(f\"Generating speech for: '{input_text}'\")\n", " text = f\"{chosen_voice}: \" + input_text if chosen_voice else input_text\n", " generated_waveform = generate_speech_from_text(input_text)\n", "\n", " if generated_waveform.size > 0:\n", " import soundfile as sf\n", " output_filename = \"generated_speech_controllable.wav\"\n", " sample_rate = audio_tokenizer.config.get(\"sample_rate\", 16000)\n", " sf.write(output_filename, generated_waveform, sample_rate)\n", " print(f\"Audio saved to {output_filename}\")\n", "\n", " # Optional: Play in notebook\n", " from IPython.display import Audio, display\n", " display(Audio(generated_waveform, rate=sample_rate))\n", " else:\n", " print(\"Audio generation failed (no tokens found?).\")" ] }, { "cell_type": "markdown", "metadata": { "id": "uMuVrWbjAzhc" }, "source": [ "\n", "### Saving, loading finetuned models\n", "To save the final model as LoRA adapters, either use Huggingface's `push_to_hub` for an online save or `save_pretrained` for a local save.\n", "\n", "**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "upcOlWe7A1vc" }, "outputs": [], "source": [ "model.save_pretrained(\"lora_model\") # Local saving\n", "tokenizer.save_pretrained(\"lora_model\")\n", "# model.push_to_hub(\"your_name/lora_model\", token = \"...\") # Online saving\n", "# tokenizer.push_to_hub(\"your_name/lora_model\", token = \"...\") # Online saving" ] }, { "cell_type": "markdown", "metadata": { "id": "f422JgM9sdVT" }, "source": [ "\n", "### Saving to float16\n", "\n", "We also support saving to `float16` directly. Select `merged_16bit` for float16 or `merged_4bit` for int4. We also allow `lora` adapters as a fallback. Use `push_to_hub_merged` to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "iHjt_SMYsd3P", "outputId": "bd8cccb7-6b95-45bf-80da-de120988447e" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Unsloth: You have 1 CPUs. Using `safe_serialization` is 10x slower.\n", "We shall switch to Pytorch saving, which might take 3 minutes and not 30 minutes.\n", "To force `safe_serialization`, set it to `None` instead.\n", "Unsloth: Kaggle/Colab has limited disk space. We need to delete the downloaded\n", "model which will save 4-16GB of disk space, allowing you to save on Kaggle/Colab.\n", "Unsloth: Will remove a cached repo with size 15.1G\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Unsloth: Merging 4bit and LoRA weights to 16bit...\n", "Unsloth: Will use up to 3.99 out of 12.67 RAM for saving.\n", "Unsloth: Saving model... 
This might take 5 minutes ...\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 28/28 [00:01<00:00, 27.83it/s]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Unsloth: Saving tokenizer... Done.\n", "Unsloth: Saving model/pytorch_model-00001-of-00002.bin...\n", "Unsloth: Saving model/pytorch_model-00002-of-00002.bin...\n", "Done.\n" ] } ], "source": [ "# Merge to 16bit\n", "if False: model.save_pretrained_merged(\"model\", tokenizer, save_method = \"merged_16bit\",)\n", "if False: model.push_to_hub_merged(\"hf/model\", tokenizer, save_method = \"merged_16bit\", token = \"\")\n", "\n", "# Merge to 4bit\n", "if False: model.save_pretrained_merged(\"model\", tokenizer, save_method = \"merged_4bit\",)\n", "if False: model.push_to_hub_merged(\"hf/model\", tokenizer, save_method = \"merged_4bit\", token = \"\")\n", "\n", "# Just LoRA adapters\n", "if False:\n", " model.save_pretrained(\"model\")\n", " tokenizer.save_pretrained(\"model\")\n", "if False:\n", " model.push_to_hub(\"hf/model\", token = \"\")\n", " tokenizer.push_to_hub(\"hf/model\", token = \"\")\n" ] }, { "cell_type": "markdown", "metadata": { "id": "egOSE7Cgynx7" }, "source": [ "And we're done! If you have any questions on Unsloth, we have a [Discord](https://discord.gg/unsloth) channel! If you find any bugs or want to keep updated with the latest LLM stuff, or need help, join projects etc, feel free to join our Discord!\n", "\n", "Some other links:\n", "1. Train your own reasoning model - Llama GRPO notebook [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb)\n", "2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb)\n", "3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb)\n", "6. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](https://docs.unsloth.ai/get-started/unsloth-notebooks)!\n", "\n", "
\n", " \n", " \n", " \n", "\n", " Join Discord if you need help + ⭐️ Star us on Github ⭐️\n", "
\n" ] } ], "metadata": { "accelerator": "GPU", "colab": { "gpuType": "T4", "provenance": [] }, "kaggle": { "accelerator": "nvidiaTeslaT4", "dataSources": [], "dockerImageVersionId": 30919, "isGpuEnabled": true, "isInternetEnabled": true, "language": "python", "sourceType": "notebook" }, "kernelspec": { "display_name": "TTS_ft", "language": "python", "name": "tts_ft" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.3" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "0474debc340943bd85f3daf92aebf7aa": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "FloatProgressModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "FloatProgressModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "ProgressView", "bar_style": "", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_0de4d0f282404edfbc191dca73f15f35", "max": 401, "min": 0, "orientation": "horizontal", "style": "IPY_MODEL_e58b5ad2f781475d8af2ddb38009baa6", "value": 354 } }, "0de4d0f282404edfbc191dca73f15f35": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "2315228ff2b141afabe1263471f5364b": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HTMLView", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_426eb100a94642f79e6b99777406a265", "placeholder": "​", "style": "IPY_MODEL_a36b5cf197dd4bd9a7f70aa6671b804c", "value": "Map:  88%" } }, "33fbacbb2aa146cd90586357eec1dc3e": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, 
"bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "426eb100a94642f79e6b99777406a265": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "930b4d1d5f4b494b830df4d4c398e67c": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "DescriptionStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "description_width": "" } }, "a36b5cf197dd4bd9a7f70aa6671b804c": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "DescriptionStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "description_width": "" } }, "a3b0c0581f1f4c428baaadd8e9a39b6f": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HBoxModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HBoxModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HBoxView", "box_style": "", "children": [ "IPY_MODEL_2315228ff2b141afabe1263471f5364b", "IPY_MODEL_0474debc340943bd85f3daf92aebf7aa", "IPY_MODEL_cff1b0fa2ea24f45aab26685353eefdd" ], "layout": "IPY_MODEL_b7e20be79df246f19b35114a690e44f0" } }, "b7e20be79df246f19b35114a690e44f0": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": 
"1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "cff1b0fa2ea24f45aab26685353eefdd": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HTMLView", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_33fbacbb2aa146cd90586357eec1dc3e", "placeholder": "​", "style": "IPY_MODEL_930b4d1d5f4b494b830df4d4c398e67c", "value": " 354/401 [03:01<00:22,  2.11 examples/s]" } }, "e58b5ad2f781475d8af2ddb38009baa6": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "ProgressStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "ProgressStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "bar_color": null, "description_width": "" } } } } }, "nbformat": 4, "nbformat_minor": 4 }