{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "43b502c1-9548-4580-84ad-1cbac158edb8",
   "metadata": {
    "id": "43b502c1-9548-4580-84ad-1cbac158edb8"
   },
   "source": [
    "# Bonus Unit 1: Fine-Tuning a model for Function-Calling\n",
    "\n",
    "In this tutorial, **we're going to Fine-Tune an LLM for Function Calling.**\n",
    "\n",
    "This notebook is part of the <a href=\"https://www.hf.co/learn/agents-course/unit1/introduction\">Hugging Face Agents Course</a>, a free Course from beginner to expert, where you learn to build Agents.\n",
    "\n",
    "<img src=\"https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/communication/share.png\" alt=\"Agent Course\"/>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "gWR4Rvpmjq5T",
   "metadata": {
    "id": "gWR4Rvpmjq5T"
   },
   "source": [
    "## Prerequisites 🏗️\n",
    "\n",
    "Before diving into the notebook, you need to:\n",
    "\n",
    "🔲 📚 **Study [What is Function-Calling](https://www.hf.co/learn/agents-course/bonus-unit1/what-is-function-calling) Section**\n",
    "\n",
    "🔲 📚 **Study [Fine-Tune your Model and what are LoRAs](https://www.hf.co/learn/agents-course/bonus-unit1/fine-tuning) Section**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1rZXU_1wkEPu",
   "metadata": {
    "id": "1rZXU_1wkEPu"
   },
   "source": [
    "# Step 0: Ask to Access Gemma on Hugging Face\n",
    "\n",
    "<img src=\"https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/bonus-unit1/gemma.png\" alt=\"Gemma\"/>\n",
    "\n",
    "\n",
    "To access Gemma on Hugging Face:\n",
    "\n",
    "1. **Make sure you're signed in** to your Hugging Face Account\n",
    "\n",
    "2. Go to https://huggingface.co/google/gemma-2-2b-it\n",
    "\n",
    "3. Click on **Acknowledge license** and fill the form.\n",
    "\n",
    "Alternatively you can use another model, and modify the code accordingly (it can be a good exercise for you to be sure you know how to fine-tune for Function-Calling).\n",
    "\n",
    "You can use for instance:\n",
    "\n",
    "- [HuggingFaceTB/SmolLM2-1.7B-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-Instruct)\n",
    "\n",
    "- [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5hjyx9nJlvKG",
   "metadata": {
    "id": "5hjyx9nJlvKG"
   },
   "source": [
    "## Step 1: Set the GPU 💪\n",
    "\n",
    "If you're on Colab:\n",
    "\n",
    "- To **accelerate the fine-tuning training, we'll use a GPU**. To do that, go to `Runtime > Change Runtime type`\n",
    "\n",
    "<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/gpu-step1.jpg\" alt=\"GPU Step 1\"/>\n",
    "\n",
    "- `Hardware Accelerator > GPU`\n",
    "\n",
    "<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/gpu-step2.jpg\" alt=\"GPU Step 2\"/>\n",
    "\n",
    "\n",
    "### Important\n",
    "\n",
    "For this Unit, **with the free-tier of Colab** it will take around **6h to train**.\n",
    "\n",
    "You have three solutions if you want to make it faster:\n",
    "\n",
    "1. Train on your computer if you have GPUs. It might take time but you have less risks of timeout.\n",
    "\n",
    "2. Use a Google Colab Pro that allows you use to A100 GPU (15-20min training).\n",
    "\n",
    "3. Just follow the code to learn how to do it without training."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5Thjsc9fj6Ej",
   "metadata": {
    "id": "5Thjsc9fj6Ej"
   },
   "source": [
    "## Step 2: Install dependencies 📚\n",
    "\n",
    "We need multiple librairies:\n",
    "\n",
    "- `bitsandbytes` for quantization\n",
    "- `peft`for LoRA adapters\n",
    "- `Transformers`for loading the model\n",
    "- `datasets`for loading and using the fine-tuning dataset\n",
    "- `trl`for the trainer class"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "e63f4962-c644-491e-aa91-50e453e953a4",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "e63f4962-c644-491e-aa91-50e453e953a4",
    "outputId": "3c1563d4-74fe-46f0-adcd-3e935261a89d",
    "tags": []
   },
   "outputs": [],
   "source": [
    "!pip install -q -U bitsandbytes\n",
    "!pip install -q -U transformers\n",
    "!pip install -q -U peft\n",
    "!pip install -q -U accelerate\n",
    "!pip install -q -U datasets\n",
    "!pip install -q trl==0.12.2\n",
    "!pip install -q -U tensorboardX\n",
    "!pip install -q wandb"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "UWNoZzi1urSZ",
   "metadata": {
    "id": "UWNoZzi1urSZ"
   },
   "source": [
    "## Step 3: Create your Hugging Face Token to push your model to the Hub\n",
    "\n",
    "To be able to share your model with the community there are some more steps to follow:\n",
    "\n",
    "1️⃣ (If it's not already done) create an account to HF ➡ https://huggingface.co/join\n",
    "\n",
    "2️⃣ Sign in and then, you need to store your authentication token from the Hugging Face website.\n",
    "\n",
    "- Create a new token (https://huggingface.co/settings/tokens) **with write role**\n",
    "\n",
    "<img src=\"https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/bonus-unit1/create_write_token.png\" alt=\"Create HF Token\" width=\"50%\">\n",
    "\n",
    "3️⃣ Store your token as an environment variable under the name \"HF_TOKEN\"\n",
    "- **Be very carefull not to share it with others** !"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "vBAkwg9zu6A1",
   "metadata": {
    "id": "vBAkwg9zu6A1"
   },
   "source": [
    "## Step 4: Import the librairies\n",
    "\n",
    "Don't forget to put your HF token."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "7ad2e4c2-593e-463e-9692-8d674c541d76",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 382
    },
    "id": "7ad2e4c2-593e-463e-9692-8d674c541d76",
    "outputId": "5004413d-0202-4031-85e3-d1897fb8eba5",
    "tags": []
   },
   "outputs": [],
   "source": [
    "from enum import Enum\n",
    "from functools import partial\n",
    "import pandas as pd\n",
    "import torch\n",
    "import json\n",
    "\n",
    "from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, BitsAndBytesConfig, set_seed\n",
    "from datasets import load_dataset\n",
    "from trl import SFTTrainer\n",
    "from peft import get_peft_model, LoraConfig, TaskType\n",
    "\n",
    "seed = 42\n",
    "set_seed(seed)\n",
    "\n",
    "import os\n",
    "\n",
    "# Put your HF Token here\n",
    "os.environ['HF_TOKEN']=\"hf_xxxx\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "44f30b2c-2cc0-48e0-91ca-4633e6444105",
   "metadata": {
    "id": "44f30b2c-2cc0-48e0-91ca-4633e6444105"
   },
   "source": [
    "## Step 5: Processing the dataset into inputs\n",
    "\n",
    "In order to train the model, we need to **format the inputs into what we want the model to learn**.\n",
    "\n",
    "For this tutorial, I enhanced a popular dataset for function calling \"NousResearch/hermes-function-calling-v1\" by adding some new **thinking** step computer from **deepseek-ai/DeepSeek-R1-Distill-Qwen-32B**.\n",
    "\n",
    "But in order for the model to learn, we need **to format the conversation correctly**. If you followed Unit 1, you know that going from a list of messages to a prompt is handled by the **chat_template**, or, the default chat_template of gemma-2-2B does not contain tool calls. So we will need to modify it !\n",
    "\n",
    "This is the role of our **preprocess** function. To go from a list of messages, to a prompt that the model can understand.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "29da85c8-33bf-4864-aed7-733cbe703512",
   "metadata": {
    "id": "29da85c8-33bf-4864-aed7-733cbe703512",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "2196bea18c254339847558f804db8853",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "tokenizer_config.json:   0%|          | 0.00/47.0k [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "144ed3e99ca8400dad935c2aad2d0816",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "tokenizer.model:   0%|          | 0.00/4.24M [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "ef4443985f3f44238b88775168b3eb8f",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "tokenizer.json:   0%|          | 0.00/17.5M [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "f6f74184ebe44a3dbf575530c3e007f4",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "special_tokens_map.json:   0%|          | 0.00/636 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "732a30a290144a65887d030eb90e562e",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "README.md:   0%|          | 0.00/354 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "2467d02301a849ce8cb0d06602652a77",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "train-00000-of-00001.parquet:   0%|          | 0.00/3.85M [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "cee93ea2b48945ae82092c33b50e7ca4",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Generating train split:   0%|          | 0/3570 [00:00<?, ? examples/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "model_name = \"google/gemma-2-2b-it\"\n",
    "dataset_name = \"Jofthomas/hermes-function-calling-thinking-V1\"\n",
    "tokenizer = AutoTokenizer.from_pretrained(model_name)\n",
    "\n",
    "tokenizer.chat_template = \"{{ bos_token }}{% if messages[0]['role'] == 'system' %}{{ raise_exception('System role not supported') }}{% endif %}{% for message in messages %}{{ '<start_of_turn>' + message['role'] + '\\n' + message['content'] | trim + '<end_of_turn><eos>\\n' }}{% endfor %}{% if add_generation_prompt %}{{'<start_of_turn>model\\n'}}{% endif %}\"\n",
    "\n",
    "\n",
    "def preprocess(samples):\n",
    "    batch = []\n",
    "    for conversations in zip(samples[\"conversations\"]):\n",
    "        conversation = conversations[0]\n",
    "\n",
    "        # Instead of adding a system message, we merge the content into the first user message\n",
    "        if conversation[0][\"role\"] == \"system\":\n",
    "            system_message_content = conversation[0][\"content\"]\n",
    "            # Merge system content with the first user message\n",
    "            conversation[1][\"content\"] = system_message_content + \"Also, before making a call to a function take the time to plan the function to take. Make that thinking process between <think>{your thoughts}</think>\\n\\n\" + conversation[1][\"content\"]\n",
    "            # Remove the system message from the conversation\n",
    "            conversation.pop(0)\n",
    "\n",
    "        batch.append(tokenizer.apply_chat_template(conversation, tokenize=False))\n",
    "\n",
    "    return {\"content\": batch}\n",
    "\n",
    "dataset = load_dataset(dataset_name)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dc8736d5-d64b-4c5c-9738-be08421d3f95",
   "metadata": {
    "id": "dc8736d5-d64b-4c5c-9738-be08421d3f95"
   },
   "source": [
    "## Step 6: A Dedicated Dataset for This Unit\n",
    "\n",
    "For this Bonus Unit, we created a custom dataset based on [NousResearch/hermes-function-calling-v1](https://huggingface.co/datasets/NousResearch/hermes-function-calling-v1), which is considered a **reference** when it comes to function-calling datasets.\n",
    "\n",
    "While the original dataset is excellent, it does **not** include a **“thinking”** step.\n",
    "\n",
    "In Function-Calling, such a step is optional, but recent work—like the **deepseek** model or the paper [\"Test-Time Compute\"](https://huggingface.co/papers/2408.03314)—suggests that giving an LLM time to “think” before it answers (or in this case, **before** taking an action) can **significantly improve** model performance.\n",
    "\n",
    "I, decided to then compute a subset of this dataset and to give it to [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) in order to compute some thinking tokens `<think>` before any function call. Which resulted in the following dataset :\n",
    "![Input Dataset](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/bonus-unit1/dataset_function_call.png)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "b63d4832-d92e-482d-9fe6-6e9dbfee377a",
   "metadata": {
    "id": "b63d4832-d92e-482d-9fe6-6e9dbfee377a",
    "outputId": "bda88f48-ca5d-4f47-b887-48d4ea5a53aa",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "343ae0d35c79439788601f8cb4e734d1",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Map:   0%|          | 0/3570 [00:00<?, ? examples/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "DatasetDict({\n",
      "    train: Dataset({\n",
      "        features: ['content'],\n",
      "        num_rows: 3213\n",
      "    })\n",
      "    test: Dataset({\n",
      "        features: ['content'],\n",
      "        num_rows: 357\n",
      "    })\n",
      "})\n"
     ]
    }
   ],
   "source": [
    "dataset = dataset.map(\n",
    "    preprocess,\n",
    "    batched=True,\n",
    "    remove_columns=dataset[\"train\"].column_names\n",
    ")\n",
    "dataset = dataset[\"train\"].train_test_split(0.1)\n",
    "print(dataset)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "67724a23-f298-4247-b002-2cf370b03897",
   "metadata": {
    "id": "67724a23-f298-4247-b002-2cf370b03897"
   },
   "source": [
    "## Step 7: Checking the inputs\n",
    "\n",
    "Let's manually look at what an input looks like !\n",
    "\n",
    "In this example we have :\n",
    "\n",
    "1. A *User message* containing the **necessary information with the list of available tools** inbetween `<tools></tools>` then the user query, here:  `\"Can you get me the latest news headlines for the United States?\"`\n",
    "\n",
    "2. An *Assistant message* here called \"model\" to fit the criterias from gemma models containing two new phases, a **\"thinking\"** phase contained in `<think></think>` and an **\"Act\"** phase contained in `<tool_call></<tool_call>`.\n",
    "\n",
    "3. If the model contains a `<tools_call>`, we will append the result of this action in a new **\"Tool\"** message containing a `<tool_response></tool_response>` with the answer from the tool."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "dc60da04-9411-487a-b629-2c59024a20c0",
   "metadata": {
    "id": "dc60da04-9411-487a-b629-2c59024a20c0",
    "outputId": "6709e478-17b8-4769-865f-2cd025727ad4",
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<bos><start_of_turn>human\n",
      "You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags.You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions.Here are the available tools:<tools> [{'type': 'function', 'function': {'name': 'get_news_headlines', 'description': 'Get the latest news headlines', 'parameters': {'type': 'object', 'properties': {'country': {'type': 'string', 'description': 'The country for which headlines are needed'}}, 'required': ['country']}}}, {'type': 'function', 'function': {'name': 'search_recipes', 'description': 'Search for recipes based on ingredients', 'parameters': {'type': 'object', 'properties': {'ingredients': {'type': 'array', 'items': {'type': 'string'}, 'description': 'The list of ingredients'}}, 'required': ['ingredients']}}}] </tools>Use the following pydantic model json schema for each tool call you will make: {'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']}For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:\n",
      "<tool_call>\n",
      "{tool_call}\n",
      "</tool_call>Also, before making a call to a function take the time to plan the function to take. Make that thinking process between <think>{your thoughts}</think>\n",
      "\n",
      "Can you get me the latest news headlines for the United States?<end_of_turn><eos>\n",
      "<start_of_turn>model\n",
      "<think>Okay, so the user asked for the latest news headlines in the United States. I need to figure out how to respond appropriately using the tools available.\n",
      "\n",
      "Looking at the tools provided, there's a function called get_news_headlines. That seems perfect for this situation. It requires a country code, which the user specified as the United States.\n",
      "\n",
      "Since the user is asking specifically about the US, I'll use the country parameter as 'United States' in the function call. This should fetch the relevant news for them.\n",
      "\n",
      "I should make sure to format the response correctly as a JSON object with the function name and arguments.\n",
      "</think><tool_call>\n",
      "{'name': 'get_news_headlines', 'arguments': {'country': 'United States'}}\n",
      "</tool_call><end_of_turn><eos>\n",
      "<start_of_turn>tool\n",
      "<tool_response>\n",
      "{'headlines': ['US President signs new bill into law', 'Major earthquake hits California', 'NASA announces new Mars mission', 'Tech giant releases new smartphone', 'US economy shows signs of recovery']}\n",
      "</tool_response><end_of_turn><eos>\n",
      "<start_of_turn>model\n",
      "Here are the latest news headlines for the United States:\n",
      "1. US President signs new bill into law\n",
      "2. Major earthquake hits California\n",
      "3. NASA announces new Mars mission\n",
      "4. Tech giant releases new smartphone\n",
      "5. US economy shows signs of recovery<end_of_turn><eos>\n",
      "<start_of_turn>human\n",
      "That's interesting. Can you get me the news for France now?<end_of_turn><eos>\n",
      "<start_of_turn>model\n",
      "<think>Okay, the user initially asked for the latest US news, so I used get_news_headlines with 'United States'. Now the user is asking for France, so I'll call the same function with 'France' as the country. That makes sense because the function is designed for any country.\n",
      "</think><tool_call>\n",
      "{'name': 'get_news_headlines', 'arguments': {'country': 'France'}}\n",
      "</tool_call><end_of_turn><eos>\n",
      "<start_of_turn>tool\n",
      "<tool_response>\n",
      "{'headlines': ['French President announces new environmental policy', 'Paris Fashion Week highlights', 'France wins World Cup qualifier', 'New culinary trend sweeps across France', 'French tech startup raises millions in funding']}\n",
      "</tool_response><end_of_turn><eos>\n",
      "<start_of_turn>model\n",
      "Here are the latest news headlines for France:\n",
      "1. French President announces new environmental policy\n",
      "2. Paris Fashion Week highlights\n",
      "3. France wins World Cup qualifier\n",
      "4. New culinary trend sweeps across France\n",
      "5. French tech startup raises millions in funding<end_of_turn><eos>\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Let's look at how we formatted the dataset\n",
    "print(dataset[\"train\"][8][\"content\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "53a48281-2346-4dfb-ad60-cad85129ec9b",
   "metadata": {
    "id": "53a48281-2346-4dfb-ad60-cad85129ec9b",
    "outputId": "da4f7e33-227c-48bc-b3c3-24df31313a69",
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<pad>\n",
      "<eos>\n"
     ]
    }
   ],
   "source": [
    "# Sanity check\n",
    "print(tokenizer.pad_token)\n",
    "print(tokenizer.eos_token)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d6864b36-6033-445a-b6e2-b6bb02e38e26",
   "metadata": {
    "id": "d6864b36-6033-445a-b6e2-b6bb02e38e26"
   },
   "source": [
    "## Step 8: Let's Modify the Tokenizer\n",
    "\n",
    "Indeed, as we saw in Unit 1, the tokenizer splits text into sub-words by default. This is **not** what we want for our new special tokens!\n",
    "\n",
    "While we segmented our example using `<think>`, `<tool_call>`, and `<tool_response>`, the tokenizer does **not** yet treat them as whole tokens—it still tries to break them down into smaller pieces. To ensure the model correctly interprets our new format, we must **add these tokens** to our tokenizer.\n",
    "\n",
    "Additionally, since we changed the `chat_template` in our **preprocess** function to format conversations as messages within a prompt, we also need to modify the `chat_template` in the tokenizer to reflect these changes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "833ba5d6-4c1e-4689-9fed-22cc03d2a63a",
   "metadata": {
    "colab": {
     "referenced_widgets": [
      "fc7eee07d5824955adb7b9bbd025c297"
     ]
    },
    "id": "833ba5d6-4c1e-4689-9fed-22cc03d2a63a",
    "outputId": "2ef794e9-3e8e-4dec-d31c-b9242386c2d0",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "c93adf223ba642ff83433bd0a49b0729",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "config.json:   0%|          | 0.00/838 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "fdcee6dad0714e3fb7c71e7964c3b277",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "model.safetensors.index.json:   0%|          | 0.00/24.2k [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "f47c44a05d7e4d57b320571d39945e46",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "ab17cbfdf3864dc18c8ae5fed06b8b90",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "model-00001-of-00002.safetensors:   0%|          | 0.00/4.99G [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "7b4ba042da7641a1a9838d66711a0a4c",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "model-00002-of-00002.safetensors:   0%|          | 0.00/241M [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "45020ba214804c9b8e33e875508b860b",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "1b0a5cf11a9c4794b9de47ae54192b68",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "generation_config.json:   0%|          | 0.00/187 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "Gemma2ForCausalLM(\n",
       "  (model): Gemma2Model(\n",
       "    (embed_tokens): Embedding(256008, 2304, padding_idx=0)\n",
       "    (layers): ModuleList(\n",
       "      (0-25): 26 x Gemma2DecoderLayer(\n",
       "        (self_attn): Gemma2Attention(\n",
       "          (q_proj): Linear(in_features=2304, out_features=2048, bias=False)\n",
       "          (k_proj): Linear(in_features=2304, out_features=1024, bias=False)\n",
       "          (v_proj): Linear(in_features=2304, out_features=1024, bias=False)\n",
       "          (o_proj): Linear(in_features=2048, out_features=2304, bias=False)\n",
       "          (rotary_emb): Gemma2RotaryEmbedding()\n",
       "        )\n",
       "        (mlp): Gemma2MLP(\n",
       "          (gate_proj): Linear(in_features=2304, out_features=9216, bias=False)\n",
       "          (up_proj): Linear(in_features=2304, out_features=9216, bias=False)\n",
       "          (down_proj): Linear(in_features=9216, out_features=2304, bias=False)\n",
       "          (act_fn): PytorchGELUTanh()\n",
       "        )\n",
       "        (input_layernorm): Gemma2RMSNorm((2304,), eps=1e-06)\n",
       "        (pre_feedforward_layernorm): Gemma2RMSNorm((2304,), eps=1e-06)\n",
       "        (post_feedforward_layernorm): Gemma2RMSNorm((2304,), eps=1e-06)\n",
       "        (post_attention_layernorm): Gemma2RMSNorm((2304,), eps=1e-06)\n",
       "      )\n",
       "    )\n",
       "    (norm): Gemma2RMSNorm((2304,), eps=1e-06)\n",
       "  )\n",
       "  (lm_head): Linear(in_features=2304, out_features=256008, bias=False)\n",
       ")"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "class ChatmlSpecialTokens(str, Enum):\n",
    "    tools = \"<tools>\"\n",
    "    eotools = \"</tools>\"\n",
    "    think = \"<think>\"\n",
    "    eothink = \"</think>\"\n",
    "    tool_call=\"<tool_call>\"\n",
    "    eotool_call=\"</tool_call>\"\n",
    "    tool_response=\"<tool_reponse>\"\n",
    "    eotool_response=\"</tool_reponse>\"\n",
    "    pad_token = \"<pad>\"\n",
    "    eos_token = \"<eos>\"\n",
    "    @classmethod\n",
    "    def list(cls):\n",
    "        return [c.value for c in cls]\n",
    "\n",
    "tokenizer = AutoTokenizer.from_pretrained(\n",
    "        model_name,\n",
    "        pad_token=ChatmlSpecialTokens.pad_token.value,\n",
    "        additional_special_tokens=ChatmlSpecialTokens.list()\n",
    "    )\n",
    "tokenizer.chat_template = \"{{ bos_token }}{% if messages[0]['role'] == 'system' %}{{ raise_exception('System role not supported') }}{% endif %}{% for message in messages %}{{ '<start_of_turn>' + message['role'] + '\\n' + message['content'] | trim + '<end_of_turn><eos>\\n' }}{% endfor %}{% if add_generation_prompt %}{{'<start_of_turn>model\\n'}}{% endif %}\"\n",
    "\n",
    "model = AutoModelForCausalLM.from_pretrained(model_name,\n",
    "                                              attn_implementation='eager',\n",
    "                                             device_map=\"auto\")\n",
    "model.resize_token_embeddings(len(tokenizer))\n",
    "model.to(torch.bfloat16)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "X6DBY8AqxFLL",
   "metadata": {
    "id": "X6DBY8AqxFLL"
   },
   "source": [
    "## Step 9: Let's configure the LoRA\n",
    "\n",
    "This is we are going to define the parameter of our adapter. Those a the most important parameters in LoRA as they define the size and importance of the adapters we are training."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "482d36ab-e326-4fd7-bc59-425abcca55e7",
   "metadata": {
    "id": "482d36ab-e326-4fd7-bc59-425abcca55e7",
    "tags": []
   },
   "outputs": [],
   "source": [
    "from peft import LoraConfig\n",
    "\n",
    "# TODO: Configure LoRA parameters\n",
    "# r: rank dimension for LoRA update matrices (smaller = more compression)\n",
    "rank_dimension = 16\n",
    "# lora_alpha: scaling factor for LoRA layers (higher = stronger adaptation)\n",
    "lora_alpha = 64\n",
    "# lora_dropout: dropout probability for LoRA layers (helps prevent overfitting)\n",
    "lora_dropout = 0.05\n",
    "\n",
    "peft_config = LoraConfig(r=rank_dimension,\n",
    "                         lora_alpha=lora_alpha,\n",
    "                         lora_dropout=lora_dropout,\n",
    "                         target_modules=[\"gate_proj\",\"q_proj\",\"lm_head\",\"o_proj\",\"k_proj\",\"embed_tokens\",\"down_proj\",\"up_proj\",\"v_proj\"], # wich layer in the transformers do we target ?\n",
    "                         task_type=TaskType.CAUSAL_LM)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "zdDR9hzgxPu2",
   "metadata": {
    "id": "zdDR9hzgxPu2"
   },
   "source": [
    "## Step 10: Let's define the Trainer and the Fine-Tuning hyperparameters\n",
    "\n",
    "In this step, we define the Trainer, the class that we use to fine-tune our model and the hyperparameters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3598b688-5a6f-437f-95ac-4794688cd38f",
   "metadata": {
    "id": "3598b688-5a6f-437f-95ac-4794688cd38f",
    "outputId": "515f019f-87b6-40cb-9344-f4c19645e077",
    "tags": []
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/user/miniconda/lib/python3.9/site-packages/transformers/training_args.py:1568: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use `eval_strategy` instead\n",
      "  warnings.warn(\n"
     ]
    }
   ],
   "source": [
    "output_dir = \"gemma-2-2B-it-thinking-function_calling\" # The directory where the trained model checkpoints, logs, and other artifacts will be saved. It will also be the default name of the model when pushed to the hub if not redefined later.\n",
    "per_device_train_batch_size = 1\n",
    "per_device_eval_batch_size = 1\n",
    "gradient_accumulation_steps = 4\n",
    "logging_steps = 5\n",
    "learning_rate = 1e-4 # The initial learning rate for the optimizer.\n",
    "\n",
    "max_grad_norm = 1.0\n",
    "num_train_epochs=1\n",
    "warmup_ratio = 0.1\n",
    "lr_scheduler_type = \"cosine\"\n",
    "max_seq_length = 2048\n",
    "\n",
    "training_arguments = TrainingArguments(\n",
    "    output_dir=output_dir,\n",
    "    per_device_train_batch_size=per_device_train_batch_size,\n",
    "    per_device_eval_batch_size=per_device_eval_batch_size,\n",
    "    gradient_accumulation_steps=gradient_accumulation_steps,\n",
    "    save_strategy=\"no\",\n",
    "    evaluation_strategy=\"epoch\",\n",
    "    logging_steps=logging_steps,\n",
    "    learning_rate=learning_rate,\n",
    "    max_grad_norm=max_grad_norm,\n",
    "    weight_decay=0.1,\n",
    "    warmup_ratio=warmup_ratio,\n",
    "    lr_scheduler_type=lr_scheduler_type,\n",
    "    report_to=\"tensorboard\",\n",
    "    bf16=True,\n",
    "    hub_private_repo=False,\n",
    "    push_to_hub=False,\n",
    "    num_train_epochs=num_train_epochs,\n",
    "    gradient_checkpointing=True,\n",
    "    gradient_checkpointing_kwargs={\"use_reentrant\": False}\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "59TTqmW2xmV2",
   "metadata": {
    "id": "59TTqmW2xmV2"
   },
   "source": [
    "As Trainer, we use the `SFTTrainer` which is a Supervised Fine-Tuning Trainer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ba0366b5-c9d0-4f7e-97e0-1f964cfad147",
   "metadata": {
    "id": "ba0366b5-c9d0-4f7e-97e0-1f964cfad147",
    "outputId": "8b2836b3-3a06-4c05-b046-6c7923911e40",
    "tags": []
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/user/miniconda/lib/python3.9/site-packages/huggingface_hub/utils/_deprecation.py:100: FutureWarning: Deprecated argument(s) used in '__init__': packing, dataset_text_field, max_seq_length, dataset_kwargs. Will not be supported from version '0.13.0'.\n",
      "\n",
      "Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.\n",
      "  warnings.warn(message, FutureWarning)\n",
      "/home/user/miniconda/lib/python3.9/site-packages/transformers/training_args.py:1568: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use `eval_strategy` instead\n",
      "  warnings.warn(\n",
      "/home/user/miniconda/lib/python3.9/site-packages/trl/trainer/sft_trainer.py:212: UserWarning: You passed a `packing` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.\n",
      "  warnings.warn(\n",
      "/home/user/miniconda/lib/python3.9/site-packages/peft/tuners/tuners_utils.py:543: UserWarning: Model with `tie_word_embeddings=True` and the tied_target_modules=['lm_head'] are part of the adapter. This can lead to complications, for example when merging the adapter or converting your model to formats other than safetensors. See for example https://github.com/huggingface/peft/issues/2018.\n",
      "  warnings.warn(\n",
      "/home/user/miniconda/lib/python3.9/site-packages/trl/trainer/sft_trainer.py:300: UserWarning: You passed a `max_seq_length` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.\n",
      "  warnings.warn(\n",
      "/home/user/miniconda/lib/python3.9/site-packages/trl/trainer/sft_trainer.py:328: UserWarning: You passed a `dataset_text_field` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.\n",
      "  warnings.warn(\n",
      "/home/user/miniconda/lib/python3.9/site-packages/trl/trainer/sft_trainer.py:334: UserWarning: You passed a `dataset_kwargs` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.\n",
      "  warnings.warn(\n",
      "/home/user/miniconda/lib/python3.9/site-packages/trl/trainer/sft_trainer.py:403: UserWarning: You passed a processing_class with `padding_side` not equal to `right` to the SFTTrainer. This might lead to some unexpected behaviour due to overflow issues when training a model in half-precision. You might consider adding `processing_class.padding_side = 'right'` to your code.\n",
      "  warnings.warn(\n"
     ]
    }
   ],
   "source": [
    "trainer = SFTTrainer(\n",
    "    model=model,\n",
    "    args=training_arguments,\n",
    "    train_dataset=dataset[\"train\"],\n",
    "    eval_dataset=dataset[\"test\"],\n",
    "    tokenizer=tokenizer,\n",
    "    packing=True,\n",
    "    dataset_text_field=\"content\",\n",
    "    max_seq_length=max_seq_length,\n",
    "    peft_config=peft_config,\n",
    "    dataset_kwargs={\n",
    "        \"append_concat_token\": False,\n",
    "        \"add_special_tokens\": False,\n",
    "    },\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "MtHjukK9xviB",
   "metadata": {
    "id": "MtHjukK9xviB"
   },
   "source": [
    "Here, we launch the training 🔥. Perfect time for you to pause and grab a coffee ☕."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9e2df2e9-a82b-4540-aa89-1b40b70a7781",
   "metadata": {
    "id": "9e2df2e9-a82b-4540-aa89-1b40b70a7781",
    "outputId": "8ad7e555-678b-4904-aa6b-000b619c9341",
    "tags": []
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "\n",
       "    <div>\n",
       "      \n",
       "      <progress value='389' max='389' style='width:300px; height:20px; vertical-align: middle;'></progress>\n",
       "      [389/389 14:14, Epoch 0/1]\n",
       "    </div>\n",
       "    <table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       " <tr style=\"text-align: left;\">\n",
       "      <th>Epoch</th>\n",
       "      <th>Training Loss</th>\n",
       "      <th>Validation Loss</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>0.294600</td>\n",
       "      <td>0.289091</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table><p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/user/miniconda/lib/python3.9/site-packages/peft/utils/save_and_load.py:230: UserWarning: Setting `save_embedding_layers` to `True` as embedding layers found in `target_modules`.\n",
      "  warnings.warn(\"Setting `save_embedding_layers` to `True` as embedding layers found in `target_modules`.\")\n"
     ]
    }
   ],
   "source": [
    "trainer.train()\n",
    "trainer.save_model()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1d7ea3ab-7c8c-47ad-acd2-99fbe5b68393",
   "metadata": {
    "id": "1d7ea3ab-7c8c-47ad-acd2-99fbe5b68393",
    "tags": []
   },
   "source": [
    "## Step 11: Let's push the Model and the Tokenizer to the Hub\n",
    "\n",
    "Let's push our model and out tokenizer to the Hub ! The model will be pushed under your username + the output_dir that we specified earlier."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "370af020-9319-4ff7-bea1-2842a4847caa",
   "metadata": {
    "colab": {
     "referenced_widgets": [
      "68b34e3d2eae4f24b83fa65cf5815738",
      "15e7c632053c4ed88267061a8112d641",
      "047ebf7fda8643c090129bc2b86a7e3e",
      "ef5dab829e8b491581f6dae2b7718113",
      "97ba6384a2f94db0880a17ff433a8ed9",
      "ba0c0c53b23047ac9336fdbf8597f32f",
      "8a033d5bd32b4a969fd5af612c550243",
      "714d4a40a67a4763ba8f7ae029befbd7",
      "52451f392b434fb39b882c9f094bd995",
      "08c2ca8a719e4142bd877ae94d242f2e",
      "02e660bd44a7414fae439751e9ffa1f2"
     ]
    },
    "id": "370af020-9319-4ff7-bea1-2842a4847caa",
    "outputId": "f5797c01-306d-48ee-a009-3c115f5b1ca5",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "68b34e3d2eae4f24b83fa65cf5815738",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "adapter_model.safetensors:   0%|          | 0.00/2.48G [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "15e7c632053c4ed88267061a8112d641",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "events.out.tfevents.1739725934.r-jofthomas-fttest-0ihwmg95-70a55-shjb6:   0%|          | 0.00/21.5k [00:00<?, …"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "047ebf7fda8643c090129bc2b86a7e3e",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "events.out.tfevents.1739728410.r-jofthomas-fttest-0ihwmg95-70a55-shjb6:   0%|          | 0.00/22.1k [00:00<?, …"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "ef5dab829e8b491581f6dae2b7718113",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "events.out.tfevents.1739724308.r-jofthomas-fttest-0ihwmg95-70a55-shjb6:   0%|          | 0.00/21.5k [00:00<?, …"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "97ba6384a2f94db0880a17ff433a8ed9",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Upload 10 LFS files:   0%|          | 0/10 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "ba0c0c53b23047ac9336fdbf8597f32f",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "events.out.tfevents.1739727155.r-jofthomas-fttest-0ihwmg95-70a55-shjb6:   0%|          | 0.00/22.1k [00:00<?, …"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "8a033d5bd32b4a969fd5af612c550243",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "events.out.tfevents.1739809832.r-jofthomas-fttest-x95brwd3-5f8b4-w4aox:   0%|          | 0.00/22.5k [00:00<?, …"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "714d4a40a67a4763ba8f7ae029befbd7",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "events.out.tfevents.1739860009.r-jofthomas-fttest-8up4ewpe-95503-rafjw:   0%|          | 0.00/22.5k [00:00<?, …"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "52451f392b434fb39b882c9f094bd995",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "events.out.tfevents.1739862234.r-jofthomas-fttest-8up4ewpe-95503-rafjw:   0%|          | 0.00/22.5k [00:00<?, …"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "08c2ca8a719e4142bd877ae94d242f2e",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "tokenizer.json:   0%|          | 0.00/34.4M [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "02e660bd44a7414fae439751e9ffa1f2",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "training_args.bin:   0%|          | 0.00/5.69k [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "CommitInfo(commit_url='https://huggingface.co/Jofthomas/gemma-2-2B-it-thinking-function_calling/commit/1fd13c76657670ca45620b6893e4fbfda0207a91', commit_message='End of training', commit_description='', oid='1fd13c76657670ca45620b6893e4fbfda0207a91', pr_url=None, repo_url=RepoUrl('https://huggingface.co/Jofthomas/gemma-2-2B-it-thinking-function_calling', endpoint='https://huggingface.co', repo_type='model', repo_id='Jofthomas/gemma-2-2B-it-thinking-function_calling'), pr_revision=None, pr_num=None)"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "trainer.push_to_hub()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "83a443ce-5072-4777-8621-cd4faf840410",
   "metadata": {
    "id": "83a443ce-5072-4777-8621-cd4faf840410"
   },
   "source": [
    "Since we also modified the **chat_template** Which is contained in the tokenizer, let's also push the tokenizer with the model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9d9a86b3-f23d-4060-a97f-b868a7c38c36",
   "metadata": {
    "id": "9d9a86b3-f23d-4060-a97f-b868a7c38c36",
    "outputId": "2726291c-5720-473e-ed92-e4f425f82bae",
    "tags": []
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "No files have been modified since last commit. Skipping to prevent empty commit.\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "CommitInfo(commit_url='https://huggingface.co/Jofthomas/gemma-2-2B-it-thinking-function_calling/commit/50ea3ee78ed458c6d773f53b326531becdda0211', commit_message='Upload tokenizer', commit_description='', oid='50ea3ee78ed458c6d773f53b326531becdda0211', pr_url=None, repo_url=RepoUrl('https://huggingface.co/Jofthomas/gemma-2-2B-it-thinking-function_calling', endpoint='https://huggingface.co', repo_type='model', repo_id='Jofthomas/gemma-2-2B-it-thinking-function_calling'), pr_revision=None, pr_num=None)"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tokenizer.eos_token = \"<eos>\"\n",
    "# push the tokenizer to hub ( replace with your username and your previously specified\n",
    "tokenizer.push_to_hub(f\"username/{output_dir}\", token=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "76d275ce-a3e6-4d30-8d8c-0ee274de5370",
   "metadata": {
    "id": "76d275ce-a3e6-4d30-8d8c-0ee274de5370"
   },
   "source": [
    "## Step 12: Let's now test our model !\n",
    "\n",
    "To so, we will :\n",
    "\n",
    "1. Load the adapter from the hub !\n",
    "2. Load the base model : **\"google/gemma-2-2b-it\"** from the hub\n",
    "3."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "56b89825-70ac-42c1-934c-26e2d54f3b7b",
   "metadata": {
    "colab": {
     "referenced_widgets": [
      "390c54434b6448b988ce015eeafe34c9",
      "35b2fe2d357b46488ccef710f2a9bfd7",
      "9c313149d4324bdaa9c8ddc373964d18"
     ]
    },
    "id": "56b89825-70ac-42c1-934c-26e2d54f3b7b",
    "outputId": "a4cd00b8-61fa-4522-d563-c4ef7e18807d",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "390c54434b6448b988ce015eeafe34c9",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "adapter_config.json:   0%|          | 0.00/829 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "35b2fe2d357b46488ccef710f2a9bfd7",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "9c313149d4324bdaa9c8ddc373964d18",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "adapter_model.safetensors:   0%|          | 0.00/2.48G [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "PeftModelForCausalLM(\n",
       "  (base_model): LoraModel(\n",
       "    (model): Gemma2ForCausalLM(\n",
       "      (model): Gemma2Model(\n",
       "        (embed_tokens): lora.Embedding(\n",
       "          (base_layer): Embedding(256006, 2304, padding_idx=0)\n",
       "          (lora_dropout): ModuleDict(\n",
       "            (default): Dropout(p=0.05, inplace=False)\n",
       "          )\n",
       "          (lora_A): ModuleDict()\n",
       "          (lora_B): ModuleDict()\n",
       "          (lora_embedding_A): ParameterDict(  (default): Parameter containing: [torch.cuda.BFloat16Tensor of size 16x256006 (cuda:0)])\n",
       "          (lora_embedding_B): ParameterDict(  (default): Parameter containing: [torch.cuda.BFloat16Tensor of size 2304x16 (cuda:0)])\n",
       "          (lora_magnitude_vector): ModuleDict()\n",
       "        )\n",
       "        (layers): ModuleList(\n",
       "          (0-25): 26 x Gemma2DecoderLayer(\n",
       "            (self_attn): Gemma2Attention(\n",
       "              (q_proj): lora.Linear(\n",
       "                (base_layer): Linear(in_features=2304, out_features=2048, bias=False)\n",
       "                (lora_dropout): ModuleDict(\n",
       "                  (default): Dropout(p=0.05, inplace=False)\n",
       "                )\n",
       "                (lora_A): ModuleDict(\n",
       "                  (default): Linear(in_features=2304, out_features=16, bias=False)\n",
       "                )\n",
       "                (lora_B): ModuleDict(\n",
       "                  (default): Linear(in_features=16, out_features=2048, bias=False)\n",
       "                )\n",
       "                (lora_embedding_A): ParameterDict()\n",
       "                (lora_embedding_B): ParameterDict()\n",
       "                (lora_magnitude_vector): ModuleDict()\n",
       "              )\n",
       "              (k_proj): lora.Linear(\n",
       "                (base_layer): Linear(in_features=2304, out_features=1024, bias=False)\n",
       "                (lora_dropout): ModuleDict(\n",
       "                  (default): Dropout(p=0.05, inplace=False)\n",
       "                )\n",
       "                (lora_A): ModuleDict(\n",
       "                  (default): Linear(in_features=2304, out_features=16, bias=False)\n",
       "                )\n",
       "                (lora_B): ModuleDict(\n",
       "                  (default): Linear(in_features=16, out_features=1024, bias=False)\n",
       "                )\n",
       "                (lora_embedding_A): ParameterDict()\n",
       "                (lora_embedding_B): ParameterDict()\n",
       "                (lora_magnitude_vector): ModuleDict()\n",
       "              )\n",
       "              (v_proj): lora.Linear(\n",
       "                (base_layer): Linear(in_features=2304, out_features=1024, bias=False)\n",
       "                (lora_dropout): ModuleDict(\n",
       "                  (default): Dropout(p=0.05, inplace=False)\n",
       "                )\n",
       "                (lora_A): ModuleDict(\n",
       "                  (default): Linear(in_features=2304, out_features=16, bias=False)\n",
       "                )\n",
       "                (lora_B): ModuleDict(\n",
       "                  (default): Linear(in_features=16, out_features=1024, bias=False)\n",
       "                )\n",
       "                (lora_embedding_A): ParameterDict()\n",
       "                (lora_embedding_B): ParameterDict()\n",
       "                (lora_magnitude_vector): ModuleDict()\n",
       "              )\n",
       "              (o_proj): lora.Linear(\n",
       "                (base_layer): Linear(in_features=2048, out_features=2304, bias=False)\n",
       "                (lora_dropout): ModuleDict(\n",
       "                  (default): Dropout(p=0.05, inplace=False)\n",
       "                )\n",
       "                (lora_A): ModuleDict(\n",
       "                  (default): Linear(in_features=2048, out_features=16, bias=False)\n",
       "                )\n",
       "                (lora_B): ModuleDict(\n",
       "                  (default): Linear(in_features=16, out_features=2304, bias=False)\n",
       "                )\n",
       "                (lora_embedding_A): ParameterDict()\n",
       "                (lora_embedding_B): ParameterDict()\n",
       "                (lora_magnitude_vector): ModuleDict()\n",
       "              )\n",
       "              (rotary_emb): Gemma2RotaryEmbedding()\n",
       "            )\n",
       "            (mlp): Gemma2MLP(\n",
       "              (gate_proj): lora.Linear(\n",
       "                (base_layer): Linear(in_features=2304, out_features=9216, bias=False)\n",
       "                (lora_dropout): ModuleDict(\n",
       "                  (default): Dropout(p=0.05, inplace=False)\n",
       "                )\n",
       "                (lora_A): ModuleDict(\n",
       "                  (default): Linear(in_features=2304, out_features=16, bias=False)\n",
       "                )\n",
       "                (lora_B): ModuleDict(\n",
       "                  (default): Linear(in_features=16, out_features=9216, bias=False)\n",
       "                )\n",
       "                (lora_embedding_A): ParameterDict()\n",
       "                (lora_embedding_B): ParameterDict()\n",
       "                (lora_magnitude_vector): ModuleDict()\n",
       "              )\n",
       "              (up_proj): lora.Linear(\n",
       "                (base_layer): Linear(in_features=2304, out_features=9216, bias=False)\n",
       "                (lora_dropout): ModuleDict(\n",
       "                  (default): Dropout(p=0.05, inplace=False)\n",
       "                )\n",
       "                (lora_A): ModuleDict(\n",
       "                  (default): Linear(in_features=2304, out_features=16, bias=False)\n",
       "                )\n",
       "                (lora_B): ModuleDict(\n",
       "                  (default): Linear(in_features=16, out_features=9216, bias=False)\n",
       "                )\n",
       "                (lora_embedding_A): ParameterDict()\n",
       "                (lora_embedding_B): ParameterDict()\n",
       "                (lora_magnitude_vector): ModuleDict()\n",
       "              )\n",
       "              (down_proj): lora.Linear(\n",
       "                (base_layer): Linear(in_features=9216, out_features=2304, bias=False)\n",
       "                (lora_dropout): ModuleDict(\n",
       "                  (default): Dropout(p=0.05, inplace=False)\n",
       "                )\n",
       "                (lora_A): ModuleDict(\n",
       "                  (default): Linear(in_features=9216, out_features=16, bias=False)\n",
       "                )\n",
       "                (lora_B): ModuleDict(\n",
       "                  (default): Linear(in_features=16, out_features=2304, bias=False)\n",
       "                )\n",
       "                (lora_embedding_A): ParameterDict()\n",
       "                (lora_embedding_B): ParameterDict()\n",
       "                (lora_magnitude_vector): ModuleDict()\n",
       "              )\n",
       "              (act_fn): PytorchGELUTanh()\n",
       "            )\n",
       "            (input_layernorm): Gemma2RMSNorm((2304,), eps=1e-06)\n",
       "            (pre_feedforward_layernorm): Gemma2RMSNorm((2304,), eps=1e-06)\n",
       "            (post_feedforward_layernorm): Gemma2RMSNorm((2304,), eps=1e-06)\n",
       "            (post_attention_layernorm): Gemma2RMSNorm((2304,), eps=1e-06)\n",
       "          )\n",
       "        )\n",
       "        (norm): Gemma2RMSNorm((2304,), eps=1e-06)\n",
       "      )\n",
       "      (lm_head): lora.Linear(\n",
       "        (base_layer): Linear(in_features=2304, out_features=256006, bias=False)\n",
       "        (lora_dropout): ModuleDict(\n",
       "          (default): Dropout(p=0.05, inplace=False)\n",
       "        )\n",
       "        (lora_A): ModuleDict(\n",
       "          (default): Linear(in_features=2304, out_features=16, bias=False)\n",
       "        )\n",
       "        (lora_B): ModuleDict(\n",
       "          (default): Linear(in_features=16, out_features=256006, bias=False)\n",
       "        )\n",
       "        (lora_embedding_A): ParameterDict()\n",
       "        (lora_embedding_B): ParameterDict()\n",
       "        (lora_magnitude_vector): ModuleDict()\n",
       "      )\n",
       "    )\n",
       "  )\n",
       ")"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from peft import PeftModel, PeftConfig\n",
    "from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig\n",
    "from datasets import load_dataset\n",
    "import torch\n",
    "\n",
    "bnb_config = BitsAndBytesConfig(\n",
    "            load_in_4bit=True,\n",
    "            bnb_4bit_quant_type=\"nf4\",\n",
    "            bnb_4bit_compute_dtype=torch.bfloat16,\n",
    "            bnb_4bit_use_double_quant=True,\n",
    "        )\n",
    "\n",
    "peft_model_id = f\"username/{output_dir}\" # replace with your newly trained adapter\n",
    "device = \"auto\"\n",
    "config = PeftConfig.from_pretrained(peft_model_id)\n",
    "model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path,\n",
    "                                             device_map=\"auto\",\n",
    "                                             )\n",
    "tokenizer = AutoTokenizer.from_pretrained(peft_model_id)\n",
    "model.resize_token_embeddings(len(tokenizer))\n",
    "model = PeftModel.from_pretrained(model, peft_model_id)\n",
    "model.to(torch.bfloat16)\n",
    "model.eval()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "69e83af9-f967-4e5a-842b-0daed13f7957",
   "metadata": {
    "id": "69e83af9-f967-4e5a-842b-0daed13f7957",
    "outputId": "979b2ee9-fe5b-49b1-aed5-e28f0239a709",
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<bos><start_of_turn>human\n",
      "You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags.You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions.Here are the available tools:<tools> [{'type': 'function', 'function': {'name': 'convert_currency', 'description': 'Convert from one currency to another', 'parameters': {'type': 'object', 'properties': {'amount': {'type': 'number', 'description': 'The amount to convert'}, 'from_currency': {'type': 'string', 'description': 'The currency to convert from'}, 'to_currency': {'type': 'string', 'description': 'The currency to convert to'}}, 'required': ['amount', 'from_currency', 'to_currency']}}}, {'type': 'function', 'function': {'name': 'calculate_distance', 'description': 'Calculate the distance between two locations', 'parameters': {'type': 'object', 'properties': {'start_location': {'type': 'string', 'description': 'The starting location'}, 'end_location': {'type': 'string', 'description': 'The ending location'}}, 'required': ['start_location', 'end_location']}}}] </tools>Use the following pydantic model json schema for each tool call you will make: {'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']}For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:\n",
      "<tool_call>\n",
      "{tool_call}\n",
      "</tool_call>Also, before making a call to a function take the time to plan the function to take. Make that thinking process between <think>{your thoughts}</think>\n",
      "\n",
      "Hi, I need to convert 500 USD to Euros. Can you help me with that?<end_of_turn><eos>\n",
      "<start_of_turn>model\n",
      "<think>Alright, so I'm looking at this scenario where the user needs help converting 500 USD to Euros. Let's break it down step by step to understand how to approach this.\n",
      "\n",
      "First, the user has asked for conversion from USD to EUR. I know there's a specific tool available for currency conversion called `convert_currency`. This tool requires three parameters: amount, from_currency, and to_currency. In the user's message, they've provided all necessary details—500 USD and the desire to convert to Euros. \n",
      "\n",
      "I don't need to make any assumptions here because the user has already provided the exact amount and the target currency. My next step is to prepare the function call with the correct parameters. So, I'll structure the request with the amount as 500, from_currency as USD, and to_currency as EUR. \n",
      "\n",
      "This will be sent in JSON format as specified, ensuring that the tool receives all the necessary information to perform the conversion accurately. Once the function is called, it should return the converted amount, which I can then present back to the user.\n",
      "</think><tool_call>\n",
      "{'name': 'convert_currency', 'arguments': {'amount': 500, 'from_currency': 'USD', 'to_currency': 'EUR'}}\n",
      "</tool_call><end_of_turn><eos>\n",
      "<start_of_turn>tool\n",
      "<tool_response>\n",
      "{'converted_amount': 420.5, 'from_currency': 'USD', 'to_currency': 'EUR'}\n",
      "</tool_response><end_of_turn><eos>\n",
      "<start_of_turn>model\n",
      "Sure, 500 USD is approximately 420.50 Euros.<end_of_turn><eos>\n",
      "<start_of_turn>human\n",
      "That's great! Now, can you convert 1000 Euros back to USD?<end_of_turn><eos>\n",
      "<start_of_turn>model\n",
      "<think>Okay, so I'm trying to figure out how to handle the next function call based on the conversation and the previous steps. Let me see...\n",
      "\n",
      "In the conversation, the user first asked to convert 500 USD to Euros, and the model used the `convert_currency` function with the parameters amount=500, from_currency=USD, to_currency=EUR. Then, after getting the response, the model provided the converted amount. Now, the user is asking to convert 1000 Euros back to USD. \n",
      "\n",
      "So, the user's request is clear: convert 1000 EUR to USD. Since the function `convert_currency` is designed to handle currency conversions, and it requires amount, from_currency, and to_currency as parameters, it makes sense to call this function again with the updated values. \n",
      "\n",
      "The amount is 1000, from_currency should be EUR because the user is converting from Euros, and to_currency should be USD because they want the result in US Dollars. I don't need to make any assumptions here because all the required information is provided by the user. \n",
      "\n",
      "I'll structure the function call with these parameters. This should give the user the accurate converted amount from Euros to USD. Once the tool processes this, the model can present the result back to the user.\n",
      "</think><tool_call>\n",
      "{'name': 'convert_currency', 'arguments': {'amount': 1000, 'from_currency': 'EUR', 'to_currency': 'USD'}}\n",
      "</tool_call><end_of_turn><eos>\n",
      "<start_of_turn>tool\n",
      "<tool_response>\n",
      "{'converted_amount': 1190.0, 'from_currency': 'EUR', 'to_currency': 'USD'}\n",
      "</tool_response><end_of_turn><eos>\n",
      "<start_of_turn>model\n",
      "1000 Euros is approximately 1190.00 USD.<end_of_turn><eos>\n",
      "<start_of_turn>human\n",
      "Thank you for your help!<end_of_turn><eos>\n",
      "<start_of_turn>model\n",
      "You're welcome! If you have any other questions, feel free to ask.<end_of_turn><eos>\n",
      "\n"
     ]
    }
   ],
   "source": [
    "print(dataset[\"test\"][8][\"content\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b47fd511-ea00-47ce-8618-6e78e25672b2",
   "metadata": {
    "id": "b47fd511-ea00-47ce-8618-6e78e25672b2"
   },
   "source": [
    "### Testing the model 🚀\n",
    "\n",
    "In that case, we will take the start of one of the samples from the test set and hope that it will generate the expected output.\n",
    "\n",
    "Since we want to test the function-calling capacities of our newly fine-tuned model, the input will be a user message with the available tools, a\n",
    "\n",
    "\n",
    "### Disclaimer ⚠️\n",
    "\n",
    "The dataset we’re using **does not contain sufficient training data** and is purely for **educational purposes**. As a result, **your trained model’s outputs may differ** from the examples shown in this course. **Don’t be discouraged** if your results vary—our primary goal here is to illustrate the core concepts rather than produce a fully optimized or production-ready model.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "37bf938d-08fa-4577-9966-0238339afcdb",
   "metadata": {
    "id": "37bf938d-08fa-4577-9966-0238339afcdb",
    "outputId": "e97e7a1e-5ab2-46a2-dc3a-f436964fe004",
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<bos><start_of_turn>human\n",
      "You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags.You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions.Here are the available tools:<tools> [{'type': 'function', 'function': {'name': 'convert_currency', 'description': 'Convert from one currency to another', 'parameters': {'type': 'object', 'properties': {'amount': {'type': 'number', 'description': 'The amount to convert'}, 'from_currency': {'type': 'string', 'description': 'The currency to convert from'}, 'to_currency': {'type': 'string', 'description': 'The currency to convert to'}}, 'required': ['amount', 'from_currency', 'to_currency']}}}, {'type': 'function', 'function': {'name': 'calculate_distance', 'description': 'Calculate the distance between two locations', 'parameters': {'type': 'object', 'properties': {'start_location': {'type': 'string', 'description': 'The starting location'}, 'end_location': {'type': 'string', 'description': 'The ending location'}}, 'required': ['start_location', 'end_location']}}}] </tools>Use the following pydantic model json schema for each tool call you will make: {'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']}For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:\n",
      "<tool_call>\n",
      "{tool_call}\n",
      "</tool_call>Also, before making a call to a function take the time to plan the function to take. Make that thinking process between <think>{your thoughts}</think>\n",
      "\n",
      "Hi, I need to convert 500 USD to Euros. Can you help me with that?<end_of_turn><eos>\n",
      "<start_of_turn>model\n",
      "<think>Okay, so the user is asking to convert 500 USD to Euros. I need to figure out how to respond. Looking at the available tools, there's a function called convert_currency which does exactly that. It takes an amount, the source currency, and the target currency. The user provided all the necessary details: 500, USD, and EUR. So, I should call convert_currency with these parameters. That should give the user the converted amount they need.\n",
      "</think><tool_call>\n",
      "{'name': 'convert_currency', 'arguments': {'amount': 500, 'from_currency': 'USD', 'to_currency': 'EUR'}}\n",
      "</tool_call><end_of_turn><eos>\n"
     ]
    }
   ],
   "source": [
    "#this prompt is a sub-sample of one of the test set examples. In this example we start the generation after the model generation starts.\n",
    "prompt=\"\"\"<bos><start_of_turn>human\n",
    "You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags.You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions.Here are the available tools:<tools> [{'type': 'function', 'function': {'name': 'convert_currency', 'description': 'Convert from one currency to another', 'parameters': {'type': 'object', 'properties': {'amount': {'type': 'number', 'description': 'The amount to convert'}, 'from_currency': {'type': 'string', 'description': 'The currency to convert from'}, 'to_currency': {'type': 'string', 'description': 'The currency to convert to'}}, 'required': ['amount', 'from_currency', 'to_currency']}}}, {'type': 'function', 'function': {'name': 'calculate_distance', 'description': 'Calculate the distance between two locations', 'parameters': {'type': 'object', 'properties': {'start_location': {'type': 'string', 'description': 'The starting location'}, 'end_location': {'type': 'string', 'description': 'The ending location'}}, 'required': ['start_location', 'end_location']}}}] </tools>Use the following pydantic model json schema for each tool call you will make: {'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']}For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:\n",
    "<tool_call>\n",
    "{tool_call}\n",
    "</tool_call>Also, before making a call to a function take the time to plan the function to take. Make that thinking process between <think>{your thoughts}</think>\n",
    "\n",
    "Hi, I need to convert 500 USD to Euros. Can you help me with that?<end_of_turn><eos>\n",
    "<start_of_turn>model\n",
    "<think>\"\"\"\n",
    "\n",
    "inputs = tokenizer(prompt, return_tensors=\"pt\", add_special_tokens=False)\n",
    "inputs = {k: v.to(\"cuda\") for k,v in inputs.items()}\n",
    "outputs = model.generate(**inputs,\n",
    "                         max_new_tokens=300,\n",
    "                         do_sample=True,\n",
    "                         top_p=0.65,\n",
    "                         temperature=0.01,\n",
    "                         repetition_penalty=1.0,\n",
    "                         eos_token_id=tokenizer.eos_token_id)\n",
    "print(tokenizer.decode(outputs[0]))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "xWewPCZOyfJQ",
   "metadata": {
    "id": "xWewPCZOyfJQ"
   },
   "source": [
    "## Congratulations\n",
    "Congratulations on finishing this first Bonus Unit 🥳\n",
    "\n",
    "You've just **mastered what Function-Calling is and how to fine-tune your model to do Function-Calling**!\n",
    "\n",
    "If it's the first time you do this, it's normal that you're feeling puzzled. Take time to check the documentation and understand each part of the code and why we did it this way.\n",
    "\n",
    "Also, don't hesitate to try to **fine-tune different models**. The **best way to learn is by trying.**\n",
    "\n",
    "### Keep Learning, Stay Awesome 🤗"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}