SURESHBEEKHANI/Llama_3_2_3B_SFT_GGUF

@@ -1,22 +1,166 @@
 ---
-base_model: unsloth/llama-3.2-3b-instruct-bnb-4bit
-tags:
-- text-generation-inference
-- transformers
-- unsloth
-- llama
-- gguf
-license: apache-2.0
-language:
-- en
 ---
-# Uploaded  model
-- **Developed by:** SURESHBEEKHANI
-- **License:** apache-2.0
-- **Finetuned from model :** unsloth/llama-3.2-3b-instruct-bnb-4bit
-This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

+# **Fine-Tuning Meta-Llama-3.2-3B with Unsloth for CPU and GPU Inference - GGML**
+## **Overview**
+On **September 25, 2024**, Meta released the **Llama 3.2** series, which includes lightweight multilingual text models with 1B and 3B parameters. These models target multilingual dialogue, summarization, and agentic retrieval, and support a **128K token context length**.
+This repository demonstrates fine-tuning the **Meta-Llama-3.2-3B** instruct model with **Unsloth** for efficient training and inference. It also includes a step to export the model to the **GGML** family of formats used by llama.cpp (today, GGUF), enabling memory-efficient deployment on CPUs and GPUs.
+---
+## **Table of Contents**
+1. [Key Features](#key-features)
+2. [Setup and Installation](#setup-and-installation)
+3. [Fine-Tuning Workflow](#fine-tuning-workflow)
+4. [Data Preparation](#data-preparation)
+5. [Training the Model](#training-the-model)
+6. [Model Conversion to GGML](#model-conversion-to-ggml)
+---
+## **Key Features**
+- **Low-Rank Adaptation (LoRA):** Trains small low-rank adapter matrices instead of the full weights, reducing training cost.
+- **Memory Optimization:** Supports **4-bit quantization** for memory-constrained environments.
+- **Memory-Efficient Training:** Uses Unsloth's gradient checkpointing to cut activation memory during training, which leaves room for longer sequences or larger batches.
+- **Extended Context Length:** The base model supports input sequences up to **128K tokens**; the example in this README sets `max_seq_length = 2048`.
+- **Versatile Applications:** Ideal for dialogue systems, summarization, and knowledge retrieval tasks.
+---
+## **Setup and Installation**
+### **Step 1: Install Dependencies**
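+Install the necessary packages, including the latest version of **Unsloth**. The cell below is written for a Jupyter or Colab notebook: `%%capture` and the leading `!` are notebook syntax, so drop them when running in a plain shell.
+```bash
+%%capture
+!pip install unsloth
+# Reinstall Unsloth from the GitHub main branch to pick up the latest changes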
+!pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git
+```
+---
+### **Step 2: Load the Model and Tokenizer**
+The following code initializes the Llama-3.2 model and tokenizer:
+```python
+from unsloth import FastLanguageModel
+import torch
+# Configuration settings
+max_seq_length = 2048  # Maximum sequence length
+dtype = None  # Automatically detects dtype; Float16 for T4, Bfloat16 for Ampere+
+load_in_4bit = True  # Use 4-bit quantization for memory efficiency
+# Load the model and tokenizer
+model, tokenizer = FastLanguageModel.from_pretrained(
+    model_name="unsloth/Llama-3.2-3B-Instruct",
+    max_seq_length=max_seq_length,
+    dtype=dtype,
+    load_in_4bit=load_in_4bit,
+)
+```
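To give a feel for what `load_in_4bit=True` saves, here is a back-of-the-envelope estimate of weight memory at different precisions. The parameter count (about 3.21 billion) is an assumption taken from Meta's published model card, not from this README. The estimate ignores quantization metadata, activations, the KV cache, and any layers kept in higher precision, so real usage will be higher.

```python
# Rough weight-memory estimate; N_PARAMS is an assumed figure, not read from the model.
N_PARAMS = 3.21e9

def weight_gib(n_params: float, bits_per_param: int) -> float:
    """Memory needed to store n_params weights at the given bit width, in GiB."""
    return n_params * bits_per_param / 8 / 2**30

for name, bits in [("float32", 32), ("float16/bfloat16", 16), ("4-bit", 4)]:
    print(f"{name:>17}: {weight_gib(N_PARAMS, bits):.2f} GiB")
```

Under these assumptions, 4-bit weights take a quarter of the space of 16-bit weights, which is what makes a 3B model practical on small GPUs.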
+---
+## **Fine-Tuning Workflow**
+### **LoRA Fine-Tuning with Unsloth**
+Attach LoRA adapters so that only a small set of added low-rank weights is trained while the base model weights stay frozen:
+```python
+model = FastLanguageModel.get_peft_model(
+    model,
+    r=16,  # Rank for LoRA; options: 8, 16, 32, etc.
+    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
+    lora_alpha=16,
+    lora_dropout=0,
+    bias="none",
+    use_gradient_checkpointing="unsloth",  # Enable optimized checkpointing
+    random_state=3407,
+)
+```
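As a rough check on how small the trainable part is, the sketch below counts the LoRA weights added by the configuration above. The layer dimensions are assumptions taken from the published Llama 3.2 3B configuration (hidden size 3072, MLP width 8192, 28 layers, 8 key/value heads of dimension 128); they are not stated in this README.

```python
# Assumed Llama 3.2 3B dimensions (from the published config, not from this README).
HIDDEN = 3072        # hidden_size
INTERMEDIATE = 8192  # intermediate_size (MLP width)
KV_DIM = 1024        # num_key_value_heads (8) * head_dim (128)
N_LAYERS = 28        # num_hidden_layers
R = 16               # LoRA rank, as in the call above

# (in_features, out_features) of each targeted linear layer
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_params(in_features: int, out_features: int, r: int) -> int:
    """LoRA adds A (r x in) and B (out x r), so r * (in + out) weights per layer."""
    return r * (in_features + out_features)

per_block = sum(lora_params(i, o, R) for i, o in SHAPES.values())
total = per_block * N_LAYERS
print(f"per transformer block: {per_block:,}")
print(f"total trainable:       {total:,}")
```

With these dimensions the adapters come to roughly 24 million weights, well under 1% of the base model's parameters.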
 ---
+## **Data Preparation**
+Prepare your dataset in **ShareGPT-style** conversation format using the `unsloth.chat_templates` module:
+```python
+from datasets import load_dataset
+from unsloth.chat_templates import get_chat_template, standardize_sharegpt
+
+# Attach the Llama 3.1 chat template to the tokenizer
+tokenizer = get_chat_template(tokenizer, chat_template="llama-3.1")
+
+def formatting_prompts_func(examples):
+    """Render each conversation in a batch to a single training string."""
+    convos = examples["conversations"]
+    texts = [
+        tokenizer.apply_chat_template(convo, tokenize=False, add_generation_prompt=False)
+        for convo in convos
+    ]
+    return {"text": texts}
+
+# Load the dataset and keep a small subset for quick testing
+dataset = load_dataset("mlabonne/FineTome-100k", split="train")
+dataset = dataset.select(range(500))
+# Normalize ShareGPT-style turns to role/content messages, then render them
+dataset = standardize_sharegpt(dataset)
+dataset = dataset.map(formatting_prompts_func, batched=True)
+```
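To make the two data steps concrete, here is a simplified, dependency-free sketch of what they do to one conversation. Both helper functions are hypothetical stand-ins written for illustration, not Unsloth's implementation: the real `standardize_sharegpt` handles more cases, and the real chat template may add a default system block that is omitted here.

```python
# Hypothetical, simplified stand-ins for illustration; not Unsloth's implementation.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}

def to_role_content(convo):
    """Map ShareGPT turns ({'from', 'value'}) to {'role', 'content'} dicts."""
    return [{"role": ROLE_MAP[t["from"]], "content": t["value"]} for t in convo]

def render_llama3(messages):
    """Render messages with Llama 3 style header and end-of-turn markers."""
    text = "<|begin_of_text|>"
    for m in messages:
        text += f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}<|eot_id|>"
    return text

sharegpt = [
    {"from": "human", "value": "What is 2 + 2?"},
    {"from": "gpt", "value": "4"},
]
messages = to_role_content(sharegpt)
print(render_llama3(messages))
```

The header markers in the rendered string are the same ones that response-only training later uses to find where each assistant turn begins.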
 ---
+## **Training the Model**
+### **SFT Training with TRL**
+Fine-tune the model using Hugging Face's TRL library:
+```python
+from trl import SFTTrainer
+from transformers import TrainingArguments, DataCollatorForSeq2Seq
+from unsloth import is_bfloat16_supported
+trainer = SFTTrainer(
+    model=model,
+    tokenizer=tokenizer,
+    train_dataset=dataset,
+    dataset_text_field="text",
+    max_seq_length=max_seq_length,
+    data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer),
+    dataset_num_proc=2,
+    packing=False,
+    args=TrainingArguments(
+        per_device_train_batch_size=2,
+        gradient_accumulation_steps=4,
+        warmup_steps=5,
+        max_steps=60,
+        learning_rate=2e-4,
+        fp16=not is_bfloat16_supported(),
+        bf16=is_bfloat16_supported(),
+        logging_steps=1,
+        optim="adamw_8bit",
+        weight_decay=0.01,
+        lr_scheduler_type="linear",
+        seed=3407,
+        output_dir="outputs",
+        report_to="none",
+    ),
+)
+# Mask the prompt tokens so the loss is computed on assistant responses only
+from unsloth.chat_templates import train_on_responses_only
+trainer = train_on_responses_only(
+    trainer,
+    instruction_part="<|start_header_id|>user<|end_header_id|>\n\n",
+    response_part="<|start_header_id|>assistant<|end_header_id|>\n\n",
+)
+# Run the training loop
+trainer_stats = trainer.train()
+```
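The hyperparameters above determine how much data the run actually sees and how the learning rate evolves. The sketch below works this out with plain arithmetic. The single-GPU setting is an assumption, and `linear_lr` is an illustrative reimplementation of the shape of a linear schedule with warmup, not the trainer's own code.

```python
# Values copied from the TrainingArguments above; dataset size from the 500-row subset.
per_device_batch = 2
grad_accum = 4
max_steps = 60
warmup_steps = 5
peak_lr = 2e-4
dataset_rows = 500
n_devices = 1  # assumption: a single GPU

effective_batch = per_device_batch * grad_accum * n_devices
sequences_seen = effective_batch * max_steps
print(f"effective batch size: {effective_batch}")
print(f"sequences seen: {sequences_seen} ({sequences_seen / dataset_rows:.2f} epochs)")

def linear_lr(step: int) -> float:
    """Linear warmup to peak_lr, then linear decay to zero at max_steps."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (max_steps - step) / (max_steps - warmup_steps))

for step in (0, 5, 30, 60):
    print(f"step {step:>2}: lr = {linear_lr(step):.2e}")
```

With these settings the run covers slightly less than one pass over the 500-row subset, which suits a quick smoke test more than a full fine-tune.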
+---
+## **Model Conversion to GGML**
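+Export the fine-tuned model to GGUF, the successor to the legacy GGML file format in llama.cpp, for memory-efficient inference. Unsloth exposes this as a method on the model:
+```python
+# Write a 4-bit (q4_k_m) GGUF export into the given directory
+model.save_pretrained_gguf("llama3.2-3b-gguf", tokenizer, quantization_method="q4_k_m")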
+```
+---
+## **License**
+This project is distributed under the Apache License 2.0. See [LICENSE](LICENSE) for more details.