---
library_name: transformers
datasets:
- teknium/openhermes
pipeline_tag: text-generation
license: apache-2.0
base_model: Na0s/Llama-3.1-8B-Pruned-4-Layers_LoRA-PEFT-2.0
---

# Model Card for Na0s/Llama-3.1-8B-Pruned-4-Layers_LoRA-PEFT-3.0:

## Model Details:

### Model Description:

- **Finetuned from model: Na0s/Llama-3.1-8B-Pruned-4-Layers_LoRA-PEFT-2.0 on teknium/openhermes.**
- We pruned the 4 layers of meta-llama/Meta-Llama-3.1-8B that had the least impact on model performance, following the paper [The Unreasonable Ineffectiveness of the Deeper Layers](https://arxiv.org/pdf/2403.17887) (a hedged sketch of this step is given in the appendix at the end of this card).
- The pruned model therefore has 1.09B fewer parameters than the foundation model, which means a smaller memory footprint, faster training, and lower latency at inference time.
- We then recovered the performance lost to pruning by fine-tuning (MMLU-Pro 0-shot went from 0.2642 to 0.3120); this step is called healing the pruned model.

### Upcoming Work:

- More healing through SFT/DPO/TPO to see if we can get closer to the performance of meta-llama/Meta-Llama-3.1-8B (MMLU-Pro 0-shot of 0.3659, vs 0.3120 for our model). **(In Progress)**
- Apply the exact same process to meta-llama/Llama-3.1-70B and compare.

### Training Details:

Fine-tuning was done with Unsloth and TRL's `SFTTrainer`. The snippet below reproduces the configuration; the model/dataset loading lines are added for completeness, and the `max_seq_length` value is an assumption (the card does not state the one used).

```python
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import FastLanguageModel, is_bfloat16_supported

max_seq_length = 2048  # placeholder; not stated in the original card

# Load the previously healed checkpoint as the starting point.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Na0s/Llama-3.1-8B-Pruned-4-Layers_LoRA-PEFT-2.0",
    max_seq_length = max_seq_length,
)

# Attach LoRA adapters to the attention and MLP projections.
model = FastLanguageModel.get_peft_model(
    model,
    r = 4,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 4,
    lora_dropout = 0.05,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
)

# The preprocessing is not shown in the card; the trainer below expects
# the dataset to expose a "completion" text column.
dataset = load_dataset("teknium/openhermes", split = "train")

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "completion",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False,
    args = TrainingArguments(
        per_device_train_batch_size = 10,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 5000,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "cosine",
        seed = 3407,
        output_dir = "outputs_4",
        push_to_hub = True,
        hub_always_push = True,
    ),
)
trainer.train()
```

### Training Data:

[teknium/openhermes](https://huggingface.co/datasets/teknium/openhermes)

### Memory and Latency Gains (Using [**Optimum-Benchmark**](https://github.com/huggingface/optimum-benchmark)):

**Load Mode Memory Metrics**

| **Model** | **Max Global VRAM (MB)** | **Max Process VRAM (MB)** | **Max Reserved VRAM (MB)** | **Max Allocated VRAM (MB)** |
|:---:|:---:|:---:|:---:|:---:|
| Llama-3.1-8B | 18521.98 | 16630.42 | 16196.30 | 16060.54 |
| Na0s/Llama-3.1-8B-Pruned-4-Layers_LoRA-PEFT-3.0 | 16319.97 | 14428.41 | 13994.30 | 13879.42 |

**Inference Mode Latency Metrics**

| **Model** | **Latency Mean (s)** | **Throughput (tokens/s)** |
|:---:|:---:|:---:|
| Llama-3.1-8B | 0.8104 | 38.2536 |
| Na0s/Llama-3.1-8B-Pruned-4-Layers_LoRA-PEFT-3.0 | 0.5530 | 56.0570 |
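A comparable run can be scripted in a few lines. The sketch below is a minimal example assuming Optimum-Benchmark's Python API at the time of writing (`PyTorchConfig`, `ProcessConfig`, `InferenceConfig`, `Benchmark.launch`); class names and fields vary between versions, and the exact configuration behind the tables above is not recorded here, so treat this as a starting point rather than the script we used.

```python
from optimum_benchmark import (
    Benchmark,
    BenchmarkConfig,
    InferenceConfig,
    ProcessConfig,
    PyTorchConfig,
)

if __name__ == "__main__":
    # Backend: load the model with plain PyTorch on one GPU.
    backend = PyTorchConfig(
        model="Na0s/Llama-3.1-8B-Pruned-4-Layers_LoRA-PEFT-3.0",
        device="cuda",
        device_ids="0",
    )
    # Launcher: isolate the run in a separate process for clean memory readings.
    launcher = ProcessConfig()
    # Scenario: measure latency and memory during inference.
    scenario = InferenceConfig(latency=True, memory=True)

    config = BenchmarkConfig(
        name="pruned-llama-inference",
        backend=backend,
        launcher=launcher,
        scenario=scenario,
    )
    report = Benchmark.launch(config)
    print(report)  # collected latency, throughput, and VRAM metrics
```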
### Evaluation:

- (Foundation model) MMLU-Pro 0-shot of meta-llama/Meta-Llama-3.1-8B: 0.3659
- (Pruned model) MMLU-Pro 0-shot of Na0s/Llama-3.1-8B-Pruned-4-Layers: 0.2642
- (Healed model) MMLU-Pro 0-shot of Na0s/Llama-3.1-8B-Pruned-4-Layers_LoRA-PEFT-3.0: 0.3120

### Evaluation Data and Process:

- [TIGER-AI-Lab/MMLU-Pro](https://github.com/TIGER-AI-Lab/MMLU-Pro).
- The Hugging Face [Lighteval](https://github.com/huggingface/lighteval) benchmarking repo (a toy illustration of the 0-shot format is sketched below).
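To make the 0-shot setting concrete, and to show a minimal way of loading the healed model, the sketch below formats a single MMLU-Pro-style multiple-choice item and greedily decodes an answer letter. The question and options are invented for illustration; the actual prompt template and answer extraction follow the harnesses linked above.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Na0s/Llama-3.1-8B-Pruned-4-Layers_LoRA-PEFT-3.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# A made-up multiple-choice item; MMLU-Pro items carry up to ten options.
question = "Which data structure offers O(1) average-case lookup by key?"
options = ["Linked list", "Hash table", "Binary search tree", "Stack"]
prompt = (
    question
    + "\n"
    + "\n".join(f"{chr(ord('A') + i)}. {opt}" for i, opt in enumerate(options))
    + "\nAnswer:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=4, do_sample=False)
print(tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```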
## Additional Benchmark Results

Bold marks the best score in each column.

### BoolQ 0-shot Benchmark Results

| Model | Average Score | boolq (0 shots) | boolq contrastset (0 shots) |
|-------|---------------|-----------------|-----------------------------|
| meta-llama/Meta-Llama-3.1-8B | 0.569 | 0.569 | 0.568 |
| Na0s/Llama-3.1-8B-Pruned-4-Layers | 0.240 | 0.240 | 0.240 |
| Na0s/Llama-3.1-8B-Pruned-4-Layers_LoRA-PEFT-3.0 | **0.833** | **0.834** | **0.831** |

### BigBench 0-shot Benchmark Results

| Model | Average Score | bigbench:causal_judgment (0 shots) | bigbench:date_understanding (0 shots) | bigbench:disambiguation_qa (0 shots) | bigbench:geometric_shapes (0 shots) | bigbench:logical_deduction (0 shots) | ... |
|-------|---------------|------|------|------|------|------|-----|
| meta-llama/Meta-Llama-3.1-8B | 0.351 | 0.574 | 0.499 | 0.302 | 0.164 | 0.208 | ... |
| Na0s/Llama-3.1-8B-Pruned-4-Layers | 0.299 | 0.537 | 0.341 | 0.314 | 0.200 | **0.212** | ... |
| Na0s/Llama-3.1-8B-Pruned-4-Layers_LoRA-PEFT-3.0 | **0.364** | **0.579** | **0.610** | **0.407** | **0.264** | 0.208 | ... |

### Few-Shot Benchmark Results

| Model | Average Score | arc:challenge (25 shots) | hellaswag (10 shots) | mmlu:abstract_algebra (5 shots) | mmlu:college_chemistry (5 shots) | mmlu:college_computer_science (5 shots) | mmlu:college_mathematics (5 shots) | ... |
|-------|---------------|------|------|------|------|------|------|-----|
| meta-llama/Meta-Llama-3.1-8B | **0.552** | **0.541** | **0.620** | 0.290 | 0.450 | 0.480 | **0.350** | ... |
| Na0s/Llama-3.1-8B-Pruned-4-Layers | 0.516 | 0.462 | 0.549 | 0.290 | 0.440 | 0.460 | 0.280 | ... |
| Na0s/Llama-3.1-8B-Pruned-4-Layers_LoRA-PEFT-3.0 | 0.544 | 0.479 | 0.554 | **0.340** | **0.480** | **0.520** | **0.350** | ... |

### BigBench 3-shot Benchmark Results

| Model | Average Score | bigbench:causal_judgment (3 shots) | bigbench:date_understanding (3 shots) | bigbench:disambiguation_qa (3 shots) | bigbench:geometric_shapes (3 shots) | bigbench:logical_deduction (3 shots) | ... |
|-------|---------------|------|------|------|------|------|-----|
| meta-llama/Meta-Llama-3.1-8B | 0.442 | 0.563 | 0.596 | 0.593 | 0.181 | 0.298 | ... |
| Na0s/Llama-3.1-8B-Pruned-4-Layers | 0.420 | 0.563 | 0.642 | 0.574 | 0.217 | 0.258 | ... |
| Na0s/Llama-3.1-8B-Pruned-4-Layers_LoRA-PEFT-3.0 | **0.450** | **0.621** | **0.686** | **0.663** | **0.225** | **0.332** | ... |

### Overall Average Score

| Model | Overall Average Score |
|-------|-----------------------|
| meta-llama/Meta-Llama-3.1-8B | 0.472 |
| Na0s/Llama-3.1-8B-Pruned-4-Layers | 0.364 |
| Na0s/Llama-3.1-8B-Pruned-4-Layers_LoRA-PEFT-3.0 | **0.513** |

### Environmental Impact:

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
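### Appendix: Layer-Pruning Sketch

For reference, the sketch below shows one way to drop decoder layers from a Llama-style checkpoint, as described in the Model Description. It is a minimal illustration, not the exact script used: the paper selects the block to remove by measuring the angular distance between each candidate block's input and output hidden states, and the layer indices below are placeholders, not the four layers removed from this model.

```python
import torch
from transformers import AutoModelForCausalLM

# Load the foundation model.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B", torch_dtype=torch.bfloat16
)

# Placeholder indices: the paper picks the contiguous block of layers whose
# inputs and outputs are most similar (i.e., least impact when removed).
layers_to_drop = {24, 25, 26, 27}

# Keep every decoder layer except the dropped ones.
model.model.layers = torch.nn.ModuleList(
    layer for i, layer in enumerate(model.model.layers) if i not in layers_to_drop
)

# Re-index the remaining layers so the KV cache stays consistent,
# and record the new depth in the config.
for i, layer in enumerate(model.model.layers):
    layer.self_attn.layer_idx = i
model.config.num_hidden_layers = len(model.model.layers)

model.save_pretrained("Llama-3.1-8B-Pruned-4-Layers")
```

Healing (the LoRA fine-tuning shown under Training Details) is then applied to this pruned checkpoint to recover most of the lost accuracy.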