---
library_name: transformers
datasets:
- teknium/openhermes
pipeline_tag: text-generation
license: apache-2.0
base_model: Na0s/Llama-3.1-8B-Pruned-4-Layers_LoRA-PEFT-2.0
---
# Model Card for Na0s/Llama-3.1-8B-Pruned-4-Layers_LoRA-PEFT-3.0:
## Model Details:
### Model Description:
- **Finetuned from model: Na0s/Llama-3.1-8B-Pruned-4-Layers_LoRA-PEFT-2.0 on teknium/openhermes.**
- We pruned the 4 layers of meta-llama/Meta-Llama-3.1-8B that had the least impact on model performance, following the method described in the paper [The Unreasonable Ineffectiveness of the Deeper Layers](https://arxiv.org/pdf/2403.17887).
- The resulting model has 1.09B fewer parameters than the foundation model, which means lower memory use, faster training, and lower latency at inference time.
- We then recovered part of the performance lost to pruning by fine-tuning (MMLU-Pro 0-shot rose from 0.2642 to 0.3120). This step is called healing the pruned model.
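
The layer-selection criterion from the cited paper can be sketched as follows. This is a simplified illustration, not the code used to build this model: it assumes one representation vector per layer boundary (the paper averages over many samples), and the names `angular_distance`, `best_block_to_prune`, and `toy_states` are hypothetical. For a block of `n` layers, it picks the start layer whose input is closest, in angular distance, to the output `n` layers later.

```python
import math

def angular_distance(u, v):
    """Angular distance in [0, 1]: arccos(cosine similarity) / pi."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to guard against floating-point values slightly outside [-1, 1].
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos) / math.pi

def best_block_to_prune(hidden_states, n):
    """hidden_states[l] is the representation entering layer l
    (the last entry is the final output). Returns (start, distance) for
    the n-layer block whose input and output are most similar."""
    candidates = range(len(hidden_states) - n)
    distances = [
        angular_distance(hidden_states[l], hidden_states[l + n])
        for l in candidates
    ]
    start = min(candidates, key=lambda l: distances[l])
    return start, distances[start]

# Toy 2-D "hidden states" for a 6-layer model (illustrative values only):
# each state is a unit vector at the given angle in degrees.
toy_states = [
    (math.cos(math.radians(a)), math.sin(math.radians(a)))
    for a in (0, 40, 80, 85, 88, 120, 160)
]

n = 2
start, dist = best_block_to_prune(toy_states, n)
print(f"drop layers {start}..{start + n - 1}")  # drop layers 2..3
```

In the toy data, layers 2 and 3 together rotate the representation by only 8 degrees, so they form the block that changes it least.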
### Upcoming Work:
- More healing through SFT/DPO/TPO to see if we can get closer to the performance of meta-llama/Meta-Llama-3.1-8B (MMLU-Pro 0-shot of 0.3659 vs. 0.3120 for our model). **(In Progress)**
- Apply the exact same process to meta-llama/Llama-3.1-70B and compare the results.
### Training Details:
```python
# unsloth recommends being imported before trl and transformers.
from unsloth import FastLanguageModel, is_bfloat16_supported
from trl import SFTTrainer
from transformers import TrainingArguments

# `model`, `tokenizer`, `dataset`, and `max_seq_length` come from earlier
# loading steps that are not shown in this card.

# Attach LoRA adapters to every attention and MLP projection.
model = FastLanguageModel.get_peft_model(
    model,
    r = 4,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 4,
    lora_dropout = 0.05,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
)

# Supervised fine-tuning on the "completion" field of the dataset.
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "completion",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False,
    args = TrainingArguments(
        per_device_train_batch_size = 10,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 5000,
        learning_rate = 2e-4,
        # Use bf16 where the hardware supports it, otherwise fall back to fp16.
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "cosine",
        seed = 3407,
        output_dir = "outputs_4",
        push_to_hub = True,
        hub_always_push = True,
    ),
)
```
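
A few quantities implied by the configuration above can be worked out directly. The sketch below is illustrative only: `num_devices = 1` is an assumption (the card does not state the hardware), and `cosine_lr` is a hypothetical helper that reproduces the usual shape of a linear-warmup, cosine-decay schedule rather than calling the trainer's own scheduler.

```python
import math

# Values copied from the training configuration above.
per_device_train_batch_size = 10
gradient_accumulation_steps = 4
max_steps = 5000
warmup_steps = 5
learning_rate = 2e-4
lora_r = 4
lora_alpha = 4

# Assumption: training ran on a single device.
num_devices = 1

# Sequences contributing to each optimizer step.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)

# Upper bound on sequences processed over the whole run.
sequences_seen = effective_batch_size * max_steps

# Standard LoRA scaling factor (use_rslora = False): alpha / r.
lora_scale = lora_alpha / lora_r

def cosine_lr(step):
    """Linear warmup to the peak learning rate, then cosine decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))

print(effective_batch_size)  # 40
print(sequences_seen)        # 200000
print(lora_scale)            # 1.0
```

With `lora_alpha` equal to `r`, the adapter update is applied with a scale of 1, so the LoRA output is added to the frozen weights without extra amplification.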
### Training Data:
[teknium/openhermes](https://huggingface.co/datasets/teknium/openhermes)
### Memory and Latency Gains (Using [**Optimum-Benchmark**](https://github.com/huggingface/optimum-benchmark)):
**Load Mode Memory Metrics**
| **Model** | **Max Global VRAM (MB)** | **Max Process VRAM (MB)** | **Max Reserved VRAM (MB)** | **Max Allocated VRAM (MB)** |
|:--------------------------------------------------:|:------------------------:|:-------------------------:|:--------------------------:|:---------------------------:|
| Llama-3.1-8B | 18521.98 | 16630.42 | 16196.30 | 16060.54 |
| Na0s/Llama-3.1-8B-Pruned-4-Layers_LoRA-PEFT-3.0 | 16319.97 | 14428.41 | 13994.30 | 13879.42 |
**Inference Mode Latency Metrics**
| **Model** | **Latency Mean (s)** | **Throughput (tokens/s)** |
|:--------------------------------------------------:|:--------------------:|:-------------------------:|
| Llama-3.1-8B | 0.8104 | 38.2536 |
| Na0s/Llama-3.1-8B-Pruned-4-Layers_LoRA-PEFT-3.0 | 0.5530 | 56.0570 |
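
To make the tables above easier to compare, the relative changes can be computed from the reported figures. This is a small worked example using only numbers from the two tables; the helper name `relative_change` and the dictionary keys are illustrative.

```python
def relative_change(baseline, new):
    """Signed fractional change of `new` relative to `baseline`."""
    return (new - baseline) / baseline

# Figures copied from the tables above.
llama_3_1_8b = {
    "max_allocated_vram_mb": 16060.54,
    "latency_mean_s": 0.8104,
    "throughput_tokens_per_s": 38.2536,
}
pruned_healed = {
    "max_allocated_vram_mb": 13879.42,
    "latency_mean_s": 0.5530,
    "throughput_tokens_per_s": 56.0570,
}

changes = {
    name: relative_change(llama_3_1_8b[name], pruned_healed[name])
    for name in llama_3_1_8b
}

for name, change in changes.items():
    print(f"{name}: {change:+.1%}")
# max_allocated_vram_mb: -13.6%
# latency_mean_s: -31.8%
# throughput_tokens_per_s: +46.5%
```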
### Evaluation:
- (Foundation model) MMLU-Pro 0-shot of meta-llama/Meta-Llama-3.1-8B: 0.3659
- (Pruned model) MMLU-Pro 0-shot of Na0s/Llama-3.1-8B-Pruned-4-Layers: 0.2642
- (Healed model) MMLU-Pro 0-shot of Na0s/Llama-3.1-8B-Pruned-4-Layers_LoRA-PEFT-3.0: 0.3120
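
One way to read the three scores above is as the share of the pruning loss that healing has recovered so far. The short calculation below uses only the reported numbers; the variable names are illustrative.

```python
# MMLU-Pro 0-shot scores reported above.
base_score = 0.3659    # foundation model
pruned_score = 0.2642  # after pruning, before healing
healed_score = 0.3120  # after healing

lost_to_pruning = base_score - pruned_score      # about 0.1017
recovered_by_healing = healed_score - pruned_score  # about 0.0478
recovered_fraction = recovered_by_healing / lost_to_pruning
remaining_gap = base_score - healed_score        # about 0.0539

print(f"recovered: {recovered_fraction:.1%}")  # recovered: 47.0%
print(f"remaining gap: {remaining_gap:.4f}")   # remaining gap: 0.0539
```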
### Evaluation Data and Process:
- [TIGER-AI-Lab/MMLU-Pro](https://github.com/TIGER-AI-Lab/MMLU-Pro).
- Hugging Face [Lighteval](https://github.com/huggingface/lighteval) benchmarking repo.
## Additional Benchmark Results
### BoolQ 0-shot Benchmark Results
| Model | Average Score | boolq (0 shots) | boolq contrastset (0 shots) |
|-------|---------------|-----------------|---------------------------|
| meta-llama/Meta-Llama-3.1-8B | 0.569 | 0.569 | 0.568 |
| Na0s/Llama-3.1-8B-Pruned-4-Layers | 0.240 | 0.240 | 0.240 |
| Na0s/Llama-3.1-8B-Pruned-4-Layers_LoRA-PEFT-3.0 | **0.833** | **0.834** | **0.831** |
### BigBench 0-shot Benchmark Results
| Model | Average Score | bigbench:causal_judgment (0 shots) | bigbench:date_understanding (0 shots) | bigbench:disambiguation_qa (0 shots) | bigbench:geometric_shapes (0 shots) | bigbench:logical_deduction (0 shots) | ... |
|-------|---------------|-------------------------------------|---------------------------------------|--------------------------------------|-------------------------------------|--------------------------------------|--------------------------------------|
| meta-llama/Meta-Llama-3.1-8B | 0.351 | 0.574 | 0.499 | 0.302 | 0.164 | 0.208 | ... |
| Na0s/Llama-3.1-8B-Pruned-4-Layers | 0.299 | 0.537 | 0.341 | 0.314 | 0.200 | **0.212** | ... |
| Na0s/Llama-3.1-8B-Pruned-4-Layers_LoRA-PEFT-3.0 | **0.364** | **0.579** | **0.610** | **0.407** | **0.264** | 0.208 | ... |
### Few-Shot Benchmark Results
| Model | Average Score | arc:challenge (25 shots) | hellaswag (10 shots) | mmlu:abstract_algebra (5 shots) | mmlu:college_chemistry (5 shots) | mmlu:college_computer_science (5 shots) | mmlu:college_mathematics (5 shots) | ... |
|-------|---------------|--------------------------|----------------------|--------------------------------|----------------------------------|----------------------------------------|-----------------------------------|-----------------------------------|
| meta-llama/Meta-Llama-3.1-8B | **0.552** | **0.541** | **0.620** | 0.290 | 0.450 | 0.480 | **0.350** | ... |
| Na0s/Llama-3.1-8B-Pruned-4-Layers | 0.516 | 0.462 | 0.549 | 0.290 | 0.440 | 0.460 | 0.280 | ... |
| Na0s/Llama-3.1-8B-Pruned-4-Layers_LoRA-PEFT-3.0 | 0.544 | 0.479 | 0.554 | **0.340** | **0.480** | **0.520** | **0.350** | ... |
### BigBench 3-shot Benchmark Results
| Model | Average Score | bigbench:causal_judgment (3 shots) | bigbench:date_understanding (3 shots) | bigbench:disambiguation_qa (3 shots) | bigbench:geometric_shapes (3 shots) | bigbench:logical_deduction (3 shots) | ... |
|-------|---------------|-------------------------------------|---------------------------------------|--------------------------------------|-------------------------------------|--------------------------------------|--------------------------------------|
| meta-llama/Meta-Llama-3.1-8B | 0.442 | 0.563 | 0.596 | 0.593 | 0.181 | 0.298 | ... |
| Na0s/Llama-3.1-8B-Pruned-4-Layers | 0.420 | 0.563 | 0.642 | 0.574 | 0.217 | 0.258 | ... |
| Na0s/Llama-3.1-8B-Pruned-4-Layers_LoRA-PEFT-3.0 | **0.450** | **0.621** | **0.686** | **0.663** | **0.225** | **0.332** | ... |
### Overall Average Score
| Model | Overall Average Score |
|-------|------------------------|
| meta-llama/Meta-Llama-3.1-8B | 0.472 |
| Na0s/Llama-3.1-8B-Pruned-4-Layers | 0.364 |
| Na0s/Llama-3.1-8B-Pruned-4-Layers_LoRA-PEFT-3.0 | **0.513** |
### Environmental Impact:
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
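
The kind of estimate such a calculator produces can be sketched as energy used multiplied by the carbon intensity of the grid. The function below is a rough illustration, not the calculator's exact method, and the inputs in the usage line are placeholder values, not measurements from this model's training run.

```python
def estimate_emissions_kg(power_watts, hours, carbon_intensity_kg_per_kwh, pue=1.0):
    """Rough CO2-equivalent estimate in kg.

    power_watts: average hardware power draw
    hours: total run time
    carbon_intensity_kg_per_kwh: grid carbon intensity for the region
    pue: data-centre power usage effectiveness (1.0 means no overhead)
    """
    energy_kwh = power_watts / 1000 * hours * pue
    return energy_kwh * carbon_intensity_kg_per_kwh

# Placeholder inputs for illustration only (NOT this model's actual figures):
# a 300 W device running for 10 hours on a 0.4 kgCO2eq/kWh grid.
example_kg = estimate_emissions_kg(300, 10, 0.4)
print(f"{example_kg:.2f} kg CO2eq")  # 1.20 kg CO2eq
```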