---
base_model:
- meta-llama/Meta-Llama-3-8B-Instruct
pipeline_tag: text-generation
metrics:
- accuracy
---
# Model Description:
Pruned from [`meta-llama/Meta-Llama-3-8B-Instruct`](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)
using the Random Pruner from [`LLM-Pruner: On the Structural Pruning of Large Language Models`](https://arxiv.org/abs/2305.11627).
This was done to test the viability of LLM-Pruner for task-agnostic, low-resource generative AI for commercial and personal use,
compared to using an out-of-the-box model like [`meta-llama/Llama-3.2-3B-Instruct`](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct).
[Our presentation slides may be found here](https://drive.google.com/file/d/1_uALSOYl3pe2OVDf46pFVm7LaBhEsfxe/view?usp=sharing)
# To Replicate:
1. First, clone the [official implementation](https://github.com/horseee/LLM-Pruner) and run:
```shell
python llama3.py --pruning_ratio 0.25 \
--device cuda --eval_device cuda \
--base_model meta-llama/Meta-Llama-3-8B-Instruct \
--block_wise --block_mlp_layer_start 4 --block_mlp_layer_end 30 \
--block_attention_layer_start 4 --block_attention_layer_end 30 \
--save_ckpt_log_name llama3_prune \
--pruner_type random \
--max_seq_len 512 \
--test_after_train --test_before_train --save_model
```
to get the pruned model.
**NOTE**: We removed `'ptb'` from the dataset list in `llama3.py`, since loading it requires executing remote code.
2. Then, to post-train, follow [section 2 (post-training / recover stage) of the official implementation](https://github.com/horseee/LLM-Pruner?tab=readme-ov-file#2-post-training-recover-stage).
# Benchmark Results
**Benchmark Evaluation**:
Following the original paper's evaluation protocol, the model is evaluated with zero-shot task classification on five commonsense
reasoning datasets that do not require remote code to load:
| Model | BoolQ | HellaSwag | ARC-e | ARC-c | OBQA | Average Accuracy |
|------------------------------|--------|-----------|--------|--------|-------|-------------------|
| **Llama-3-6.6B-R-Pruned** | 74.25 | 67.59 | 71.21 | 42.49 | 38.8 | 58.87 |
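The reported average is the unweighted mean of the five per-task accuracies, which can be checked directly (the task names here are just dictionary labels, not API identifiers):

```python
# Unweighted mean of the five zero-shot accuracies reported in the table above.
scores = {"BoolQ": 74.25, "HellaSwag": 67.59, "ARC-e": 71.21,
          "ARC-c": 42.49, "OBQA": 38.8}
average = sum(scores.values()) / len(scores)
print(round(average, 2))  # 58.87, matching the Average Accuracy column
```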
# Usage:
For usage, follow the official implementation,
[section `Pruned Model with Post-Training`](https://github.com/horseee/LLM-Pruner?tab=readme-ov-file#2-post-training-recover-stage).