---
base_model:
- meta-llama/Meta-Llama-3-8B-Instruct
pipeline_tag: text-generation
metrics:
- accuracy
---
|
|
|
# Model Description: |
|
Pruned from [`meta-llama/Meta-Llama-3-8B-Instruct`](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)
using the random pruner (`--pruner_type random`) from [LLM-Pruner: On the Structural Pruning of Large Language Models](https://arxiv.org/abs/2305.11627).
|
|
|
This was done to test the viability of LLM-Pruner for task-agnostic, low-resource generative AI for commercial and personal use,
compared to using out-of-the-box models such as [`meta-llama/Llama-3.2-3B-Instruct`](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct).
|
|
|
[Our presentation slides may be found here](https://drive.google.com/file/d/1_uALSOYl3pe2OVDf46pFVm7LaBhEsfxe/view?usp=sharing) |
|
|
|
|
|
# To Replicate
|
|
|
1. First, clone the [official implementation](https://github.com/horseee/LLM-Pruner) and run: |
|
```
python llama3.py --pruning_ratio 0.25 \
    --device cuda --eval_device cuda \
    --base_model meta-llama/Meta-Llama-3-8B-Instruct \
    --block_wise --block_mlp_layer_start 4 --block_mlp_layer_end 30 \
    --block_attention_layer_start 4 --block_attention_layer_end 30 \
    --save_ckpt_log_name llama3_prune \
    --pruner_type random \
    --max_seq_len 512 \
    --test_after_train --test_before_train --save_model
```
|
to get the pruned model. |
|
|
|
**NOTE**: We removed `'ptb'` from the evaluation datasets in `llama3.py`, since loading PTB requires executing remote code from the dataset repository.
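For reference, the change amounts to dropping `'ptb'` from the perplexity evaluation call. The sketch below is a reconstruction of the relevant line, assuming the `PPLMetric` helper and argument names used elsewhere in the LLM-Pruner scripts; the exact line in `llama3.py` may differ:

```
# Before: perplexity is measured on both wikitext2 and ptb.
# ppl = PPLMetric(model, tokenizer, ['wikitext2', 'ptb'], args.max_seq_len, device=args.eval_device)

# After: keep only wikitext2, which loads without remote code.
ppl = PPLMetric(model, tokenizer, ['wikitext2'], args.max_seq_len, device=args.eval_device)
```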
|
|
|
|
|
2. Then, to post-train, follow [section 2 of the official implementation](https://github.com/horseee/LLM-Pruner?tab=readme-ov-file#2-post-training-recover-stage); a command sketch is given below.
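As a rough guide, the post-training (recovery) command from the upstream README looks like the following. Treat this as a sketch and verify the flags against the linked section; `prune_log/llama3_prune` is assumed to match the `--save_ckpt_log_name` from step 1, and `tune_log/llama3_prune` is a hypothetical output directory:

```
python post_training.py --prune_model prune_log/llama3_prune/pytorch_model.bin \
    --data_path yahma/alpaca-cleaned \
    --lora_r 8 \
    --num_epochs 2 \
    --learning_rate 1e-4 \
    --batch_size 64 \
    --output_dir tune_log/llama3_prune
```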
|
|
|
|
|
# Benchmark Results |
|
|
|
**Benchmark Evaluation**:

Following the original paper's evaluation protocol, we perform zero-shot task classification on five commonsense
reasoning datasets that do not require remote code to load:
|
|
|
| Model | BoolQ | HellaSwag | ARC-e | ARC-c | OBQA | Average Accuracy | |
|
|------------------------------|--------|-----------|--------|--------|-------|-------------------| |
|
| **Llama-3-6.6B-R-Pruned** | 74.25 | 67.59 | 71.21 | 42.49 | 38.80 | 58.87 |
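For completeness, one way to reproduce this style of evaluation with EleutherAI's `lm-evaluation-harness` is sketched below. This assumes the recovered model has been exported as a standard Hugging Face checkpoint at the hypothetical local path `./llama-3-6.6b-r-pruned`; the numbers above were obtained with the harness bundled with LLM-Pruner, so minor differences are possible:

```
lm_eval --model hf \
    --model_args pretrained=./llama-3-6.6b-r-pruned \
    --tasks boolq,hellaswag,arc_easy,arc_challenge,openbookqa \
    --num_fewshot 0 \
    --batch_size 8
```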
|
|
|
|
|
# Usage: |
|
|
|
Follow the official implementation for usage, |
|
[section `Pruned Model with Post-Training`](https://github.com/horseee/LLM-Pruner?tab=readme-ov-file#2-post-training-recover-stage); a minimal loading sketch is given below.
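As a minimal sketch, assuming the pruning run saved its checkpoint under `prune_log/llama3_prune/` (the path follows the `--save_ckpt_log_name` above and may differ on your setup): LLM-Pruner stores the pruned model and tokenizer together in a single pickled checkpoint, so it is loaded with `torch.load` rather than `from_pretrained`.

```
import torch

# The LLM-Pruner checkpoint pickles the model and tokenizer together.
# weights_only=False is required on recent PyTorch to unpickle full objects;
# only use it for checkpoints you trust.
pruned_dict = torch.load(
    'prune_log/llama3_prune/pytorch_model.bin',
    map_location='cpu',
    weights_only=False,
)
tokenizer, model = pruned_dict['tokenizer'], pruned_dict['model']

model.to('cuda').eval()
inputs = tokenizer("The capital of France is", return_tensors='pt').to('cuda')
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```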