---
base_model:
- meta-llama/Meta-Llama-3-8B-Instruct
pipeline_tag: text-generation
metrics:
- accuracy
---
|
|
|
# Model Description: |
|
Pruned from [`meta-llama/Meta-Llama-3-8B-Instruct`](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)
using the random pruner (`--pruner_type random`) from [LLM-Pruner: On the Structural Pruning of Large Language Models](https://arxiv.org/abs/2305.11627).
|
|
|
This was done to test the viability of LLM-Pruner for task-agnostic, low-resource generative AI for commercial and personal use,
compared to using out-of-the-box models such as [`meta-llama/Llama-3.2-3B-Instruct`](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct).
|
|
|
[Our presentation slides may be found here](https://drive.google.com/file/d/1_uALSOYl3pe2OVDf46pFVm7LaBhEsfxe/view?usp=sharing) |
|
|
|
|
|
# To Replicate
|
|
|
1. First, clone the [official implementation](https://github.com/horseee/LLM-Pruner) and run: |
|
```
python llama3.py --pruning_ratio 0.25 \
    --device cuda --eval_device cuda \
    --base_model meta-llama/Meta-Llama-3-8B-Instruct \
    --block_wise --block_mlp_layer_start 4 --block_mlp_layer_end 30 \
    --block_attention_layer_start 4 --block_attention_layer_end 30 \
    --save_ckpt_log_name llama3_prune \
    --pruner_type random \
    --max_seq_len 512 \
    --test_after_train --test_before_train --save_model
```
|
to get the pruned model. |
|
|
|
**NOTE**: We removed `'ptb'` from the evaluation datasets in `llama3.py`, since loading PTB requires executing remote code from the dataset repository.
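For reference, the change amounts to dropping `'ptb'` from the perplexity evaluation call. The sketch below is a reconstruction of the relevant line, assuming the `PPLMetric` helper and argument names used elsewhere in the LLM-Pruner scripts; the exact line in `llama3.py` may differ:

```
# Before: perplexity is measured on both wikitext2 and ptb.
# ppl = PPLMetric(model, tokenizer, ['wikitext2', 'ptb'], args.max_seq_len, device=args.eval_device)

# After: keep only wikitext2, which loads without remote code.
ppl = PPLMetric(model, tokenizer, ['wikitext2'], args.max_seq_len, device=args.eval_device)
```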
|
|
|
|
|
2. Then, to post-train, follow [section 2 of the official implementation](https://github.com/horseee/LLM-Pruner?tab=readme-ov-file#2-post-training-recover-stage); a command sketch is given below.
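As a rough guide, the post-training (recovery) command from the upstream README looks like the following. Treat this as a sketch and verify the flags against the linked section; `prune_log/llama3_prune` is assumed to match the `--save_ckpt_log_name` from step 1, and `tune_log/llama3_prune` is a hypothetical output directory:

```
python post_training.py --prune_model prune_log/llama3_prune/pytorch_model.bin \
    --data_path yahma/alpaca-cleaned \
    --lora_r 8 \
    --num_epochs 2 \
    --learning_rate 1e-4 \
    --batch_size 64 \
    --output_dir tune_log/llama3_prune
```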
|
|
|
|
|
# Benchmark Results |
|
|
|
**Benchmark Evaluation**:

Following the original paper's evaluation protocol, we perform zero-shot task classification on five commonsense
reasoning datasets that do not require remote code to load:
|
|
|
| Model | BoolQ | HellaSwag | ARC-e | ARC-c | OBQA | Average Accuracy | |
|
|------------------------------|--------|-----------|--------|--------|-------|-------------------| |
|
| **Llama-3-6.6B-R-Pruned** | 74.25 | 67.59 | 71.21 | 42.49 | 38.80 | 58.87 |
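For completeness, one way to reproduce this style of evaluation with EleutherAI's `lm-evaluation-harness` is sketched below. This assumes the recovered model has been exported as a standard Hugging Face checkpoint at the hypothetical local path `./llama-3-6.6b-r-pruned`; the numbers above were obtained with the harness bundled with LLM-Pruner, so minor differences are possible:

```
lm_eval --model hf \
    --model_args pretrained=./llama-3-6.6b-r-pruned \
    --tasks boolq,hellaswag,arc_easy,arc_challenge,openbookqa \
    --num_fewshot 0 \
    --batch_size 8
```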
|
|
|
|
|
# Usage: |
|
|
|
Follow the official implementation for usage, |
|
[section `Pruned Model with Post-Training`](https://github.com/horseee/LLM-Pruner?tab=readme-ov-file#2-post-training-recover-stage); a minimal loading sketch is given below.
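As a minimal sketch, assuming the pruning run saved its checkpoint under `prune_log/llama3_prune/` (the path follows the `--save_ckpt_log_name` above and may differ on your setup): LLM-Pruner stores the pruned model and tokenizer together in a single pickled checkpoint, so it is loaded with `torch.load` rather than `from_pretrained`.

```
import torch

# The LLM-Pruner checkpoint pickles the model and tokenizer together.
# weights_only=False is required on recent PyTorch to unpickle full objects;
# only use it for checkpoints you trust.
pruned_dict = torch.load(
    'prune_log/llama3_prune/pytorch_model.bin',
    map_location='cpu',
    weights_only=False,
)
tokenizer, model = pruned_dict['tokenizer'], pruned_dict['model']

model.to('cuda').eval()
inputs = tokenizer("The capital of France is", return_tensors='pt').to('cuda')
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```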