---
license: mit
base_model:
  - meta-llama/Llama-2-7b-hf
---

This model is derived from Llama-2-7b-hf by pruning with LLM-Streamline (*Streamlining Redundant Layers to Compress Large Language Models*, ICLR 2025 Spotlight). The entire training process required only 0.06B (60M) tokens.

Below are the results of the evaluation using lm-eval:

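| Model | arc_c | arc_e | boolq | hellaswag | openbookqa | rte | winogrande | Avg |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Llama-2-7B | 43.3 | 76.4 | 77.7 | 57.2 | 31.4 | 62.8 | 69.1 | 59.7 |
| Llama-2-4.7B | 34.0 | 64.6 | 74.7 | 49.8 | 27.4 | 61.7 | 66.4 | 54.1 |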
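
To make the comparison easier to read, the short sketch below recomputes the `Avg` column from the per-task scores in the table above and derives two summary figures: the fraction of the base model's average score that the pruned model retains, and the task with the largest drop. The variable names (`TASKS`, `SCORES`, `retained`, `drops`) are illustrative and not part of the original evaluation setup; the numbers are copied from the table, not re-measured.

```python
# Scores copied from the results table above (as reported from lm-eval).
TASKS = ["arc_c", "arc_e", "boolq", "hellaswag", "openbookqa", "rte", "winogrande"]
SCORES = {
    "Llama-2-7B": [43.3, 76.4, 77.7, 57.2, 31.4, 62.8, 69.1],
    "Llama-2-4.7B": [34.0, 64.6, 74.7, 49.8, 27.4, 61.7, 66.4],
}


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Unweighted average over the seven tasks, matching the Avg column.
averages = {name: mean(vals) for name, vals in SCORES.items()}

# Fraction of the base model's average score kept by the pruned model.
retained = averages["Llama-2-4.7B"] / averages["Llama-2-7B"]

# Per-task drop in points (base minus pruned).
drops = {
    task: base - pruned
    for task, base, pruned in zip(TASKS, SCORES["Llama-2-7B"], SCORES["Llama-2-4.7B"])
}
largest_drop_task = max(drops, key=drops.get)

for name, avg in averages.items():
    print(f"{name}: average {avg:.1f}")
print(f"Retained share of base average: {retained:.1%}")
print(f"Largest drop: {largest_drop_task} ({drops[largest_drop_task]:.1f} points)")
```

By this calculation the pruned model keeps roughly 91% of the base model's average score, with the largest per-task gap on `arc_e` and the smallest on `rte`.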