Model Information
This text-only 41B Llama 3.1 model was pruned from the instruction-finetuned, text-only Llama 3.1 70B model (meta-llama/Llama-3.1-70B-Instruct) using the FLAP (Fluctuation-based Adaptive Structured Pruning) method.
TL;DR: Not under maintenance. Poor performance and little practical value; a side product of an experiment.
Hyperparameters used for pruning:
- metrics: WIFV
- structure: AL-AM
- pruning_ratio: 0.5
Limitations
This llama3.1-41B-raw model produces unstable output; finetuning on an instruction dataset is recommended before use.
The model is not supported by any library at the moment because pruning leaves the layers with inconsistent shapes, which standard Llama loading code cannot handle.
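To make the shape problem concrete, here is a minimal sketch (an illustration added for this card, assuming the checkpoint is first fetched locally with huggingface_hub) that reads the safetensors index and prints each layer's pruned up_proj shape, which varies from layer to layer:

import json
import os

from huggingface_hub import snapshot_download
from safetensors import safe_open

local_dir = snapshot_download("npc0/llama3.1-41B-raw")
with open(os.path.join(local_dir, "model.safetensors.index.json")) as f:
    weight_map = json.load(f)["weight_map"]

# Shapes of each layer's MLP up-projection: uniform in the original
# 70B model, but differing per layer after FLAP pruning.
for name, shard in sorted(weight_map.items()):
    if name.endswith("mlp.up_proj.weight"):
        with safe_open(os.path.join(local_dir, shard), framework="pt") as f:
            print(name, tuple(f.get_slice(name).get_shape()))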
Usage
The model is not supported by any library at the moment; the workaround below subclasses LlamaForCausalLM and re-initializes each pruned linear layer to match the shapes stored in the checkpoint before the weights are loaded.
from functools import reduce

def get_module_by_name(module, access_string):
    # Resolve a dotted attribute path to the corresponding submodule.
    names = access_string.split(sep='.')
    return reduce(getattr, names, module)
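# Illustrative usage (added comment, not part of the original snippet):
# get_module_by_name(model, "model.layers.0.self_attn.q_proj")
# returns the nn.Linear registered at that dotted path.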
import json
import os

import torch
from huggingface_hub import snapshot_download
from safetensors import safe_open
from transformers import AutoTokenizer, LlamaForCausalLM
class MyLlamaForCausalLM(LlamaForCausalLM):
    def __init__(self, config):
        super().__init__(config)
        # The safetensors index maps each tensor name to the shard file
        # that stores it; it is read from the local checkpoint directory.
        with open(os.path.join(
                config._name_or_path,
                "model.safetensors.index.json")) as f:
            weight_map = json.load(f)
        weight_map = weight_map["weight_map"]
        for name, path in weight_map.items():
            module_name = name.replace('.weight', '')
            if '.bias' in module_name:
                continue
            layer = get_module_by_name(self, module_name)
            with safe_open(
                    os.path.join(
                        config._name_or_path,
                        path), framework="pt") as f:
                tensor = f.get_tensor(name)
            # FLAP prunes each attention/MLP projection to its own width,
            # so re-initialize any nn.Linear whose shape disagrees with the
            # tensor stored in the checkpoint. nn.Linear.__init__ is re-run
            # in place; it returns None, so its result is not assigned back.
            if 'mlp.' in name or 'attn.' in name:
                if tensor.shape != (layer.out_features, layer.in_features):
                    layer.__init__(
                        tensor.shape[1],               # in_features
                        tensor.shape[0],               # out_features
                        bias=layer.bias is not None,   # nn.Linear expects a bool
                        dtype=layer.weight.dtype,
                        device=layer.weight.device)
        # q_proj may have been narrowed, so recompute the attention head
        # bookkeeping from the new projection width. After pruning, KV heads
        # are set equal to query heads, making the group size 1.
        for name in weight_map:
            if 'attn.' in name:
                module = get_module_by_name(
                    self,
                    '.'.join(name.split('.')[:-2]))
                module.num_heads = module.q_proj.out_features // module.head_dim
                module.num_key_value_heads = module.num_heads
                module.num_key_value_groups = module.num_heads // module.num_key_value_heads
# __init__ above reads shard files relative to config._name_or_path, so
# point from_pretrained at a local copy of the checkpoint (here fetched
# with huggingface_hub; a pre-downloaded directory works the same way).
local_dir = snapshot_download("npc0/llama3.1-41B-raw")
model = MyLlamaForCausalLM.from_pretrained(
    local_dir,
    torch_dtype=torch.float16,
    device_map="auto"
)
# Tokenizer path from the author's FLAP pruning run; FLAP does not change
# the tokenizer, so a copy of the Llama 3.1 tokenizer should work as well.
tokenizer = AutoTokenizer.from_pretrained(
    "FLAP/llm_weights/flap_p0.5_WIFV_ALAM_llama_70b")
model = model.eval()
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
    {"role": "user", "content": "What about solving a 2x + 3 = 7 equation?"},
]
model_inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant header so the model replies
    return_tensors="pt").to(model.device)
generated_ids = model.generate(model_inputs, max_new_tokens=128)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
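As a quick sanity check (an addition for illustration, not part of the original snippet), the loaded model's parameter count can be compared against the advertised 41B:

total = sum(p.numel() for p in model.parameters())
print(f"{total / 1e9:.1f}B parameters")  # expected to be roughly 41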
Model tree for npc0/llama3.1-41B-raw
- Base model: meta-llama/Llama-3.1-70B
- Finetuned from: meta-llama/Llama-3.1-70B-Instruct