---
base_model: meta-llama/Meta-Llama-3.1-70B-Instruct
library_name: transformers
license: llama3.1
pipeline_tag: text-generation
tags:
- facebook
- meta
- pytorch
- pruning
- llama
- llama-3
---
## Model Information
The Llama 3.1 text-only 41B model is pruned from the instruction-tuned, text-only Llama 3.1 70B model using the FLAP (Fluctuation-based Adaptive Structured Pruning) method.

**TL;DR** Not under maintenance. Poor performance and little practical value; a side product of an experiment.
Hyperparameters used for pruning:

- `metrics`: WIFV
- `structure`: AL-AM
- `pruning_ratio`: 0.5
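
To give a sense of what these settings do, the sketch below shows, for a single Llama MLP block, what a 0.5 structured pruning ratio means: roughly half of the intermediate channels are removed, shrinking the output side of `gate_proj` and the input side of `down_proj`. This is an illustration only, not the FLAP implementation; in FLAP the channels to keep are chosen by the WIFV importance metric, and the AL-AM structure lets the kept width vary per layer and per module (which is why the resulting checkpoint has inconsistent shapes).

```python
# Illustration only -- not the FLAP implementation. Channel selection is a
# placeholder; FLAP would pick channels by WIFV importance, and AL-AM would
# vary the kept width per layer/module.
import torch
import torch.nn as nn

hidden_size = 8192             # Llama 3.1 70B hidden size
intermediate_size = 28672      # original MLP width
keep = intermediate_size // 2  # pruning_ratio = 0.5

keep_idx = torch.arange(keep)  # placeholder for WIFV-selected channels

gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

# Structured pruning drops whole channels: output rows of gate_proj
# and the matching input columns of down_proj.
pruned_gate = nn.Linear(hidden_size, keep, bias=False)
pruned_gate.weight.data = gate_proj.weight.data[keep_idx]

pruned_down = nn.Linear(keep, hidden_size, bias=False)
pruned_down.weight.data = down_proj.weight.data[:, keep_idx]
```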
## Limitation
This `llama3.1-41B-raw` model gives unstable output; fine-tuning on an instruction dataset is recommended.

The model is not supported by any library at the moment because pruning leaves inconsistent shapes between layers, so the standard loading path cannot rebuild the architecture from the config alone.
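
You can see the shape differences directly by inspecting the checkpoint shards. This is a minimal sketch, assuming the repository has already been downloaded to a local directory (for example with `huggingface_hub.snapshot_download`):

```python
import json
import os
from safetensors import safe_open

ckpt_dir = "npc0/llama3.1-41B-raw"  # local clone/download of the checkpoint

with open(os.path.join(ckpt_dir, "model.safetensors.index.json")) as f:
    weight_map = json.load(f)["weight_map"]

# With AL-AM pruning the intermediate size differs per layer, so the
# down_proj weights do not share a common shape.
for name, shard in sorted(weight_map.items()):
    if name.endswith("mlp.down_proj.weight"):
        with safe_open(os.path.join(ckpt_dir, shard), framework="pt") as f:
            print(name, tuple(f.get_tensor(name).shape))
```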
## Usage
The model is not supported by any library at the moment; the workaround below subclasses `LlamaForCausalLM`, re-initializes each pruned attention/MLP projection to the shape stored in the checkpoint, and updates the attention head counts, so that `from_pretrained` can then load the weights. Because the subclass reads the checkpoint shards via `config._name_or_path`, the repository should be available as a local directory when loading.
```python
import json
import os
from functools import reduce

from safetensors import safe_open
from transformers import LlamaForCausalLM


def get_module_by_name(module, access_string):
    # Resolve a dotted attribute path such as "model.layers.0.mlp.gate_proj".
    names = access_string.split(sep='.')
    return reduce(getattr, names, module)


class MyLlamaForCausalLM(LlamaForCausalLM):
    def __init__(self, config):
        super().__init__(config)
        # Read the checkpoint index; config._name_or_path must point to a
        # local directory containing the downloaded shards.
        with open(os.path.join(
                config._name_or_path,
                "model.safetensors.index.json")) as f:
            weight_map = json.load(f)["weight_map"]
        # Re-initialize every pruned attention/MLP projection so its shape
        # matches the tensor stored in the checkpoint.
        for name, path in weight_map.items():
            module_name = name.replace('.weight', '')
            if '.bias' in module_name:
                continue
            layer = get_module_by_name(self, module_name)
            with safe_open(
                    os.path.join(config._name_or_path, path),
                    framework="pt") as f:
                tensor = f.get_tensor(name)
            if 'mlp.' in name or 'attn.' in name:
                if tensor.shape != (layer.out_features, layer.in_features):
                    # Checkpoint weights are stored as (out_features, in_features);
                    # nn.Linear.__init__ rebuilds the layer in place.
                    layer.__init__(
                        tensor.shape[1],
                        tensor.shape[0],
                        bias=layer.bias is not None,
                        dtype=layer.weight.dtype,
                        device=layer.weight.device)
        # Update the attention bookkeeping to match the pruned q_proj width.
        for name, path in weight_map.items():
            if 'attn.' in name:
                module = get_module_by_name(
                    self, '.'.join(name.split('.')[:-2]))
                module.num_heads = module.q_proj.out_features // module.head_dim
                module.num_key_value_heads = module.num_heads
                module.num_key_value_groups = module.num_heads // module.num_key_value_heads
```
With the subclass in place, loading and generation follow the usual `transformers` flow. The tokenizer path below is the local FLAP output directory that was used during pruning; since pruning does not change the vocabulary, the base Meta-Llama-3.1-70B-Instruct tokenizer should also work.

```python
import torch
from transformers import AutoTokenizer

model = MyLlamaForCausalLM.from_pretrained(
    "npc0/llama3.1-41B-raw",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
    "FLAP/llm_weights/flap_p0.5_WIFV_ALAM_llama_70b")
model = model.eval()

messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
    {"role": "user", "content": "What about solving an 2x + 3 = 7 equation?"},
]

model_inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt").to(model.device)
generated_ids = model.generate(model_inputs, max_new_tokens=128)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
```