---
base_model: meta-llama/Meta-Llama-3.1-70B-Instruct
library_name: transformers
license: llama3.1
pipeline_tag: text-generation
tags:
- facebook
- meta
- pytorch
- pruning
- llama
- llama-3
---

## Model Information
The Llama 3.1 text-only 41B model is pruned from the instruction-finetuned, text-only Llama 3.1 70B model
using the [FLAP method](https://arxiv.org/abs/2312.11983).

> TL;DR: Not under maintenance. Poor performance and little practical value; a side product of an experiment.

Hyperparameters used for pruning:
```
metrics: WIFV
structure: AL-AM
pruning_ratio: 0.5
```
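
For reference, WIFV (weighted input feature variance) scores each input channel by the variance of its activations over a calibration set, weighted by the squared norm of the matching weight column, and the AL-AM structure lets FLAP allocate the 0.5 pruning ratio adaptively across layers and modules. Below is a minimal sketch of the WIFV score under that reading of the paper; the function name and tensor shapes are illustrative, not taken from the FLAP codebase.
```python
import torch

def wifv_importance(weight: torch.Tensor, inputs: torch.Tensor) -> torch.Tensor:
    # weight: (out_features, in_features) of a linear layer.
    # inputs: (n_samples, in_features) calibration activations feeding it.
    # Fluctuation: variance of each input channel over the calibration set.
    fluctuation = inputs.var(dim=0, unbiased=False)   # (in_features,)
    # Weight each channel by the squared norm of its weight column.
    col_sq_norm = weight.pow(2).sum(dim=0)            # (in_features,)
    # Low-scoring channels are candidates for structured removal.
    return fluctuation * col_sq_norm
```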

## Limitation
This `llama3.1-41B-raw` model produces unstable output, so finetuning it on an instruction dataset is recommended.

The model is not supported by any library at the moment,
because pruning leaves the layers with inconsistent shapes.
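
To see the inconsistency concretely, you can read each layer's `q_proj` weight shape straight from the checkpoint index; a short sketch, assuming the repository has already been downloaded to a local `ckpt_dir` (a placeholder path):
```python
import json
import os

from safetensors import safe_open

ckpt_dir = "llama3.1-41B-raw"  # placeholder: local copy of this repository
with open(os.path.join(ckpt_dir, "model.safetensors.index.json")) as f:
    weight_map = json.load(f)["weight_map"]

# After FLAP pruning the projection widths differ per decoder layer,
# which is why stock loaders reject the checkpoint.
for name, shard in sorted(weight_map.items()):
    if name.endswith("self_attn.q_proj.weight"):
        with safe_open(os.path.join(ckpt_dir, shard), framework="pt") as f:
            print(name, tuple(f.get_tensor(name).shape))
```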

## Usage
Since no library supports the model out of the box,
the following workaround rebuilds the mismatched layers while loading.
```python
import json
import os
from functools import reduce

import torch
from safetensors import safe_open
from transformers import AutoTokenizer, LlamaForCausalLM


def get_module_by_name(module, access_string):
    """Resolve a dotted attribute path such as 'model.layers.0.mlp.up_proj'."""
    names = access_string.split(sep='.')
    return reduce(getattr, names, module)


class MyLlamaForCausalLM(LlamaForCausalLM):
    def __init__(self, config):
        super().__init__(config)
        # The index file maps each weight name to the shard that stores it.
        with open(os.path.join(
                config._name_or_path,
                "model.safetensors.index.json")) as f:
            weight_map = json.load(f)["weight_map"]
        # Pass 1: re-create every attention/MLP linear layer whose
        # checkpoint shape differs from the shape built from the config.
        for name, path in weight_map.items():
            module_name = name.replace('.weight', '')
            if '.bias' in module_name:
                continue
            layer = get_module_by_name(self, module_name)
            with safe_open(os.path.join(config._name_or_path, path),
                           framework="pt") as f:
                tensor = f.get_tensor(name)
            if 'mlp.' in name or 'attn.' in name:
                if tensor.shape != (layer.out_features, layer.in_features):
                    # Calling __init__ on the old layer returns None; instead
                    # build a fresh Linear with the pruned shape and swap it
                    # into the parent module.
                    parent = get_module_by_name(
                        self, '.'.join(module_name.split('.')[:-1]))
                    setattr(parent, module_name.split('.')[-1],
                            torch.nn.Linear(
                                tensor.shape[1],
                                tensor.shape[0],
                                bias=layer.bias is not None,
                                dtype=layer.weight.dtype,
                                device=layer.weight.device))
        # Pass 2: fix the attention head bookkeeping to match the
        # pruned projection widths.
        for name, path in weight_map.items():
            if 'attn.' in name:
                module = get_module_by_name(
                    self, '.'.join(name.split('.')[:-2]))
                module.num_heads = module.q_proj.out_features // module.head_dim
                module.num_key_value_heads = module.num_heads
                module.num_key_value_groups = module.num_heads // module.num_key_value_heads


model = MyLlamaForCausalLM.from_pretrained(
    "npc0/llama3.1-41B-raw",
    torch_dtype=torch.float16,
    device_map="auto",
)
# Local path produced by the FLAP pruning run.
tokenizer = AutoTokenizer.from_pretrained(
    "FLAP/llm_weights/flap_p0.5_WIFV_ALAM_llama_70b")
model = model.eval()

messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
    {"role": "user", "content": "What about solving the equation 2x + 3 = 7?"},
]

model_inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True,
    return_tensors="pt").to(model.device)
generated_ids = model.generate(model_inputs, max_new_tokens=128)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
```
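
A note on why the workaround lives in `__init__`: `from_pretrained` first instantiates the model from the config and only then copies the checkpoint tensors in, so resizing the mismatched layers before the state dict is loaded lets the subsequent weight copy succeed without shape errors.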