npc0/llama3.1-41B-raw

	---
	base_model: meta-llama/Meta-Llama-3.1-70B-Instruct
	library_name: transformers
	license: llama3.1
	pipeline_tag: text-generation
	tags:
	- facebook
	- meta
	- pytorch
	- pruning
	- llama
	- llama-3
	---

	## Model Information
	This text-only Llama 3.1 41B model is pruned from the instruction-finetuned, text-only Llama 3.1 70B model
	using the [FLAP method](https://arxiv.org/abs/2312.11983).

	> TL;DR: Not maintained. Poor performance, no practical value. A side product of an experiment.

	Hyperparameters used for pruning:
	```
	metrics: WIFV
	structure: AL-AM
	pruning_ratio: 0.5
	```
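
For readers unfamiliar with the `metrics: WIFV` setting, the sketch below illustrates the idea of a weighted input feature variance score, assuming that is what `WIFV` denotes in FLAP. Each input channel is scored by the variance of its activations over calibration samples, multiplied by the squared norm of the matching weight column, and the lowest-scoring channels are dropped. This is a simplified, pure-Python illustration with made-up data and hypothetical helper names (`wifv_scores`, `channels_to_keep`). It is not the FLAP implementation and leaves out everything else the method does.

```python
from statistics import variance


def wifv_scores(activations, weight):
    """Score input channel j as Var_n(X[n][j]) * ||W[:, j]||^2.

    activations: list of N samples, each a list of in_features floats.
    weight: list of out_features rows, each a list of in_features floats.
    """
    in_features = len(weight[0])
    scores = []
    for j in range(in_features):
        column = [sample[j] for sample in activations]
        feature_var = variance(column)  # sample variance, 1 / (N - 1)
        col_norm_sq = sum(row[j] ** 2 for row in weight)
        scores.append(feature_var * col_norm_sq)
    return scores


def channels_to_keep(scores, pruning_ratio):
    """Keep the highest-scoring channels and drop a `pruning_ratio` fraction."""
    n_keep = len(scores) - int(len(scores) * pruning_ratio)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:n_keep])


# Made-up calibration activations (3 samples, 4 input channels).
activations = [
    [1.0, 5.0, 0.0, 2.0],
    [3.0, 5.0, 0.5, 2.0],
    [5.0, 5.0, 1.0, 2.0],
]
# Made-up weight matrix (2 output rows, 4 input channels).
weight = [
    [1.0, 1.0, 2.0, 0.0],
    [1.0, 1.0, 2.0, 0.0],
]

scores = wifv_scores(activations, weight)
kept = channels_to_keep(scores, pruning_ratio=0.5)
print(scores)
print(kept)
```

Channels whose activations never change, or whose weights are zero, receive a score of zero and are the first candidates for removal.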

	## Limitations
	This `llama3.1-41B-raw` model gives unstable output.
	Fine-tuning on an instruction dataset is recommended.

	The model is not supported by any library at the moment
	because pruning leaves its layers with inconsistent shapes.
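
To make the shape problem concrete: a stock Llama config carries a single head count and a single intermediate size for all layers, so a checkpoint whose projections differ from layer to layer cannot be described by it. The sketch below groups projection shapes by layer and checks whether they are uniform. The helper names (`shapes_by_layer`, `is_uniform`) are hypothetical and the shapes are made up for illustration; real values would have to be read from the checkpoint.

```python
import re
from collections import defaultdict

# Matches attention and MLP projection weights;
# captures (layer index, projection name).
_PROJ = re.compile(r"model\.layers\.(\d+)\.(?:self_attn|mlp)\.(\w+)\.weight$")


def shapes_by_layer(shapes):
    """Group projection shapes (out_features, in_features) by layer index."""
    per_layer = defaultdict(dict)
    for name, shape in shapes.items():
        match = _PROJ.match(name)
        if match:
            per_layer[int(match.group(1))][match.group(2)] = tuple(shape)
    return dict(per_layer)


def is_uniform(per_layer, proj):
    """Return True if `proj` has the same shape in every layer that has it."""
    found = {layer[proj] for layer in per_layer.values() if proj in layer}
    return len(found) <= 1


# Made-up shapes for two layers, not taken from the real checkpoint.
example_shapes = {
    "model.layers.0.self_attn.q_proj.weight": (8192, 8192),
    "model.layers.0.mlp.up_proj.weight": (28672, 8192),
    "model.layers.1.self_attn.q_proj.weight": (4096, 8192),
    "model.layers.1.mlp.up_proj.weight": (28672, 8192),
    "model.norm.weight": (8192,),
}

per_layer = shapes_by_layer(example_shapes)
print(is_uniform(per_layer, "q_proj"))
print(is_uniform(per_layer, "up_proj"))
```

In this toy data `q_proj` differs between the two layers while `up_proj` does not, which is the kind of mismatch a single global config value cannot express.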

	## Usage
	Because no library supports the model at the moment,
	the following workaround can be used.
	```python
	import json
	import os
	from functools import reduce

	import torch
	from safetensors import safe_open
	from transformers import AutoTokenizer, LlamaForCausalLM


	def get_module_by_name(module, access_string):
	    """Resolve a dotted path such as 'model.layers.0.mlp.up_proj' to a submodule."""
	    names = access_string.split(sep='.')
	    return reduce(getattr, names, module)


	class MyLlamaForCausalLM(LlamaForCausalLM):
	    def __init__(self, config):
	        super().__init__(config)
	        # NOTE: config._name_or_path is used as a local directory that holds
	        # the checkpoint shards and model.safetensors.index.json.
	        with open(os.path.join(
	                config._name_or_path,
	                "model.safetensors.index.json")) as f:
	            weight_map = json.load(f)
	        weight_map = weight_map["weight_map"]
	        # Resize every pruned linear layer to the shape stored in the checkpoint.
	        for name, path in weight_map.items():
	            module_name = name.replace('.weight', '')
	            if '.bias' in module_name:
	                continue
	            layer = get_module_by_name(self, module_name)
	            with safe_open(
	                    os.path.join(config._name_or_path, path),
	                    framework="pt") as f:
	                tensor = f.get_tensor(name)
	            if 'mlp.' in name or 'attn.' in name:
	                if tensor.shape != (layer.out_features, layer.in_features):
	                    # Re-run nn.Linear.__init__ in place. It returns None,
	                    # so the result must not be assigned back to `layer`.
	                    layer.__init__(
	                        tensor.shape[1],
	                        tensor.shape[0],
	                        bias=layer.bias is not None,
	                        dtype=layer.weight.dtype,
	                        device=layer.weight.device)
	        # Recompute the per-layer head counts from the resized projections.
	        for name, path in weight_map.items():
	            if 'attn.' in name:
	                module = get_module_by_name(
	                    self,
	                    '.'.join(name.split('.')[:-2]))
	                module.num_heads = module.q_proj.out_features // module.head_dim
	                module.num_key_value_heads = module.num_heads
	                module.num_key_value_groups = module.num_heads // module.num_key_value_heads


	model = MyLlamaForCausalLM.from_pretrained(
	    "npc0/llama3.1-41B-raw",
	    torch_dtype=torch.float16,
	    device_map="auto"
	)
	tokenizer = AutoTokenizer.from_pretrained(
	    "FLAP/llm_weights/flap_p0.5_WIFV_ALAM_llama_70b")
	model = model.eval()

	messages = [
	    {"role": "system", "content": "You are a helpful AI assistant."},
	    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
	    {"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
	    {"role": "user", "content": "What about solving an 2x + 3 = 7 equation?"},
	]

	model_inputs = tokenizer.apply_chat_template(
	    messages, return_tensors="pt").to(model.device)
	generated_ids = model.generate(model_inputs, max_new_tokens=128)
	decoded = tokenizer.batch_decode(generated_ids)
	print(decoded[0])
	```
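
The workaround above relies on two small string tricks: resolving a dotted parameter path with `reduce(getattr, ...)`, and dropping the last two name components to get from a projection weight to its parent attention module. The sketch below shows both on a toy object tree built from `SimpleNamespace`. The tree and its numbers are hypothetical stand-ins, not real `transformers` modules.

```python
from functools import reduce
from types import SimpleNamespace


def get_module_by_name(module, access_string):
    """Follow a dotted attribute path one component at a time."""
    return reduce(getattr, access_string.split("."), module)


# Hypothetical stand-in for a pruned attention block.
attn_block = SimpleNamespace(
    q_proj=SimpleNamespace(out_features=4096),
    head_dim=128,
)

# Numeric path components such as "0" are looked up as attribute names,
# so the toy layer container exposes the layer under the attribute "0".
layers = SimpleNamespace()
setattr(layers, "0", SimpleNamespace(self_attn=attn_block))
toy_model = SimpleNamespace(model=SimpleNamespace(layers=layers))

name = "model.layers.0.self_attn.q_proj.weight"
# Drop "q_proj" and "weight" to get the parent attention module path.
attn_path = ".".join(name.split(".")[:-2])
attn = get_module_by_name(toy_model, attn_path)

# Same arithmetic as the workaround: head count from the pruned width.
num_heads = attn.q_proj.out_features // attn.head_dim
print(attn_path, num_heads)
```

With a pruned `q_proj` width of 4096 and a head dimension of 128, the toy block ends up with 32 heads, which is how the workaround derives a per-layer head count instead of reading one global value from the config.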