Mihaiii/Covasna-0.1

	---
	base_model: migtissera/Tess-70B-v1.6
	inference: false
	license: llama2
	metrics:
	- accuracy
	---

	This is a pruned BF16 version of [migtissera/Tess-70B-v1.6](https://huggingface.co/migtissera/Tess-70B-v1.6).

	[migtissera/Tess-70B-v1.6](https://huggingface.co/migtissera/Tess-70B-v1.6) has 69 billion parameters and Covasna-0.1 has 41.6 billion (~60.3% of the original parameter count).

	# Steps to replicate:

	Use [laserQlora.ipynb](https://github.com/cognitivecomputations/laserRMT/blob/main/laserQlora.ipynb) from [cognitivecomputations/laserRMT](https://github.com/cognitivecomputations/laserRMT) to determine which layers should be eliminated.

	Adapt the script for `migtissera/Tess-70B-v1.6` by replacing `model_name = "mistralai/Mistral-7B-v0.1"` with `model_name = "migtissera/Tess-70B-v1.6"` and `layer_numbers = list(range(31, -1, -1))` with `layer_numbers = list(range(79, -1, -1))`, [79 being the index of the last decoder layer in Tess-70B-v1.6](https://huggingface.co/migtissera/Tess-70B-v1.6?show_tensors=true).

	Then find the layer indexes where the `self_attn.v_proj` SNR is Infinity and eliminate those layers using [mergekit](https://github.com/arcee-ai/mergekit).
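The selection step above can be sketched in Python. This is a minimal sketch, not part of laserRMT or mergekit: the `snr_by_layer` dictionary is a hypothetical stand-in for however you collect the `self_attn.v_proj` SNR values from the notebook output, and both helper names are made up for illustration. It also assumes that mergekit's `layer_range` is end-exclusive, so that `[74, 80]` covers layers 74 to 79.

```python
import math


def infinite_snr_layers(snr_by_layer):
    """Return the sorted layer indexes whose SNR value is infinite."""
    return sorted(i for i, snr in snr_by_layer.items() if math.isinf(snr))


def kept_ranges(num_layers, dropped):
    """Turn dropped layer indexes into half-open [start, end) ranges of kept layers."""
    dropped = set(dropped)
    ranges = []
    start = None  # first index of the current run of kept layers
    for i in range(num_layers):
        if i in dropped:
            if start is not None:
                ranges.append([start, i])  # close the run just before the dropped layer
                start = None
        elif start is None:
            start = i  # open a new run of kept layers
    if start is not None:
        ranges.append([start, num_layers])  # close the final run
    return ranges


# Toy 8-layer example with made-up SNR values (not measurements from Tess-70B-v1.6).
toy_snr = {0: 4.1, 1: 3.7, 2: float("inf"), 3: 2.9,
           4: float("inf"), 5: float("inf"), 6: 5.0, 7: 1.8}
toy_dropped = infinite_snr_layers(toy_snr)
toy_ranges = kept_ranges(8, toy_dropped)
print(toy_dropped)
print(toy_ranges)
```

Each resulting range becomes one `layer_range` entry under its own `sources` item in the mergekit config.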

	Here is the mergekit config:

	```yml
	slices:
	  - sources:
	      - model: "migtissera/Tess-70B-v1.6"
	        layer_range: [0, 7]
	  - sources:
	      - model: "migtissera/Tess-70B-v1.6"
	        layer_range: [8, 9]
	  - sources:
	      - model: "migtissera/Tess-70B-v1.6"
	        layer_range: [12, 29]
	  - sources:
	      - model: "migtissera/Tess-70B-v1.6"
	        layer_range: [31, 32]
	  - sources:
	      - model: "migtissera/Tess-70B-v1.6"
	        layer_range: [33, 45]
	  - sources:
	      - model: "migtissera/Tess-70B-v1.6"
	        layer_range: [50, 52]
	  - sources:
	      - model: "migtissera/Tess-70B-v1.6"
	        layer_range: [60, 61]
	  - sources:
	      - model: "migtissera/Tess-70B-v1.6"
	        layer_range: [67, 68]
	  - sources:
	      - model: "migtissera/Tess-70B-v1.6"
	        layer_range: [74, 80]
	merge_method: passthrough
	dtype: bfloat16
	```
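As a rough sanity check, the layer ranges in the config can be related to the parameter counts quoted at the top of this card. The sketch below assumes the standard Llama-2-70B dimensions (hidden size 8192, MLP size 28672, 8 key/value heads of dimension 128, vocabulary 32000, untied output head) and assumes `layer_range` is end-exclusive. These dimensions are not stated in this card, so treat the result as an estimate rather than an exact count.

```python
# Kept slices copied from the mergekit config above (assumed end-exclusive).
LAYER_RANGES = [[0, 7], [8, 9], [12, 29], [31, 32], [33, 45],
                [50, 52], [60, 61], [67, 68], [74, 80]]

# ASSUMPTION: Llama-2-70B dimensions, not taken from this model card.
HIDDEN = 8192          # hidden size
INTERMEDIATE = 28672   # MLP intermediate size
KV_DIM = 8 * 128       # 8 key/value heads x 128 head dim (grouped-query attention)
VOCAB = 32000          # vocabulary size


def count_layers(ranges):
    """Number of layers kept by a list of half-open [start, end) ranges."""
    return sum(end - start for start, end in ranges)


def params_per_layer():
    """Estimated parameters in one decoder layer."""
    attn = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM  # q_proj, o_proj + k_proj, v_proj
    mlp = 3 * HIDDEN * INTERMEDIATE                   # gate_proj, up_proj, down_proj
    norms = 2 * HIDDEN                                # two RMSNorm weight vectors
    return attn + mlp + norms


def total_params(num_layers):
    """Estimated parameters for a model with num_layers decoder layers."""
    embeddings = 2 * VOCAB * HIDDEN  # input embeddings + untied output head
    final_norm = HIDDEN
    return num_layers * params_per_layer() + embeddings + final_norm


kept = count_layers(LAYER_RANGES)
full, pruned = total_params(80), total_params(kept)
print(f"kept layers: {kept} of 80")
print(f"full: {full / 1e9:.1f}B, pruned: {pruned / 1e9:.1f}B, ratio: {pruned / full:.1%}")
```

Under these assumptions the config keeps 48 of the 80 layers, and the estimate comes out at roughly 69.0B parameters for the full model and 41.6B for the pruned one (about 60.3%), which is consistent with the figures quoted above.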

	GGUF:
	[Covasna-0.1-GGUF](https://huggingface.co/mradermacher/Covasna-0.1-GGUF)