Buttocks-7B-v1.0 / README.md

TeeZee

Adding Evaluation Results (#1)

e0eb0cc verified about 1 year ago


---
license: cc-by-nc-4.0
tags:
- not-for-all-audiences
- merge
model-index:
- name: Buttocks-7B-v1.0
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 54.61
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TeeZee/Buttocks-7B-v1.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 75.61
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TeeZee/Buttocks-7B-v1.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 50.22
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TeeZee/Buttocks-7B-v1.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 44.72
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TeeZee/Buttocks-7B-v1.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 68.9
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TeeZee/Buttocks-7B-v1.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 5.76
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TeeZee/Buttocks-7B-v1.0
      name: Open LLM Leaderboard
---

### Buttocks 7B v1.0 ###

An experiment that has gone very, very wrong.

### Model details ###

- Recreation of the original recipe for [Undi95/Toppy-M-7B](https://huggingface.co/Undi95/Toppy-M-7B), but with the final merge done by [MergeMonster](https://github.com/Gryphe/MergeMonster/) using an extended RPG preset instead of mergekit.
- The recipe is in [mergekit-config](https://huggingface.co/TeeZee/Toppy-7B-remake-mergemonster-SLERP-v1.0/resolve/main/toppy-slerp-merge-config.yml); steps AA, BB, and CC are the original models with LoRAs, as per the Toppy M 7B sauce.
- The SLERP merge method was used.
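
For readers unfamiliar with SLERP (spherical linear interpolation), here is a minimal, generic sketch of the formula applied to two flat weight vectors. It is an illustration only: the function name `slerp`, the `eps` threshold, and the fallback to plain linear interpolation are assumptions, and this is not the code that mergekit or MergeMonster actually run.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Interpolate between two equal-length vectors along the arc between them.

    t=0 returns v0, t=1 returns v1. Hypothetical helper for illustration.
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))

    def lerp():
        # Plain linear interpolation, used when the angle is undefined
        # or too small for the spherical formula to be stable.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    if norm0 < eps or norm1 < eps:
        return lerp()

    # Angle between the two vectors, with the cosine clamped to [-1, 1].
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)

    if sin_theta < eps:
        return lerp()

    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


if __name__ == "__main__":
    # Halfway between two orthogonal unit vectors.
    print(slerp(0.5, [1.0, 0.0], [0.0, 1.0]))
```

In a real merge, a function like this would be applied per tensor with an interpolation factor taken from the merge config; the details of that loop depend on the tool.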

### Results ###

- in simple terms, this model is totally unhinged
- it always produces sequences similar to fever dreams or drug trips
- on a good day it can produce scenarios similar to old Monty Python sketches
- the model shows an incredible affinity for words like 'ass', 'buttocks', 'farts'; prompting with any one of those words will probably produce a whole story revolving around those topics

### Possible uses ###

- to generate a dream sequence in a story
- to make a boring model more unpredictable by merging it at low weights with this monster
- to take a break: connect SillyTavern to this model and get a few ROTFLs observing how every story deteriorates into pure craziness
- research on LLM hallucinations

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_TeeZee__Buttocks-7B-v1.0)

|             Metric              |Value|
|---------------------------------|----:|
|Avg.                             |49.97|
|AI2 Reasoning Challenge (25-Shot)|54.61|
|HellaSwag (10-Shot)              |75.61|
|MMLU (5-Shot)                    |50.22|
|TruthfulQA (0-shot)              |44.72|
|Winogrande (5-shot)              |68.90|
|GSM8k (5-shot)                   | 5.76|
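
The "Avg." row appears to be the unweighted mean of the six benchmark scores. A short sketch to check that, assuming a simple arithmetic mean (the dictionary name `scores` is only for illustration):

```python
# Scores copied from the table above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 54.61,
    "HellaSwag (10-Shot)": 75.61,
    "MMLU (5-Shot)": 50.22,
    "TruthfulQA (0-shot)": 44.72,
    "Winogrande (5-shot)": 68.90,
    "GSM8k (5-shot)": 5.76,
}

# Unweighted arithmetic mean over all six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"Avg. {average:.2f}")  # prints "Avg. 49.97"
```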