This is a 32k context version of Sao10K/WinterGoddess-1.4x-70B-L2, extended using the method discussed here.
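A minimal loading sketch with Hugging Face `transformers` is shown below. It assumes the extended 32k context is already baked into this repo's config; the dtype and device placement are illustrative, not prescriptive.

```python
# Minimal loading sketch (assumptions: dtype and device_map; the extended
# 32k context window is expected to come from this repo's config.json).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ChuckMcSneed/WinterGoddess-1.4x-70b-32k"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # spread the 70B weights across available GPUs
)

prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```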
Quants
Thanks for the GGUF quants, @Nexesenex!
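If you run the GGUF quants, a llama-cpp-python sketch like the one below should work at the full 32k context. The filename and quant level are placeholders for whichever file you actually download.

```python
# Sketch of running a GGUF quant at the full 32k context with llama-cpp-python.
# The .gguf filename below is hypothetical; substitute the file you downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="WinterGoddess-1.4x-70b-32k.Q4_K_M.gguf",  # placeholder filename
    n_ctx=32768,      # use the extended 32k context window
    n_gpu_layers=-1,  # offload all layers to GPU if VRAM allows
)

out = llm("Once upon a time", max_tokens=128)
print(out["choices"][0]["text"])
```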
Benchmarks
NeoEvalPlusN_benchmark
Test name | WinterGoddess | WinterGoddess-32k |
---|---|---|
B | 2 | 2.5 |
C | 1.5 | 2 |
D | 3 | 0 |
S | 2.75 | 1.5 |
P | 5.5 | 2.25 |
Total | 14.75 | 8.25 |
Open LLM Leaderboard Evaluation Results
Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
---|---|---|---|---|---|---|---|
Sao10K/WinterGoddess-1.4x-70B-L2 | 73.23 | 72.78 | 90.11 | 71.12 | 65.76 | 85.00 | 54.59 |
ChuckMcSneed/WinterGoddess-1.4x-70b-32k | 69.40 | 71.16 | 89.12 | 66.42 | 63.87 | 82.56 | 43.29 |
Difference | 3.83 | 1.62 | 0.99 | 4.70 | 1.89 | 2.44 | 11.30 |
Here the losses seem far less brutal than on my benchmark. It seems that extending the context with LongLoRA kills MMLU and GSM8K performance.
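The Difference row above is just per-task subtraction; a tiny sketch reproducing it from the table's numbers:

```python
# Per-task deltas between the base model and the 32k extension,
# taken directly from the Open LLM Leaderboard table above.
base = {"Average": 73.23, "ARC": 72.78, "HellaSwag": 90.11, "MMLU": 71.12,
        "TruthfulQA": 65.76, "Winogrande": 85.00, "GSM8K": 54.59}
ext = {"Average": 69.40, "ARC": 71.16, "HellaSwag": 89.12, "MMLU": 66.42,
       "TruthfulQA": 63.87, "Winogrande": 82.56, "GSM8K": 43.29}

for task in base:
    print(f"{task}: {base[task] - ext[task]:.2f}")
# MMLU (4.70) and GSM8K (11.30) show the largest drops.
```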
Detailed results can be found here
Metric | Value |
---|---|
Avg. | 69.40 |
AI2 Reasoning Challenge (25-Shot) | 71.16 |
HellaSwag (10-Shot) | 89.12 |
MMLU (5-Shot) | 66.42 |
TruthfulQA (0-shot) | 63.87 |
Winogrande (5-shot) | 82.56 |
GSM8k (5-shot) | 43.29 |