Tülu 3.1 8B GGUF
Original model: Tülu 3.1 8B
Model creator: allenai
Version 3.1 update: The new version of our Tülu model comes from an improvement only in the final RL stage of training. We switched from PPO to GRPO (which uses no reward model) and did further hyperparameter tuning, achieving substantial performance improvements across the board over the original Tülu 3 8B model, as shown in the evaluation table below.
Tülu 3 is a leading instruction-following model family, offering a post-training package with fully open-source data, code, and recipes designed to serve as a comprehensive guide to modern post-training techniques. It is one step in a larger effort to train fully open-source models, such as our OLMo models. Tülu 3 is designed for state-of-the-art performance on a diverse set of tasks beyond chat, such as MATH, GSM8K, and IFEval.
This repo contains GGUF format model files for the Allen Institute for AI’s Tülu 3.1 8B.
What is GGUF?
GGUF is a file format for representing AI models. It is the third version of the format, introduced by the llama.cpp team on August 21st, 2023, as a replacement for GGML, which is no longer supported by llama.cpp. The files in this repo were converted with llama.cpp build 4699 (revision 31afcbe), using autogguf-rs.
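If you want to fetch a quant programmatically rather than through the browser, a minimal sketch with the huggingface_hub Python client looks like the following. The filename below is a placeholder, not a confirmed file in this repo; check the repo's file list for the actual quant names.

```python
# Minimal sketch: download one GGUF quant from this repo with huggingface_hub.
# NOTE: the filename is a placeholder -- consult the repo's Files tab for the
# real quant names (e.g. Q4_K_M, Q5_K_M, Q8_0 variants).
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="brittlewis12/Llama-3.1-Tulu-3.1-8B-GGUF",
    filename="llama-3.1-tulu-3.1-8b.Q4_K_M.gguf",  # placeholder filename
)
print(model_path)  # local path to the downloaded GGUF file
```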
Prompt template
<|system|>
{{system_message}}
<|user|>
{{prompt}}
<|assistant|>
{{assistant_message}}<|end_of_text|>
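As a rough illustration of how this template is assembled at inference time, here is a minimal llama-cpp-python sketch. The model path is a placeholder for whichever quant you downloaded, and the system prompt and user prompt are example values; you can also let the library's built-in chat handling apply the template for you instead of formatting it by hand.

```python
# Minimal sketch: apply the Tülu chat template by hand with llama-cpp-python.
# The model_path is a placeholder -- point it at the GGUF quant you downloaded.
from llama_cpp import Llama

llm = Llama(model_path="./llama-3.1-tulu-3.1-8b.Q4_K_M.gguf", n_ctx=4096)

system_message = "You are a helpful assistant."  # example system prompt
prompt = "Explain GRPO in one sentence."         # example user prompt

# Format the prompt exactly as shown in the template above.
formatted = (
    f"<|system|>\n{system_message}\n"
    f"<|user|>\n{prompt}\n"
    f"<|assistant|>\n"
)

out = llm(formatted, max_tokens=256, stop=["<|end_of_text|>", "<|user|>"])
print(out["choices"][0]["text"])
```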
Download & run with cnvrs on iPhone, iPad, and Mac!
cnvrs is the best app for private, local AI on your device:
- create & save Characters with custom system prompts & temperature settings
- download and experiment with any GGUF model you can find on HuggingFace!
- or, use an API key with the chat-completions-compatible model provider of your choice: ChatGPT, Claude, Gemini, DeepSeek, & more!
- make it your own with custom Theme colors
- powered by Metal ⚡️ & Llama.cpp, with haptics during response streaming!
- try it out yourself today on TestFlight!
- follow cnvrs on Twitter to stay up to date
Original Model Evaluation
Benchmark (eval) | Tülu 3 SFT 8B | Tülu 3 DPO 8B | Tülu 3 8B | Tülu 3.1 8B (NEW) | Llama 3.1 8B Instruct | Qwen 2.5 7B Instruct | Magpie 8B | Gemma 2 9B Instruct | Ministral 8B Instruct |
---|---|---|---|---|---|---|---|---|---|
Avg. | 60.4 | 64.4 | 64.8 | 66.3 | 62.2 | 66.5 | 44.7 | 55.2 | 58.3 |
MMLU (0 shot, CoT) | 65.9 | 68.7 | 68.2 | 69.5 | 71.2 | 76.6 | 62.0 | 74.6 | 68.5 |
PopQA (15 shot) | 29.3 | 29.3 | 29.1 | 30.2 | 20.2 | 18.1 | 22.5 | 28.3 | 20.2 |
TruthfulQA (6 shot) | 46.8 | 56.1 | 55.0 | 59.9 | 55.1 | 63.1 | 57.0 | 61.4 | 55.5 |
BigBenchHard (3 shot, CoT) | 67.9 | 65.8 | 66.0 | 68.9 | 62.8 | 70.2 | 0.9 | 2.5 | 56.2 |
DROP (3 shot) | 61.3 | 62.5 | 62.6 | 63.9 | 61.5 | 54.4 | 49.4 | 58.8 | 56.2 |
MATH (4 shot CoT, Flex) | 31.5 | 42.0 | 43.7 | 47.8 | 42.5 | 69.9 | 5.1 | 29.8 | 40.0 |
GSM8K (8 shot, CoT) | 76.2 | 84.3 | 87.6 | 90.0 | 83.4 | 83.8 | 61.2 | 79.7 | 80.0 |
HumanEval (pass@10) | 86.2 | 83.9 | 83.9 | 84.8 | 86.3 | 93.1 | 75.4 | 71.7 | 91.0 |
HumanEval+ (pass@10) | 81.4 | 78.6 | 79.2 | 80.4 | 82.9 | 89.7 | 69.1 | 67.0 | 88.5 |
IFEval (prompt loose) | 72.8 | 81.1 | 82.4 | 83.9 | 80.6 | 74.7 | 38.8 | 69.9 | 56.4 |
AlpacaEval 2 (LC % win) | 12.4 | 33.5 | 34.5 | 34.9 | 24.2 | 29.0 | 49.0 | 43.7 | 31.4 |
Safety (6 task avg.) | 93.1 | 87.2 | 85.5 | 81.2 | 75.2 | 75.0 | 46.4 | 75.5 | 56.2 |
Model tree for brittlewis12/Llama-3.1-Tulu-3.1-8B-GGUF
Base model: meta-llama/Llama-3.1-8B