Tülu 3.1 8B GGUF
Original model: Tülu 3.1 8B
Model creator: allenai
Version 3.1 update: The new version of our Tülu model comes from an improvement only in the final RL stage of training. We switched from PPO to GRPO (which uses no reward model) and did further hyperparameter tuning, achieving substantial performance improvements across the board over the original Tülu 3 8B model, as shown in the evaluation table below.
Tülu 3 is a leading instruction-following model family, offering a post-training package with fully open-source data, code, and recipes designed to serve as a comprehensive guide to modern post-training techniques. It is one step in a larger effort to train fully open-source models, such as our OLMo models. Tülu 3 is designed for state-of-the-art performance on a diverse set of tasks beyond chat, such as MATH, GSM8K, and IFEval.
This repo contains GGUF format model files for the Allen Institute for AI’s Tülu 3.1 8B.
What is GGUF?
GGUF is a file format for representing AI models. It is the third version of the format, introduced by the llama.cpp team on August 21st, 2023, as a replacement for GGML, which is no longer supported by llama.cpp. The files in this repo were converted with llama.cpp build 4699 (revision 31afcbe), using autogguf-rs.
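If you want to fetch a quant programmatically rather than through the browser, a minimal sketch with the huggingface_hub Python client looks like the following. The filename below is a placeholder, not a confirmed file in this repo; check the repo's file list for the actual quant names.

```python
# Minimal sketch: download one GGUF quant from this repo with huggingface_hub.
# NOTE: the filename is a placeholder -- consult the repo's Files tab for the
# real quant names (e.g. Q4_K_M, Q5_K_M, Q8_0 variants).
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="brittlewis12/Llama-3.1-Tulu-3.1-8B-GGUF",
    filename="llama-3.1-tulu-3.1-8b.Q4_K_M.gguf",  # placeholder filename
)
print(model_path)  # local path to the downloaded GGUF file
```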
Prompt template
<|system|>
{{system_message}}
<|user|>
{{prompt}}
<|assistant|>
{{assistant_message}}<|end_of_text|>
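As a rough illustration of how this template is assembled at inference time, here is a minimal llama-cpp-python sketch. The model path is a placeholder for whichever quant you downloaded, and the system prompt and user prompt are example values; you can also let the library's built-in chat handling apply the template for you instead of formatting it by hand.

```python
# Minimal sketch: apply the Tülu chat template by hand with llama-cpp-python.
# The model_path is a placeholder -- point it at the GGUF quant you downloaded.
from llama_cpp import Llama

llm = Llama(model_path="./llama-3.1-tulu-3.1-8b.Q4_K_M.gguf", n_ctx=4096)

system_message = "You are a helpful assistant."  # example system prompt
prompt = "Explain GRPO in one sentence."         # example user prompt

# Format the prompt exactly as shown in the template above.
formatted = (
    f"<|system|>\n{system_message}\n"
    f"<|user|>\n{prompt}\n"
    f"<|assistant|>\n"
)

out = llm(formatted, max_tokens=256, stop=["<|end_of_text|>", "<|user|>"])
print(out["choices"][0]["text"])
```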
Download & run with cnvrs on iPhone, iPad, and Mac!
cnvrs is the best app for private, local AI on your device:
- create & save Characters with custom system prompts & temperature settings
- download and experiment with any GGUF model you can find on HuggingFace!
- or, use an API key with the chat-completions-compatible model provider of your choice: ChatGPT, Claude, Gemini, DeepSeek, & more!
- make it your own with custom Theme colors
- powered by Metal ⚡️ & Llama.cpp, with haptics during response streaming!
- try it out yourself today on TestFlight!
- follow cnvrs on Twitter to stay up to date
Original Model Evaluation
Benchmark (eval) | Tülu 3 SFT 8B | Tülu 3 DPO 8B | Tülu 3 8B | Tülu 3.1 8B (NEW) | Llama 3.1 8B Instruct | Qwen 2.5 7B Instruct | Magpie 8B | Gemma 2 9B Instruct | Ministral 8B Instruct |
---|---|---|---|---|---|---|---|---|---|
Avg. | 60.4 | 64.4 | 64.8 | 66.3 | 62.2 | 66.5 | 44.7 | 55.2 | 58.3 |
MMLU (0 shot, CoT) | 65.9 | 68.7 | 68.2 | 69.5 | 71.2 | 76.6 | 62.0 | 74.6 | 68.5 |
PopQA (15 shot) | 29.3 | 29.3 | 29.1 | 30.2 | 20.2 | 18.1 | 22.5 | 28.3 | 20.2 |
TruthfulQA (6 shot) | 46.8 | 56.1 | 55.0 | 59.9 | 55.1 | 63.1 | 57.0 | 61.4 | 55.5 |
BigBenchHard (3 shot, CoT) | 67.9 | 65.8 | 66.0 | 68.9 | 62.8 | 70.2 | 0.9 | 2.5 | 56.2 |
DROP (3 shot) | 61.3 | 62.5 | 62.6 | 63.9 | 61.5 | 54.4 | 49.4 | 58.8 | 56.2 |
MATH (4 shot CoT, Flex) | 31.5 | 42.0 | 43.7 | 47.8 | 42.5 | 69.9 | 5.1 | 29.8 | 40.0 |
GSM8K (8 shot, CoT) | 76.2 | 84.3 | 87.6 | 90.0 | 83.4 | 83.8 | 61.2 | 79.7 | 80.0 |
HumanEval (pass@10) | 86.2 | 83.9 | 83.9 | 84.8 | 86.3 | 93.1 | 75.4 | 71.7 | 91.0 |
HumanEval+ (pass@10) | 81.4 | 78.6 | 79.2 | 80.4 | 82.9 | 89.7 | 69.1 | 67.0 | 88.5 |
IFEval (prompt loose) | 72.8 | 81.1 | 82.4 | 83.9 | 80.6 | 74.7 | 38.8 | 69.9 | 56.4 |
AlpacaEval 2 (LC % win) | 12.4 | 33.5 | 34.5 | 34.9 | 24.2 | 29.0 | 49.0 | 43.7 | 31.4 |
Safety (6 task avg.) | 93.1 | 87.2 | 85.5 | 81.2 | 75.2 | 75.0 | 46.4 | 75.5 | 56.2 |
Model tree for brittlewis12/Llama-3.1-Tulu-3.1-8B-GGUF
Base model: meta-llama/Llama-3.1-8B