---
license: gpl-3.0
model-index:
- name: 34b-beta
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 30.43
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=CausalLM/34b-beta
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 36.68
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=CausalLM/34b-beta
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 4.15
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=CausalLM/34b-beta
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 12.86
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=CausalLM/34b-beta
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 6.92
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=CausalLM/34b-beta
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 48.06
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=CausalLM/34b-beta
      name: Open LLM Leaderboard
---
# CausalLM 34B β
Demo: [![](https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm.svg)](https://huggingface.co/spaces/JosephusCheung/CausalLM-34B-8-bit-GGUF)
## PROMPT FORMAT:
[chatml](https://github.com/openai/openai-python/blob/main/chatml.md)
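For reference, a ChatML-formatted prompt looks like the following (the system and user text are placeholders); generation continues after the final `<|im_start|>assistant` line and should stop at `<|im_end|>`:

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is the capital of France?<|im_end|>
<|im_start|>assistant
```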
There are precision issues with the current model weights. In the next version update, we will roll back some progress and retrain to fix them as soon as possible.

**Please note:** For now, do not use accelerated inference frameworks such as **vLLM**; use Transformers for inference instead. Otherwise, due to the precision issues, output quality will be significantly degraded. If you need faster inference, consider the q8_0 quantization with llama.cpp (for this model only, faster and better than bf16 under vLLM) until the fixed version is released.
**Do not use `repetition_penalty`!**
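A minimal Transformers inference sketch under the constraints above (original-precision weights, ChatML prompt, no repetition penalty). The sampling parameters and the `<|im_end|>` stop-token handling below are illustrative assumptions, not official settings:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CausalLM/34b-beta"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumes the weights fit on your GPU(s)
    device_map="auto",
)

# Build a ChatML prompt by hand (see PROMPT FORMAT above).
prompt = (
    "<|im_start|>system\n"
    "You are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\n"
    "What is the capital of France?<|im_end|>\n"
    "<|im_start|>assistant\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Per the warning above, do not pass repetition_penalty.
# Stopping at <|im_end|> assumes it is a registered token in this tokenizer;
# if not, drop the eos_token_id argument and trim the output manually.
output = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    eos_token_id=tokenizer.convert_tokens_to_ids("<|im_end|>"),
)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```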
Please do not use wikitext for quantization calibration: all wikitext content has been re-aligned against a synthetic dataset, so its distribution differs significantly from the original wikitext.
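If your quantization tool requires a calibration corpus, one option is to assemble it from ChatML-style text that matches the model's aligned distribution. A hedged sketch (the prompts and file name are placeholders; substitute text representative of your actual workload):

```python
# Write a quantization-calibration corpus in the model's ChatML format
# instead of using wikitext. These samples are placeholders only.
samples = [
    ("You are a helpful assistant.", "Summarize the plot of Hamlet."),
    ("You are a careful math tutor.", "Factor x^2 - 5x + 6."),
]

with open("calibration.txt", "w", encoding="utf-8") as f:
    for system, user in samples:
        f.write(
            f"<|im_start|>system\n{system}<|im_end|>\n"
            f"<|im_start|>user\n{user}<|im_end|>\n"
            f"<|im_start|>assistant\n\n"
        )
```

Pass the resulting file to your quantizer's calibration step in place of wikitext.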
## MT-Bench: 8.5
![mt-bench](https://cdn-uploads.huggingface.co/production/uploads/63468a143ea42ee2cb49ddd1/2vv2_nGbfWuOM8jwy40dn.png)
## Contamination detection (if you want to check):
| Models | MMLU (ref: llama7b) | TBA |
| ------------------------- | ------------------- | ---- |
| microsoft/Orca-2-7b | 0.77 | |
| mistralai/Mistral-7B-v0.1 | 0.46 | |
| **CausalLM/34b-beta** | **0.38** | |
| 01-ai/Yi-6B-200K | 0.3 | |
Data from [Yeyito/llm_contamination_detector](https://huggingface.co/spaces/Yeyito/llm_contamination_detector).
It should be *safe*: the model was not trained on the benchmarks themselves, but some contamination of the training dataset is unavoidable due to cost constraints.
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_CausalLM__34b-beta).
| Metric |Value|
|-------------------|----:|
|Avg. |23.18|
|IFEval (0-Shot) |30.43|
|BBH (3-Shot) |36.68|
|MATH Lvl 5 (4-Shot)| 4.15|
|GPQA (0-shot) |12.86|
|MuSR (0-shot) | 6.92|
|MMLU-PRO (5-shot) |48.06|