Adding Evaluation Results
#1 opened by T145

README.md CHANGED
```diff
@@ -20,6 +20,105 @@ tags:
 - companion
 - friend
 base_model: meta-llama/Llama-3.1-8B-Instruct
+model-index:
+- name: Dobby-Mini-Unhinged-Llama-3.1-8B
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: wis-k/instruction-following-eval
+      split: train
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 74.57
+      name: averaged accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=SentientAGI%2FDobby-Mini-Unhinged-Llama-3.1-8B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: SaylorTwift/bbh
+      split: test
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 30.37
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=SentientAGI%2FDobby-Mini-Unhinged-Llama-3.1-8B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: lighteval/MATH-Hard
+      split: test
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 11.25
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=SentientAGI%2FDobby-Mini-Unhinged-Llama-3.1-8B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      split: train
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 7.49
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=SentientAGI%2FDobby-Mini-Unhinged-Llama-3.1-8B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 7.96
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=SentientAGI%2FDobby-Mini-Unhinged-Llama-3.1-8B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 28.72
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=SentientAGI%2FDobby-Mini-Unhinged-Llama-3.1-8B
+      name: Open LLM Leaderboard
 ---
 
 # Dobby-Mini-Unhinged-Llama-3.1-8B
@@ -227,4 +326,18 @@ print(outputs[0]['generated_text'])
 
 This model is derived from Llama 3.1 8B and is governed by the Llama 3.1 Community License Agreement. By using these weights, you agree to the terms set by Meta for Llama 3.1.
 
-It is important to note that, as with all LLMs, factual inaccuracies may occur. Any investment or legal opinions expressed should be independently verified. Knowledge cutoff is the same as LLama-3.1-8B. That is, December 2023.
+It is important to note that, as with all LLMs, factual inaccuracies may occur. Any investment or legal opinions expressed should be independently verified. Knowledge cutoff is the same as Llama-3.1-8B, i.e. December 2023.
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/SentientAGI__Dobby-Mini-Unhinged-Llama-3.1-8B-details)!
+Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=SentientAGI%2FDobby-Mini-Unhinged-Llama-3.1-8B&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!
+
+| Metric              | Value (%) |
+|---------------------|----------:|
+| **Average**         |     26.73 |
+| IFEval (0-Shot)     |     74.57 |
+| BBH (3-Shot)        |     30.37 |
+| MATH Lvl 5 (4-Shot) |     11.25 |
+| GPQA (0-shot)       |      7.49 |
+| MuSR (0-shot)       |      7.96 |
+| MMLU-PRO (5-shot)   |     28.72 |
+
```
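As a quick sanity check on the table this PR adds (a minimal sketch, not part of the PR itself), the leaderboard's **Average** row is the unweighted mean of the six benchmark scores, rounded to two decimals:

```python
# Benchmark scores (%) taken from the results table added in this PR.
scores = {
    "IFEval (0-Shot)": 74.57,
    "BBH (3-Shot)": 30.37,
    "MATH Lvl 5 (4-Shot)": 11.25,
    "GPQA (0-shot)": 7.49,
    "MuSR (0-shot)": 7.96,
    "MMLU-PRO (5-shot)": 28.72,
}

# Unweighted mean across the six benchmarks, rounded to 2 decimals,
# which reproduces the "Average" row of the table (26.73).
average = round(sum(scores.values()) / len(scores), 2)
print(average)  # 26.73
```

The same values appear twice in the diff (once in the YAML `model-index` front matter, once in the Markdown table), so a check like this is an easy way to confirm the two stay in sync.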