Adding Evaluation Results

#1 by T145 - opened

Files changed (1)
  1. README.md +114 -1
README.md CHANGED
@@ -20,6 +20,105 @@ tags:
   - companion
   - friend
 base_model: meta-llama/Llama-3.1-8B-Instruct
+model-index:
+- name: Dobby-Mini-Unhinged-Llama-3.1-8B
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: wis-k/instruction-following-eval
+      split: train
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 74.57
+      name: averaged accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=SentientAGI%2FDobby-Mini-Unhinged-Llama-3.1-8B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: SaylorTwift/bbh
+      split: test
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 30.37
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=SentientAGI%2FDobby-Mini-Unhinged-Llama-3.1-8B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: lighteval/MATH-Hard
+      split: test
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 11.25
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=SentientAGI%2FDobby-Mini-Unhinged-Llama-3.1-8B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      split: train
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 7.49
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=SentientAGI%2FDobby-Mini-Unhinged-Llama-3.1-8B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 7.96
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=SentientAGI%2FDobby-Mini-Unhinged-Llama-3.1-8B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 28.72
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=SentientAGI%2FDobby-Mini-Unhinged-Llama-3.1-8B
+      name: Open LLM Leaderboard
 ---
 
 # Dobby-Mini-Unhinged-Llama-3.1-8B
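The `model-index` block added in this hunk follows the standard Hugging Face model-card metadata layout: one `results` list, each entry pairing a `dataset` with its `metrics`. As an illustrative stdlib-only sketch (a real consumer would parse the full front matter with a YAML library; the snippet below embeds a trimmed two-entry copy of the block above, not the whole thing), each dataset name can be paired with its reported score:

```python
import re

# Trimmed fragment of the model-index front matter added in this PR
# (values copied from the diff above).
FRONT_MATTER = """\
model-index:
- name: Dobby-Mini-Unhinged-Llama-3.1-8B
  results:
  - task:
      type: text-generation
    dataset:
      name: IFEval (0-Shot)
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 74.57
  - task:
      type: text-generation
    dataset:
      name: BBH (3-Shot)
    metrics:
    - type: acc_norm
      value: 30.37
"""

# Pair each "dataset: name:" with the first "value:" that follows it.
pattern = re.compile(r"dataset:\s*\n\s*name: (.+?)\n.*?value: ([\d.]+)", re.DOTALL)
scores = {name: float(value) for name, value in pattern.findall(FRONT_MATTER)}
print(scores)  # {'IFEval (0-Shot)': 74.57, 'BBH (3-Shot)': 30.37}
```

The regex is deliberately naive; it only works because every `results` entry in this block has exactly one `dataset`/`metrics` pair in a fixed order.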
 
@@ -227,4 +326,18 @@ print(outputs[0]['generated_text'])
 
 This model is derived from Llama 3.1 8B and is governed by the Llama 3.1 Community License Agreement. By using these weights, you agree to the terms set by Meta for Llama 3.1.
 
-It is important to note that, as with all LLMs, factual inaccuracies may occur. Any investment or legal opinions expressed should be independently verified. Knowledge cutoff is the same as LLama-3.1-8B. That is, December 2023.
+It is important to note that, as with all LLMs, factual inaccuracies may occur. Any investment or legal opinions expressed should be independently verified. The knowledge cutoff is the same as Llama-3.1-8B: December 2023.
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/SentientAGI__Dobby-Mini-Unhinged-Llama-3.1-8B-details)!
+Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=SentientAGI%2FDobby-Mini-Unhinged-Llama-3.1-8B&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!
+
+| Metric              | Value (%) |
+|---------------------|----------:|
+| **Average**         |     26.73 |
+| IFEval (0-Shot)     |     74.57 |
+| BBH (3-Shot)        |     30.37 |
+| MATH Lvl 5 (4-Shot) |     11.25 |
+| GPQA (0-shot)       |      7.49 |
+| MuSR (0-shot)       |      7.96 |
+| MMLU-PRO (5-shot)   |     28.72 |
+
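As a quick sanity check on the table this hunk adds (assuming the leaderboard's **Average** row is the unweighted mean of the six benchmark scores, which the numbers here are consistent with), the reported 26.73 can be reproduced from the individual values:

```python
# Benchmark scores from the summary table added in this PR.
scores = {
    "IFEval (0-Shot)": 74.57,
    "BBH (3-Shot)": 30.37,
    "MATH Lvl 5 (4-Shot)": 11.25,
    "GPQA (0-shot)": 7.49,
    "MuSR (0-shot)": 7.96,
    "MMLU-PRO (5-shot)": 28.72,
}

# Unweighted mean, rounded to two decimals as on the leaderboard.
average = round(sum(scores.values()) / len(scores), 2)
print(average)  # 26.73
```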