Novaciano committed
Commit c78ca0a · verified · 1 Parent(s): 5ff3d4e

Adding Evaluation Results (#1)


- Adding Evaluation Results (47ac6684665211d99ee430dd9f6063bb0283b038)

Files changed (1)
README.md +114 -1
README.md CHANGED
@@ -15,6 +15,105 @@ tags:
  language:
  - es
  - en
+ model-index:
+ - name: LEWD-Mental-Cultist-3.2-1B
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: IFEval (0-Shot)
+       type: wis-k/instruction-following-eval
+       split: train
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: inst_level_strict_acc and prompt_level_strict_acc
+       value: 53.09
+       name: averaged accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Novaciano%2FLEWD-Mental-Cultist-3.2-1B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BBH (3-Shot)
+       type: SaylorTwift/bbh
+       split: test
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc_norm
+       value: 8.64
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Novaciano%2FLEWD-Mental-Cultist-3.2-1B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MATH Lvl 5 (4-Shot)
+       type: lighteval/MATH-Hard
+       split: test
+       args:
+         num_few_shot: 4
+     metrics:
+     - type: exact_match
+       value: 5.29
+       name: exact match
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Novaciano%2FLEWD-Mental-Cultist-3.2-1B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GPQA (0-shot)
+       type: Idavidrein/gpqa
+       split: train
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 0.89
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Novaciano%2FLEWD-Mental-Cultist-3.2-1B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MuSR (0-shot)
+       type: TAUR-Lab/MuSR
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 1.42
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Novaciano%2FLEWD-Mental-Cultist-3.2-1B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU-PRO (5-shot)
+       type: TIGER-Lab/MMLU-Pro
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 8.54
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Novaciano%2FLEWD-Mental-Cultist-3.2-1B
+       name: Open LLM Leaderboard
  ---
  # merge

@@ -48,4 +147,18 @@ dtype: bfloat16
  parameters:
  t: [0, 0.5, 1, 0.5, 0]

- ```
+ ```
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/Novaciano__LEWD-Mental-Cultist-3.2-1B-details)!
+ Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=Novaciano%2FLEWD-Mental-Cultist-3.2-1B&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!
+
+ | Metric             |Value (%)|
+ |--------------------|--------:|
+ |**Average**         |    12.98|
+ |IFEval (0-Shot)     |    53.09|
+ |BBH (3-Shot)        |     8.64|
+ |MATH Lvl 5 (4-Shot) |     5.29|
+ |GPQA (0-shot)       |     0.89|
+ |MuSR (0-shot)       |     1.42|
+ |MMLU-PRO (5-shot)   |     8.54|
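As a quick sanity check, the **Average** row added by this diff is consistent with an unweighted mean of the six benchmark scores in the table. A minimal sketch (scores copied from the table above; the aggregation rule is an assumption verified only against these numbers):

```python
# Per-benchmark scores for LEWD-Mental-Cultist-3.2-1B, copied from the table above.
scores = {
    "IFEval (0-Shot)": 53.09,
    "BBH (3-Shot)": 8.64,
    "MATH Lvl 5 (4-Shot)": 5.29,
    "GPQA (0-shot)": 0.89,
    "MuSR (0-shot)": 1.42,
    "MMLU-PRO (5-shot)": 8.54,
}

# Unweighted mean over the six benchmarks, rounded to two decimals
# as displayed on the leaderboard.
average = round(sum(scores.values()) / len(scores), 2)
print(average)  # 12.98, matching the table's Average row
```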