Novaciano committed on
Commit
d41cec9
·
verified ·
1 Parent(s): fe31207

Adding Evaluation Results (#1)

- Adding Evaluation Results (9224d2adbbe76b4b0a081d58fce52c0939491ca2)

Files changed (1)
  1. README.md +114 -1
README.md CHANGED
@@ -17,6 +17,105 @@ tags:
  language:
  - es
  - en
+ model-index:
+ - name: ASTAROTH-3.2-1B
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: IFEval (0-Shot)
+       type: wis-k/instruction-following-eval
+       split: train
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: inst_level_strict_acc and prompt_level_strict_acc
+       value: 56.13
+       name: averaged accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Novaciano%2FASTAROTH-3.2-1B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BBH (3-Shot)
+       type: SaylorTwift/bbh
+       split: test
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc_norm
+       value: 9.49
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Novaciano%2FASTAROTH-3.2-1B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MATH Lvl 5 (4-Shot)
+       type: lighteval/MATH-Hard
+       split: test
+       args:
+         num_few_shot: 4
+     metrics:
+     - type: exact_match
+       value: 7.33
+       name: exact match
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Novaciano%2FASTAROTH-3.2-1B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GPQA (0-shot)
+       type: Idavidrein/gpqa
+       split: train
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 0.78
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Novaciano%2FASTAROTH-3.2-1B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MuSR (0-shot)
+       type: TAUR-Lab/MuSR
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 1.21
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Novaciano%2FASTAROTH-3.2-1B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU-PRO (5-shot)
+       type: TIGER-Lab/MMLU-Pro
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 10.1
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Novaciano%2FASTAROTH-3.2-1B
+       name: Open LLM Leaderboard
  ---
  # ASTAROTH 3.2 1B

@@ -55,4 +154,18 @@ parameters:
  chat_template: auto
  tokenizer:
    source: union
- ```
+ ```
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/Novaciano__ASTAROTH-3.2-1B-details)!
+ Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=Novaciano%2FASTAROTH-3.2-1B&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!
+
+ |      Metric       |Value (%)|
+ |-------------------|--------:|
+ |**Average**        |    14.17|
+ |IFEval (0-Shot)    |    56.13|
+ |BBH (3-Shot)       |     9.49|
+ |MATH Lvl 5 (4-Shot)|     7.33|
+ |GPQA (0-shot)      |     0.78|
+ |MuSR (0-shot)      |     1.21|
+ |MMLU-PRO (5-shot)  |    10.10|
+
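
The **Average** row in the results table is consistent with an unweighted mean of the six benchmark scores. The commit does not say how the leaderboard computes it, so the sketch below is only a sanity check under that assumption; the `scores` dictionary and the `average_score` helper are illustrative names, not part of the README or of any leaderboard tooling.

```python
# Scores copied from the results table in this commit (values in %).
scores = {
    "IFEval (0-Shot)": 56.13,
    "BBH (3-Shot)": 9.49,
    "MATH Lvl 5 (4-Shot)": 7.33,
    "GPQA (0-shot)": 0.78,
    "MuSR (0-shot)": 1.21,
    "MMLU-PRO (5-shot)": 10.10,
}


def average_score(values):
    """Return the unweighted arithmetic mean, rounded to two decimals.

    Assumption: the leaderboard average weights every benchmark equally.
    """
    values = list(values)
    return round(sum(values) / len(values), 2)


average = average_score(scores.values())
print(f"Average: {average:.2f}")
```

Under the equal-weight assumption, the result can be compared directly with the 14.17 reported in the table.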