Update README.md
README.md
CHANGED
@@ -29,12 +29,6 @@ It slightly improves upon the performance of the basemodel on the following task
 
 # Eval Results aloobun/d-SmolLM2-360M (WIP)
 
-Todo:
-
-ifeval (0-shot, generative)
-
-Math-lvl-5 (4-shots, generative, minerva version)
-
 
 ## GPQA
 
@@ -100,3 +94,16 @@ Math-lvl-5 (4-shots, generative, minerva version)
 | | |none | 0|inst_level_strict_acc |↑ |0.2770|± | N/A|
 | | |none | 0|prompt_level_loose_acc |↑ |0.1497|± |0.0154|
 | | |none | 0|prompt_level_strict_acc|↑ |0.1423|± |0.0150|
+
+## MATH HARD
+| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
+|---------------------------------------------|-------|------|-----:|-----------|---|-----:|---|-----:|
+|leaderboard_math_hard | N/A| | | | | | | |
+| - leaderboard_math_algebra_hard | 2|none | 4|exact_match|↑ |0.0033|± |0.0033|
+| - leaderboard_math_counting_and_prob_hard | 2|none | 4|exact_match|↑ |0.0081|± |0.0081|
+| - leaderboard_math_geometry_hard | 2|none | 4|exact_match|↑ |0.0000|± |0.0000|
+| - leaderboard_math_intermediate_algebra_hard| 2|none | 4|exact_match|↑ |0.0000|± |0.0000|
+| - leaderboard_math_num_theory_hard | 2|none | 4|exact_match|↑ |0.0065|± |0.0065|
+| - leaderboard_math_prealgebra_hard | 2|none | 4|exact_match|↑ |0.0104|± |0.0073|
+| - leaderboard_math_precalculus_hard | 2|none | 4|exact_match|↑ |0.0000|± |0.0000|
+
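The aggregate row `leaderboard_math_hard` in the added MATH HARD table carries no value of its own. As a rough reading aid, the sketch below parses the seven subtask rows as they appear in this diff and takes an unweighted mean of `exact_match`. The unweighted mean is an assumption made for illustration and may differ from whatever aggregation the evaluation harness applies (for example, weighting by subtask size); `parse_row` and `ROWS` are hypothetical names, not part of the README.

```python
# Subtask rows copied verbatim from the MATH HARD table added in this diff.
ROWS = """\
| - leaderboard_math_algebra_hard | 2|none | 4|exact_match|↑ |0.0033|± |0.0033|
| - leaderboard_math_counting_and_prob_hard | 2|none | 4|exact_match|↑ |0.0081|± |0.0081|
| - leaderboard_math_geometry_hard | 2|none | 4|exact_match|↑ |0.0000|± |0.0000|
| - leaderboard_math_intermediate_algebra_hard| 2|none | 4|exact_match|↑ |0.0000|± |0.0000|
| - leaderboard_math_num_theory_hard | 2|none | 4|exact_match|↑ |0.0065|± |0.0065|
| - leaderboard_math_prealgebra_hard | 2|none | 4|exact_match|↑ |0.0104|± |0.0073|
| - leaderboard_math_precalculus_hard | 2|none | 4|exact_match|↑ |0.0000|± |0.0000|
"""


def parse_row(line):
    """Split one pipe-delimited table row into (task, value, stderr)."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    # Column order: Tasks, Version, Filter, n-shot, Metric, arrow, Value, ±, Stderr
    task = cells[0].lstrip("- ")
    return task, float(cells[6]), float(cells[8])


results = [parse_row(line) for line in ROWS.splitlines() if line.strip()]

# Unweighted mean over subtasks (an illustrative assumption, see lead-in).
unweighted_mean = sum(value for _, value, _ in results) / len(results)

for task, value, stderr in results:
    print(f"{task:45s} {value:.4f} ± {stderr:.4f}")
print(f"unweighted mean exact_match: {unweighted_mean:.4f}")
```

Three of the seven subtasks report an `exact_match` of 0.0000, so any average of these rows stays well below 0.01 regardless of the weighting chosen.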