README.md CHANGED
@@ -59,8 +59,8 @@ dtype: float16
 | Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Ko-CommonGenV2 |
 | --- | --- | --- | --- | --- | --- | --- |
 | PracticeLLM/Twice-KoSOLAR-16.1B-test | NaN | NaN | NaN | NaN | NaN | NaN |
+| [jjourney1125/M-SOLAR-10.7B-v1.0](https://huggingface.co/jjourney1125/M-SOLAR-10.7B-v1.0) | 55.15 | 49.57 | 60.12 | 54.60 | 49.23 | 62.22 |
 | [seungduk/KoSOLAR-10.7B-v0.1](https://huggingface.co/seungduk/KoSOLAR-10.7B-v0.1) | 52.40 | 47.18 | 59.54 | 52.04 | 41.84 | 61.39 |
-| [jjourney1125/M-SOLAR-10.7B-v1.0](https://huggingface.co/jjourney1125/M-SOLAR-10.7B-v1.0) | 55.15 | 49.57 | 60.12 | 54.60 | 49.23 | 62.22 |
 
 - Follow up as [En-link](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
 | Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
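The Average column in the Ko leaderboard rows above is consistent with the plain mean of the five task scores (ARC, HellaSwag, MMLU, TruthfulQA, Ko-CommonGenV2). A quick sanity check, with the scores copied from the table:

```python
# Sanity check: "Average" in the Ko leaderboard table is the arithmetic
# mean of the five per-task scores, rounded to two decimals.
scores = {
    "jjourney1125/M-SOLAR-10.7B-v1.0": [49.57, 60.12, 54.60, 49.23, 62.22],
    "seungduk/KoSOLAR-10.7B-v0.1": [47.18, 59.54, 52.04, 41.84, 61.39],
}

for model, vals in scores.items():
    print(f"{model}: {sum(vals) / len(vals):.2f}")
# jjourney1125/M-SOLAR-10.7B-v1.0: 55.15
# seungduk/KoSOLAR-10.7B-v0.1: 52.40
```

Both means match the leaderboard's Average column, so the change in this commit is purely a re-sort of the rows by Average, descending.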
@@ -87,6 +87,19 @@ gpt2 (pretrained=PracticeLLM/Twice-KoSOLAR-16.1B-test), limit: None, provide_des
 |kobest_sentineg |      0|acc     |0.7078|±  |0.0229|
 |                |       |macro_f1|0.7071|±  |0.0229|
 
+gpt2 (pretrained=jjourney1125/M-SOLAR-10.7B-v1.0), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
+|      Task      |Version| Metric |Value |   |Stderr|
+|----------------|------:|--------|-----:|---|-----:|
+|kobest_boolq    |      0|acc     |0.5228|±  |0.0133|
+|                |       |macro_f1|0.3788|±  |0.0097|
+|kobest_copa     |      0|acc     |0.6860|±  |0.0147|
+|                |       |macro_f1|0.6858|±  |0.0147|
+|kobest_hellaswag|      0|acc     |0.4580|±  |0.0223|
+|                |       |acc_norm|0.5380|±  |0.0223|
+|                |       |macro_f1|0.4552|±  |0.0222|
+|kobest_sentineg |      0|acc     |0.6474|±  |0.0240|
+|                |       |macro_f1|0.6012|±  |0.0257|
+
 gpt2 (pretrained=yanolja/KoSOLAR-10.7B-v0.1), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
 |      Task      |Version| Metric |Value |   |Stderr|
 |----------------|------:|--------|-----:|---|-----:|
@@ -99,6 +112,8 @@ gpt2 (pretrained=yanolja/KoSOLAR-10.7B-v0.1), limit: None, provide_description:
 |                |       |macro_f1|0.4296|±  |0.0221|
 |kobest_sentineg |      0|acc     |0.7506|±  |0.0217|
 |                |       |macro_f1|0.7505|±  |0.0217|
+
+
 ```
 
 - Follow up as [Eleuther/LM-Harness](https://github.com/EleutherAI/lm-evaluation-harness)
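The KoBEST result blocks above are the output format of EleutherAI's lm-evaluation-harness. As a non-authoritative sketch, an invocation of the following shape produces such a table; the `gpt2` model type and zero-shot setting are taken from the result headers (`gpt2 (pretrained=…), num_fewshot: 0`), while the exact harness revision and remaining flags are assumptions based on its README:

```shell
# Sketch of reproducing the KoBEST tables with lm-evaluation-harness.
# Assumptions: an older harness revision that ships main.py and the
# "gpt2" (HF causal LM) model type; task names taken from the tables above.
python main.py \
    --model gpt2 \
    --model_args pretrained=PracticeLLM/Twice-KoSOLAR-16.1B-test \
    --tasks kobest_boolq,kobest_copa,kobest_hellaswag,kobest_sentineg \
    --num_fewshot 0
```

Swapping the `pretrained=` value for the other checkpoints (jjourney1125/M-SOLAR-10.7B-v1.0, yanolja/KoSOLAR-10.7B-v0.1) yields the comparison blocks shown above.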