Update content.py
content.py CHANGED (+2 -2)
@@ -12,8 +12,8 @@ Here you can compare models on tasks in Czech language and/or submit your own mo
 - See **About** page for brief description of our evaluation protocol & win score mechanism, citation information, and future directions for this benchmark.
 - __How scoring works__:
 - On each task, the __Duel Win Score__ reports proportion of won duels.
-- Category scores are obtained by averaging across category tasks.
--
+- Category scores are obtained by averaging across category tasks. When selecting a category (other than Overall), the "Average" column shows Category Duel Win Scores.
+- __Overall__ Duel Win Scores are an average over category scores. When selecting the Overall category, the "Average" column shows the Overall Duel Win Score.
 - All public submissions are shared in [CZLC/LLM_benchmark_data](https://huggingface.co/datasets/CZLC/LLM_benchmark_data) dataset.
 - In submission page, __you can obtain results on leaderboard without publishing them__.
 - First step is "pre-submission", and after this is done (significance tests can take up to an hour), the results can be submitted if you'd like to.
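The scoring described in the changed lines is a two-level average: per-task Duel Win Scores are averaged into category scores, which are averaged again into the Overall score. Below is a minimal Python sketch of that arithmetic, for illustration only; it is not code from this Space, and the `(wins, duels)` data layout and all names are assumptions.

```python
# Illustrative sketch of the two-level averaging described above.
# Assumes per-task duel records as (wins, duels) pairs; not the Space's code.

def duel_win_score(wins: int, duels: int) -> float:
    """Proportion of duels won on a single task."""
    return wins / duels if duels else 0.0

def category_score(task_results: list[tuple[int, int]]) -> float:
    """Average of task-level Duel Win Scores within one category."""
    scores = [duel_win_score(w, d) for w, d in task_results]
    return sum(scores) / len(scores)

def overall_score(categories: dict[str, list[tuple[int, int]]]) -> float:
    """Overall Duel Win Score: average over category scores."""
    cat_scores = [category_score(tasks) for tasks in categories.values()]
    return sum(cat_scores) / len(cat_scores)

# Hypothetical example: two categories with (wins, duels) per task.
results = {
    "NLU": [(30, 50), (45, 50)],  # category score = (0.6 + 0.9) / 2 = 0.75
    "QA":  [(20, 50)],            # category score = 0.4
}
print(overall_score(results))     # (0.75 + 0.4) / 2 = 0.575
```

Note that averaging category scores (rather than pooling all tasks) weights each category equally regardless of how many tasks it contains, which matches the "Overall is an average over category scores" wording in the change.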