Commit dab68b3 by mfajcik · verified · 1 Parent(s): 7d29744

Update content.py

Files changed (1): content.py (+2 -2)
@@ -12,8 +12,8 @@ Here you can compare models on tasks in Czech language and/or submit your own mo
  - See **About** page for brief description of our evaluation protocol & win score mechanism, citation information, and future directions for this benchmark.
  - __How scoring works__:
  - On each task, the __Duel Win Score__ reports proportion of won duels.
- - Category scores are obtained by averaging across category tasks.
- - __Average__ Duel Win Scores are an average over category scores.
+ - Category scores are obtained by averaging across category tasks. When selecting a category (other than Overall), the "Average" column shows Category Duel Win Scores.
+ - __Overall__ Duel Win Scores are an average over category scores. When selecting Overall category, the "Average" column shows Overall Duel Win Score.
  - All public submissions are shared in [CZLC/LLM_benchmark_data](https://huggingface.co/datasets/CZLC/LLM_benchmark_data) dataset.
  - In submission page, __you can obtain results on leaderboard without publishing them__.
  - First step is "pre-submission", and after this is done (significance tests can take up to an hour), the results can be submitted if you'd like to.
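
Note on the scoring text: the three levels it describes (per-task Duel Win Score, category score, Overall score) can be sketched in Python as below. This is only an illustration of the arithmetic stated in the text, not the leaderboard's actual code. The function names, task names, category names, and duel outcomes are all hypothetical, and how a duel is decided (the significance tests) is not modeled.

```python
from statistics import mean


def duel_win_score(duel_outcomes):
    """Proportion of won duels on one task (True = duel won)."""
    outcomes = list(duel_outcomes)
    return sum(outcomes) / len(outcomes)


def category_scores(task_scores, categories):
    """Average the per-task scores within each category."""
    return {
        category: mean(task_scores[task] for task in tasks)
        for category, tasks in categories.items()
    }


def overall_score(cat_scores):
    """Average over category scores (not over individual tasks)."""
    return mean(cat_scores.values())


# Hypothetical duel outcomes for one model on three made-up tasks.
duels = {
    "task_a": [True, True, False, True],
    "task_b": [False, True],
    "task_c": [True, False, False, False],
}
# Hypothetical assignment of tasks to categories.
categories = {"cat_x": ["task_a", "task_b"], "cat_y": ["task_c"]}

task_scores = {task: duel_win_score(o) for task, o in duels.items()}
cat_scores = category_scores(task_scores, categories)
overall = overall_score(cat_scores)
```

Because the Overall score averages categories rather than tasks, a category with few tasks weighs as much as one with many. In this sketch the Overall score is the mean of the two category scores, which differs from the plain mean of the three task scores.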