---
datasets:
  - breadlicker45/1m-YA-dataset
train-eval-index:
  - config: default
    task: token-classification
    task_id: entity_extraction
    splits:
      eval_split: test
    col_mapping:
      tokens: tokens
      labels: tags
---

# gpt-YA-1-1_160M
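
The checkpoint can be loaded with the Hugging Face `transformers` library. Below is a minimal usage sketch; the repo id `breadlicker45/gpt-YA-1-1_160M` (inferred from the dataset owner's namespace) and the generation settings are assumptions, not confirmed by this card.

```python
# Minimal usage sketch. The repo id below is an assumption; adjust it to
# the checkpoint's actual Hub path.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "breadlicker45/gpt-YA-1-1_160M"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short continuation from a sample prompt.
inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```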

## Open LLM Leaderboard Evaluation Results

Detailed results can be found here.

| Metric | Value |
|--------|-------|
| Avg. | 25.21 |
| ARC (25-shot) | 22.95 |
| HellaSwag (10-shot) | 27.29 |
| MMLU (5-shot) | 26.25 |
| TruthfulQA (0-shot) | 47.02 |
| Winogrande (5-shot) | 50.67 |
| GSM8K (5-shot) | 0.0 |
| DROP (3-shot) | 2.32 |
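
These scores come from the Open LLM Leaderboard, which evaluates models with EleutherAI's `lm-evaluation-harness`. As a rough illustration of how one task could be re-run locally, here is a hedged sketch of a 25-shot ARC evaluation; the repo id and the harness's Python API are assumptions, and the exact leaderboard configuration may differ.

```python
# Hedged sketch: re-running one leaderboard task with lm-evaluation-harness
# (pip install lm-eval). The repo id is an assumption, and the leaderboard's
# exact settings may differ from this plain harness invocation.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=breadlicker45/gpt-YA-1-1_160M",
    tasks=["arc_challenge"],  # ARC is scored 25-shot on the leaderboard
    num_fewshot=25,
)
print(results["results"]["arc_challenge"])
```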