Commit 810cbd1 by vwxyzjn · verified · 1 parent: 3994811

Update README.md

Files changed (1): README.md (+1 -13)
@@ -85,7 +85,7 @@ See the Falcon 180B model card for an example of this.
 
  ## Performance
 
- | Model | Average | 2 LC | BBH | DROP | GSM8k | IFEval | MATH | MMLU | Safety | PopQA | TruthQA |
+ | Model | Average | AlpacaEval 2 LC | BBH | DROP | GSM8k | IFEval | MATH | MMLU | Safety | PopQA | TruthQA |
  |-------|---------|------|-----|------|-------|--------|------|------|--------|-------|---------|
  | **Closed API models** | | | | | | | | | | | |
  | GPT-3.5 Turbo 0125 | 59.6 | 38.7 | 66.6 | 70.2 | 74.3 | 66.9 | 41.2 | 70.2 | 69.1 | 45.0 | 62.9 |
@@ -106,18 +106,6 @@ See the Falcon 180B model card for an example of this.
  | **OLMo-2-32B-0325-DPO** | 68.8 | 44.1 | 70.2 | 77.5 | 85.7 | 83.8 | 46.8 | 78.0 | 91.9 | 36.4 | 73.5 |
  | **OLMo-2-32B-0325-Instruct** | 68.8 | 42.8 | 70.6 | 78.0 | 87.6 | 85.6 | 49.7 | 77.3 | 85.9 | 37.5 | 73.2 |
 
- ## Benchmark Descriptions
-
- - **2 LC**: Two-step logical constraints reasoning
- - **BBH**: Big Bench Hard tasks
- - **DROP**: Discrete Reasoning Over Paragraphs
- - **GSM8k**: Grade School Math 8k problems
- - **IFEval**: Instruction Following Evaluation
- - **MATH**: Mathematics problem-solving
- - **MMLU**: Massive Multitask Language Understanding
- - **Safety**: Safety and harmlessness evaluation
- - **PopQA**: Popular Question Answering
- - **TruthQA**: Truthfulness in question answering
 
 
  ## License and use