Update README.md
README.md
CHANGED
@@ -85,7 +85,7 @@ See the Falcon 180B model card for an example of this.
 
 ## Performance
 
-| Model | Average | 2 LC | BBH | DROP | GSM8k | IFEval | MATH | MMLU | Safety | PopQA | TruthQA |
+| Model | Average | AlpacaEval 2 LC | BBH | DROP | GSM8k | IFEval | MATH | MMLU | Safety | PopQA | TruthQA |
 |-------|---------|------|-----|------|-------|--------|------|------|--------|-------|---------|
 | **Closed API models** | | | | | | | | | | | |
 | GPT-3.5 Turbo 0125 | 59.6 | 38.7 | 66.6 | 70.2 | 74.3 | 66.9 | 41.2 | 70.2 | 69.1 | 45.0 | 62.9 |
@@ -106,18 +106,6 @@ See the Falcon 180B model card for an example of this.
 | **OLMo-2-32B-0325-DPO** | 68.8 | 44.1 | 70.2 | 77.5 | 85.7 | 83.8 | 46.8 | 78.0 | 91.9 | 36.4 | 73.5 |
 | **OLMo-2-32B-0325-Instruct** | 68.8 | 42.8 | 70.6 | 78.0 | 87.6 | 85.6 | 49.7 | 77.3 | 85.9 | 37.5 | 73.2 |
 
-## Benchmark Descriptions
-
-- **2 LC**: Two-step logical constraints reasoning
-- **BBH**: Big Bench Hard tasks
-- **DROP**: Discrete Reasoning Over Paragraphs
-- **GSM8k**: Grade School Math 8k problems
-- **IFEval**: Instruction Following Evaluation
-- **MATH**: Mathematics problem-solving
-- **MMLU**: Massive Multitask Language Understanding
-- **Safety**: Safety and harmlessness evaluation
-- **PopQA**: Popular Question Answering
-- **TruthQA**: Truthfulness in question answering
 
 
 ## License and use