lievan committed · Commit 99d4a87 (verified) · 1 Parent(s): 6f494ca

Update README.md

Files changed (1)
  1. README.md +101 -8
README.md CHANGED
@@ -43,14 +43,107 @@ It achieves superb reasoning performance as well as exellent chat & instruction-
  ## Evaluation
  We conducted overall coding, math, reasoning, knowledge, instruction-following and chat benchmarking. Results are shown below:

- | Model | Coding | Math | Reasoning | Knowledge | Ins-Following | Chat |
- |-----------------|:---------:|:-----:|:--------:|:-------:|:-----:|:---------:|:---------:|:---------:|:-------------:|:--------:|
- | | HumanEval | MBPP | LeetCode | GSMPLUS | MATH | TheoremQA | BBH (CoT) | MMLU | IFEval | MT-Bench |
- | GPT-3.5-Turbo | 76.8 | 82.5 | 23.3 | 61.2 | 37.8 | 35.6 | 70.1 | 70.0 | 56.6 | 7.94 |
- | GPT-4 | 85.4 | 83.5 | 41.8 | 85.6 | 69.7 | 52.4 | 86.7 | 86.4 | 79.7 | 8.96 |
- | Eurus-70b-NCA | 79.3 | 71.9 | 33.3 | 62.8 | 41.7 | 32.6 | 80.0 | 59.4 | 49.2 | 7.54 |
- | Eurux-8x22b-KTO | 71.3 | 68.9 | 29.4 | 68.3 | 48.4 | 35.3 | 83.6 | 75.9 | 67.1 | 8.58 |
- | Eurux-8x22b-NCA | 75.0 | 69.7 | 35.0 | 68.1 | 49.0 | 35.5 | 83.5 | 75.6 | 67.1 | 8.46 |
+ <style type="text/css">
+ .tg {border-collapse:collapse;border-spacing:0;}
+ .tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
+ overflow:hidden;padding:10px 5px;word-break:normal;}
+ .tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
+ font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
+ .tg .tg-c3ow{border-color:inherit;text-align:center;vertical-align:top}
+ .tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top}
+ </style>
+ <table class="tg">
+ <thead>
+ <tr>
+ <th class="tg-0pky" rowspan="2"><span style="font-weight:bold;color:#000">Model</span></th>
+ <th class="tg-c3ow" colspan="3"><span style="font-weight:bold;color:#000">Coding</span></th>
+ <th class="tg-c3ow" colspan="3"><span style="font-weight:bold;color:#000">Math</span></th>
+ <th class="tg-c3ow"><span style="font-weight:bold;color:#000">Reasoning</span></th>
+ <th class="tg-c3ow"><span style="font-weight:bold;color:#000">Knowledge</span></th>
+ <th class="tg-c3ow"><span style="font-weight:bold;color:#000">Ins-Following</span></th>
+ <th class="tg-c3ow"><span style="font-weight:bold;color:#000">Chat</span></th>
+ </tr>
+ <tr>
+ <th class="tg-c3ow"><span style="font-weight:bold;color:#000">HumanEval</span></th>
+ <th class="tg-c3ow"><span style="font-weight:bold;color:#000">MBPP</span></th>
+ <th class="tg-c3ow"><span style="font-weight:bold;color:#000">LeetCode</span></th>
+ <th class="tg-c3ow"><span style="font-weight:bold;color:#000">GSMPLUS</span></th>
+ <th class="tg-c3ow"><span style="font-weight:bold;color:#000">MATH</span></th>
+ <th class="tg-c3ow"><span style="font-weight:bold;color:#000">TheoremQA</span></th>
+ <th class="tg-c3ow"><span style="font-weight:bold;color:#000">BBH (CoT)</span></th>
+ <th class="tg-c3ow"><span style="font-weight:bold;color:#000">MMLU</span></th>
+ <th class="tg-c3ow"><span style="font-weight:bold;color:#000">IFEval</span></th>
+ <th class="tg-c3ow"><span style="font-weight:bold;color:#000">MT-Bench</span></th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td class="tg-0pky"><span style="color:#000">GPT-3.5-Turbo</span></td>
+ <td class="tg-c3ow"><span style="color:#000">76.8</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">82.5</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">23.3</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">61.2</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">37.8</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">35.6</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">70.1</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">70.0</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">56.6</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">7.94</span> </td>
+ </tr>
+ <tr>
+ <td class="tg-0pky"><span style="color:#000">GPT-4</span></td>
+ <td class="tg-c3ow"><span style="color:#000">85.4</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">83.5</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">41.8</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">85.6</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">69.7</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">52.4</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">86.7</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">86.4</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">79.7</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">8.96</span> </td>
+ </tr>
+ <tr>
+ <td class="tg-0pky"><span style="color:#000">Eurus-70b-NCA</span></td>
+ <td class="tg-c3ow"><span style="color:#000">79.3</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">71.9</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">33.3</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">62.8</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">41.7</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">32.6</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">80.0</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">59.4</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">49.2</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">7.54</span> </td>
+ </tr>
+ <tr>
+ <td class="tg-0pky"><span style="color:#000">Eurux-8x22b-KTO</span></td>
+ <td class="tg-c3ow"><span style="color:#000">71.3</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">68.9</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">29.4</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">68.3</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">48.4</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">35.3</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">83.6</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">75.9</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">67.1</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">8.58</span> </td>
+ </tr>
+ <tr>
+ <td class="tg-0pky"><span style="color:#000">Eurux-8x22b-NCA</span></td>
+ <td class="tg-c3ow"><span style="color:#000">75.0</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">69.7</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">35.0</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">68.1</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">49.0</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">35.5</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">83.5</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">75.6</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">67.1</span> </td>
+ <td class="tg-c3ow"><span style="color:#000">8.46</span> </td>
+ </tr>
+ </tbody>
+ </table>

  ## Usage