Commit f592fa6 · 1 Parent(s): b93ddae
Update README.md
README.md CHANGED
@@ -240,6 +240,15 @@ All models are evaluated in chat mode (e.g. with the respective conversation tem
 | ChatGPT | 47.81 | 55.68 | 56.5 | 62.66 | 50.69 | 55.51 |
 | OpenChat | 38.7 | 45.99 | 48.32 | 50.23 | 43.27 | 45.85 |
 
+<div>
+<h3>Multi-Level Multi-Discipline Chinese Evaluation Suite (CEVAL)</h3>
+<div>
+
+| Model | Avg | STEM | Social Science | Humanities | Others |
+|----------|-------|-------|----------------|------------|--------|
+| ChatGPT | 54.4 | 52.9 | 61.8 | 50.9 | 53.6 |
+| OpenChat | 47.29 | 45.22 | 52.49 | 48.52 | 45.08 |
+
 
 <div align="center">
 <h2> Limitations </h2>