Update README.md

README.md CHANGED
@@ -33,7 +33,7 @@
 <br> 🤖 Outperforms <span style="font-weight: bold;">ChatGPT</span> (March) and <span style="font-weight: bold;">Grok-1</span> on most benchmarks 🤖
 <br> 🚀<span style="font-size: 1em; font-family: 'Helvetica'; color: black; font-weight: bold;">15</span>-point improvement in Coding Performance over <span style="font-size: 0.9em;
 font-family: 'Helvetica'; color: black; font-weight: bold;">OpenChat-3.5🚀</span>
-<br><span style="font-size: 1em; font-family: 'Helvetica'; color: #3c72db; font-weight: bold;">New Features</span>
+<br><br><span style="font-size: 1em; font-family: 'Helvetica'; color: #3c72db; font-weight: bold;">New Features</span>
 <br> 💡 2 Modes: Coding + Generalist, Mathematical Reasoning 💡
 <br> 🧑‍⚖️ Experimental support for Evaluator and Feedback capabilities 🧑‍⚖️
 </span>
@@ -41,7 +41,7 @@
 </div>

 <div style="display: flex; justify-content: center; align-items: center">
-<img src="https://github.com/alpayariyak/openchat/blob/master/assets/1210bench.png?raw=true" style="width: 100%; border-radius: 1em">
+<img src="https://github.com/alpayariyak/openchat/blob/master/assets/1210bench.png?raw=true" style="width: 100%; border-radius: 1em">
 </div>

 <div>
@@ -174,6 +174,7 @@ Score 5: {orig_score5_description}
 | OpenOrca Mistral | 7B | 52.7 | 6.86 | 38.4 | 49.4 | 42.9 | 45.9 | 59.3 | 59.1 | 58.1 |
 | Zephyr-β^ | 7B | 34.6 | 7.34 | 22.0 | 40.6 | 39.0 | 40.8 | 39.8 | 5.1 | 16.0 |
 | Mistral | 7B | - | 6.84 | 30.5 | 39.0 | 38.0 | - | 60.1 | 52.2 | - |
+
 <details>
 <summary>Evaluation Details (click to expand)</summary>
 *: ChatGPT (March) results are from [GPT-4 Technical Report](https://arxiv.org/abs/2303.08774), [Chain-of-Thought Hub](https://github.com/FranxYao/chain-of-thought-hub), and our evaluation. Please note that ChatGPT is not a fixed baseline and evolves rapidly over time.
@@ -188,7 +189,6 @@ All models are evaluated in chat mode (e.g. with the respective conversation tem
 <h3>HumanEval+</h3>
 </div>

-
 | Model | Size | HumanEval+ pass@1 |
 |-----------------------------|----------|------------|
 | ChatGPT (December 12, 2023) | - | 64.6 |
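The HumanEval+ pass@1 figures in the diff above are conventionally computed with the unbiased pass@k estimator introduced in the Codex paper (Chen et al., 2021): pass@k = 1 − C(n−c, k)/C(n, k), where n is the number of samples generated per task and c the number that pass all tests. The snippet below is a minimal sketch of that estimator, not the evaluation code from this repository; the `results` counts are purely illustrative.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n - c, k) / C(n, k).

    n: samples generated per task, c: samples passing all tests.
    If fewer than k samples fail, at least one of any k drawn must pass.
    """
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Benchmark-level score: average pass@1 over tasks, scaled to percent.
results = [(10, 7), (10, 0), (10, 10)]  # illustrative (n, c) per task
score = 100 * sum(pass_at_k(n, c, 1) for n, c in results) / len(results)
```

For k = 1 the estimator reduces to the pass fraction c/n per task, but the general form lets the same sampling run report pass@1, pass@10, etc. without resampling.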