Update README.md
---
tags:
- C-RLFT
datasets:
- openchat/openchat_sharegpt4_dataset
- kaist-ai/Feedback-Collection
- imone/OpenOrca_FLAN
- LDJnr/LessWrong-Amplify-Instruct
- LDJnr/Pure-Dove
library_name: transformers
pipeline_tag: text-generation
---

<div align="center">
<img src="https://raw.githubusercontent.com/imoneoi/openchat/master/assets/logo_new.png" style="width: 65%">
</div>

<p align="center">
<a href="https://arxiv.org/pdf/2309.11235.pdf">Paper</a>
</p>
# OpenChat 3.5: First Update Released on December 10th!

**🚀 15-point improvement in coding performance**

**💡 Introducing a coding & generalist mode and a mathematical reasoning mode**

**🧑‍⚖️ Experimental support for evaluator and feedback capabilities**

**🤖 Outperforms Grok-1 in 3/4 and ChatGPT (March) in 5/8 benchmarks**

| Model                       | Size   | HumanEval+ pass@1 |
|-----------------------------|--------|-------------------|
| ChatGPT (December 12, 2023) | -      | 64.6              |
| WizardCoder-Python-34B-V1.0 | 34B    | 64.6              |
| **OpenChat 3.5 (Dec 10)**   | **7B** | **63.4**          |
| OpenHermes 2.5              | 7B     | 41.5              |

<div style="display: flex; justify-content: center; align-items: center">
<img src="https://raw.githubusercontent.com/imoneoi/openchat/master/assets/openchat.png" style="width: 45%;">
<img src="https://raw.githubusercontent.com/imoneoi/openchat/master/assets/openchat_grok.png" style="width: 45%;">
</div>

OpenChat is an innovative library of open-source language models, fine-tuned with [C-RLFT](https://arxiv.org/pdf/2309.11235.pdf), a strategy inspired by offline reinforcement learning. Our models learn from mixed-quality data without preference labels, delivering performance on par with ChatGPT even at the 7B scale. Despite this simple approach, we are committed to developing a high-performance, commercially viable, open-source large language model, and we continue to make significant strides toward that vision.
Once started, the server listens at `localhost:18888` for requests and is compatible with the OpenAI ChatCompletion API.

If you want to deploy the server as an online service, use `--api-keys sk-KEY1 sk-KEY2 ...` to specify the allowed API keys and `--disable-log-requests --disable-log-stats --log-file openchat.log` to log only to a file. For security, we recommend placing an [HTTPS gateway](https://fastapi.tiangolo.com/es/deployment/concepts/#security-https) in front of the server.

| Model             | Size | Context | Weights                                                          | Serving                                                                                                          |
|-------------------|------|---------|------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------|
| OpenChat 3.5 1210 | 7B   | 8192    | [Huggingface](https://huggingface.co/openchat/openchat_3.5_1210) | `python -m ochat.serving.openai_api_server --model openchat/openchat_3.5_1210 --engine-use-ray --worker-use-ray` |
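Since the server speaks the OpenAI ChatCompletion format, requests can also be issued from Python. Below is a minimal standard-library sketch; the model name `openchat_3.5` and the `condition` field for selecting the math mode are assumptions mirroring the project's API examples, so verify them against your deployment before relying on them.

```python
import json
from urllib import request

API_URL = "http://localhost:18888/v1/chat/completions"

def chat_payload(content, condition=None):
    """Build a ChatCompletion request body for a single user message.

    `condition` (assumed field name) selects the conversation mode,
    e.g. "Math Correct" for mathematical reasoning.
    """
    body = {"model": "openchat_3.5",
            "messages": [{"role": "user", "content": content}]}
    if condition is not None:
        body["condition"] = condition
    return json.dumps(body).encode()

def chat(content, condition=None):
    """POST to the local server (must already be running) and return the reply text."""
    req = request.Request(API_URL, data=chat_payload(content, condition),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

The payload helper is separated from the network call so the request body can be inspected or reused with any HTTP client.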
<details>
<summary>Example request (click to expand)</summary>

💡 **Default Mode (GPT4 Correct)**: Best for coding, chat and general tasks

```bash
curl http://localhost:18888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    ...
  }'
```

🧮 **Mathematical Reasoning Mode**: Tailored for solving math problems

```bash
curl http://localhost:18888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    ...
  }'
```

</details>
### Conversation templates

💡 **Default Mode (GPT4 Correct)**: Best for coding, chat and general tasks

```
GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant: Hi<|end_of_turn|>GPT4 Correct User: How are you today?<|end_of_turn|>GPT4 Correct Assistant:
```

🧮 **Mathematical Reasoning Mode**: Tailored for solving math problems

```
Math Correct User: 10.3 − 7988.8133=<|end_of_turn|>Math Correct Assistant:
```

⚠️ **Notice:** Remember to set `<|end_of_turn|>` as the end-of-generation token.
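The template strings above are mechanical enough to assemble by hand. A minimal sketch (the `build_prompt` helper is illustrative, not part of the OpenChat library):

```python
EOT = "<|end_of_turn|>"  # must also be configured as the end-of-generation token

def build_prompt(turns, mode="GPT4 Correct"):
    """Render (role, content) turns into the template shown above.

    `mode` is "GPT4 Correct" for the default mode or "Math Correct"
    for the mathematical reasoning mode.
    """
    rendered = ""
    for role, content in turns:
        speaker = "User" if role == "user" else "Assistant"
        rendered += f"{mode} {speaker}: {content}{EOT}"
    # Trailing generation prompt: the assistant turn the model should complete.
    return rendered + f"{mode} Assistant:"

prompt = build_prompt([("user", "Hello"), ("assistant", "Hi"),
                       ("user", "How are you today?")])
# prompt == "GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant: Hi<|end_of_turn|>GPT4 Correct User: How are you today?<|end_of_turn|>GPT4 Correct Assistant:"
```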
The default (GPT4 Correct) template is also available as the integrated `tokenizer.chat_template`, which can be used instead of manually specifying the template:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openchat/openchat_3.5_1210")
messages = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi"},
    {"role": "user", "content": "How are you today?"}
]
tokens = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
assert tokens == [1, 420, 6316, 28781, 3198, 3123, 1247, 28747, 22557, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747, 15359, 32000, 420, 6316, 28781, 3198, 3123, 1247, 28747, 1602, 460, 368, 3154, 28804, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747]
```
## 🧑‍⚖️ (Experimental) Evaluator / Feedback Capabilities

We've included evaluator capabilities in this release to advance open-source models as evaluators. You can use `Default Mode (GPT4 Correct)` with the following prompt (same as [Prometheus](https://huggingface.co/datasets/kaist-ai/Feedback-Collection)) to evaluate a response.

```
###Task Description:
An instruction (might include an Input inside it), a response to evaluate, a reference answer that gets a score of 5, and a score rubric representing a evaluation criteria are given.
1. Write a detailed feedback that assess the quality of the response strictly based on the given score rubric, not evaluating in general.
2. After writing a feedback, write a score that is an integer between 1 and 5. You should refer to the score rubric.
3. The output format should look as follows: "Feedback: (write a feedback for criteria) [RESULT] (an integer number between 1 and 5)"
4. Please do not generate any other opening, closing, and explanations.

###The instruction to evaluate:
{orig_instruction}

###Response to evaluate:
{orig_response}

###Reference Answer (Score 5):
{orig_reference_answer}

###Score Rubrics:
[{orig_criteria}]
Score 1: {orig_score1_description}
Score 2: {orig_score2_description}
Score 3: {orig_score3_description}
Score 4: {orig_score4_description}
Score 5: {orig_score5_description}

###Feedback:
```
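Because step 3 of the task description fixes the output format, the completion can be split programmatically. A small sketch (the `parse_feedback` helper is illustrative, not part of the OpenChat library):

```python
import re

def parse_feedback(completion):
    """Extract (feedback, score) from a 'Feedback: ... [RESULT] n' completion."""
    match = re.search(r"Feedback:\s*(.*?)\s*\[RESULT\]\s*([1-5])",
                      completion, re.DOTALL)
    if match is None:
        raise ValueError("completion does not follow the expected format")
    return match.group(1), int(match.group(2))

feedback, score = parse_feedback(
    "Feedback: The response is accurate and well structured. [RESULT] 4"
)
# feedback == "The response is accurate and well structured."; score == 4
```

Raising on malformed output, rather than guessing a score, keeps evaluation pipelines honest when the model deviates from the rubric format.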
## Comparison with [X.AI Grok models](https://x.ai/)

|        | License     | # Param | Average | MMLU | HumanEval | MATH | GSM8k |
|--------|-------------|---------|---------|------|-----------|------|-------|
| Grok-0 | Proprietary | 33B     | 44.5    | 65.7 | 39.7      | 15.7 | 56.8  |
| Grok-1 | Proprietary | ???B    | 55.8    | 73   | 63.2      | 23.9 | 62.9  |

*: Grok results are reported by [X.AI](https://x.ai/).
## <a id="benchmarks"></a> Benchmarks

| Model | # Params | Average | MT-Bench | HumanEval | BBH MC | AGIEval | TruthfulQA | MMLU | GSM8K | BBH CoT |

OpenChat 3.5 was trained with C-RLFT on a collection of publicly available high-quality instruction data, including:

- [OpenChat ShareGPT](https://huggingface.co/datasets/openchat/openchat_sharegpt4_dataset)
- [Open-Orca with FLAN answers](https://huggingface.co/datasets/imone/OpenOrca_FLAN)
- [Feedback-Collection](https://huggingface.co/datasets/kaist-ai/Feedback-Collection)
- Capybara [1](https://huggingface.co/datasets/LDJnr/Pure-Dove) [2](https://huggingface.co/datasets/LDJnr/Verified-Camel) [3](https://huggingface.co/datasets/LDJnr/LessWrong-Amplify-Instruct)
- [GOAT](https://huggingface.co/datasets/tiedong/goat)
- [Glaive](https://huggingface.co/datasets/glaiveai/glaive-code-assistant)