edgerunner-research committed on
Commit ed12d50 · verified · 1 Parent(s): 59e75cb

Update README.md

  1. README.md +248 -6
README.md CHANGED
@@ -1,9 +1,251 @@
1
- ### MT-Bench
2
- - Score: 8.55
3
 
4
- ### Arena hard
5
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/668ed3dcd857a9ca47edb75c/kFiab1FT9LW7CfzFHPNO4.png)
6
 
7
- ### Alpaca Bench 2
8
 
9
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/668ed3dcd857a9ca47edb75c/8YXOS5evbCv96FccidQ99.png)
1
+ ---
2
+ library_name: transformers
3
+ license: apache-2.0
4
+ language:
5
+ - en
6
+ ---
7
 
8
+ # EdgeRunner-Tactical-7B
 
9
 
10
+ ## Introduction
11
 
12
+ EdgeRunner-Tactical-7B is a powerful and efficient language model for the edge. Our mission is to build Generative AI for the edge that is safe, secure, and transparent. To that end, the EdgeRunner team is proud to release EdgeRunner-Tactical-7B, the most powerful language model for its size to date.
13
+
14
+ EdgeRunner-Tactical-7B is a 7-billion-parameter language model that demonstrates the potential of running state-of-the-art (SOTA) models at the edge. It is the highest-scoring model in the 7B-XXB range, outperforming Gemini Pro, Mixtral-8x7B, and Meta-Llama-3-8B-Instruct. On the Arena-Hard benchmark, it also outperforms larger models, including GPT-4o mini and Mistral Large.
15
+
16
+ ## Highlights
17
+
18
+ - 7 billion parameters
19
+ - SOTA performance for its size
20
+ - Initialized from Qwen2-Instruct
21
+ - Trained further from Qwen2-Instruct with Self-Play Preference Optimization ([SPPO](https://arxiv.org/abs/2405.00675))
22
+ - Outperforms Mistral Large
23
+ - Outperforms Mixtral-8x7B
24
+ - Approaches Meta Llama-3-70B
25
+ - Supports a context length of 128K tokens, making it ideal for tasks requiring many conversation turns or working with large amounts of text
26
+
27
+
28
+ ## Quickstart
29
+
30
+ The snippet below shows how to load the tokenizer and model and how to generate a response.
31
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "edgerunner-ai/EdgeRunner-Tactical-7B"
+
+ # device_map="auto" places the weights on the available device(s)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype="auto",
+     device_map="auto",
+ )
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+ prompt = "Give me a short introduction to large language models."
+ messages = [
+     {"role": "system", "content": "You are a helpful assistant."},
+     {"role": "user", "content": prompt},
+ ]
+ # Render the conversation with the model's chat template
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True,
+ )
+ # Move the inputs to the device the model was loaded on
+ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+
+ # Pass input_ids and attention_mask together
+ generated_ids = model.generate(
+     **model_inputs,
+     max_new_tokens=512,
+ )
+ # Drop the prompt tokens so only the newly generated tokens are decoded
+ generated_ids = [
+     output_ids[len(input_ids):]
+     for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+ ]
+
+ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+ print(response)
+ ```
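The Quickstart covers a single turn. For multi-turn use, one possible approach is to append the model's reply and the next user message to `messages` before applying the chat template again. The sketch below is plain Python and does not load the model; the helper name `extend_conversation` and the placeholder reply are hypothetical and not part of the README.

```python
def extend_conversation(messages, assistant_reply, next_user_prompt):
    """Return a new message list with the reply and a follow-up user turn appended."""
    # Build a new list so the caller's original history is left untouched.
    return messages + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": next_user_prompt},
    ]


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give me a short introduction to large language models."},
]

# "<model response goes here>" stands in for the decoded `response` string.
follow_up = extend_conversation(
    messages,
    "<model response goes here>",
    "Summarize that in one sentence.",
)
print([m["role"] for m in follow_up])
```

The resulting list can be passed to `tokenizer.apply_chat_template` in the same way as the single-turn `messages`.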
65
+
66
+ ## Example Outputs
67
+
68
+ ### Create a Quantum Future:
69
+
70
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/633fe629f81b9d10135fefda/3b00jTWhIV_5OWxtW6zFI.png" width="95%">
71
+
72
+ ### Ask for a structured JSON output:
73
+
74
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/633fe629f81b9d10135fefda/CzW5qUh9tAkZV8k8Xs4nm.png" width="95%">
75
+
76
+
77
+ ## Evaluation
78
+
79
+ This section reports results for EdgeRunner-Tactical-7B on standard automatic benchmarks.
80
+
81
+ ### Arena-Hard Benchmark
82
+
83
+ | Model | Score | 95% CI | Avg #Tokens |
84
+ | :----------------------------- | :----: | :------: | :---------: |
85
+ | gpt-4-turbo-2024-04-09 | 82.6 | (-1.6, 2.1) | 662 |
86
+ | gpt-4-0125-preview | 78.0 | (-1.8, 2.1) | 619 |
87
+ | claude-3-opus-20240229 | 60.4 | (-2.8, 2.6) | 541 |
88
+ | gpt-4-0314 | 50.0 | (0.0, 0.0) | 423 |
89
+ | claude-3-haiku-20240307 | 41.5 | (-2.5, 2.9) | 505 |
90
+ | llama-3-70b-chat-hf | 41.1 | (-2.7, 1.7) | 583 |
91
+ | EdgeRunner-Tactical-7B | 38.2 | (-2.3, 2.7) | 719 |
92
+ | gpt-4-0613 | 37.9 | (-2.2, 2.6) | 354 |
93
+ | mistral-large-2402 | 37.7 | (-1.9, 2.0) | 400 |
94
+ | mixtral-8x22b-instruct-v0.1 | 36.4 | (-2.0, 2.0) | 430 |
95
+ | Qwen1.5-72B-Chat | 36.1 | (-2.3, 2.4) | 474 |
96
+ | command-r-plus | 33.1 | (-2.6, 2.0) | 541 |
97
+ | mistral-medium | 31.9 | (-2.1, 2.1) | 485 |
98
+ | gpt-3.5-turbo-0613 | 24.8 | (-2.2, 1.7) | 401 |
99
+ | dbrx-instruct | 24.6 | (-2.0, 2.4) | 415 |
100
+ | Qwen2-7B-Instruct | 23.5 | (-1.9, 2.0) | 605 |
101
+ | Mixtral-8x7B-Instruct-v0.1 | 23.4 | (-1.9, 1.9) | 457 |
102
+ | gpt-3.5-turbo-0125 | 23.3 | (-1.9, 2.0) | 329 |
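When reading the Arena-Hard table, it can help to turn each score and its 95% CI into an absolute interval. The sketch below assumes the CI column lists offsets relative to the score (the baseline row with `(0.0, 0.0)` is consistent with that reading). The helper names are illustrative, and overlapping intervals are only a rough indication that two scores are close, not a formal significance test.

```python
def interval(score, ci):
    """Convert a score and (lower_offset, upper_offset) into absolute bounds."""
    lo_off, hi_off = ci
    return (round(score + lo_off, 1), round(score + hi_off, 1))


def overlaps(a, b):
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Rows copied from the table above: model -> (score, 95% CI offsets)
rows = {
    "EdgeRunner-Tactical-7B": (38.2, (-2.3, 2.7)),
    "llama-3-70b-chat-hf": (41.1, (-2.7, 1.7)),
    "gpt-4-0613": (37.9, (-2.2, 2.6)),
    "Qwen2-7B-Instruct": (23.5, (-1.9, 2.0)),
}

edge = interval(*rows["EdgeRunner-Tactical-7B"])
for name, (score, ci) in rows.items():
    if name != "EdgeRunner-Tactical-7B":
        print(name, interval(score, ci), "overlaps:", overlaps(edge, interval(score, ci)))
```

Under this reading, the EdgeRunner-Tactical-7B interval overlaps those of llama-3-70b-chat-hf and gpt-4-0613 but is clearly separated from Qwen2-7B-Instruct.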
103
+
104
+
105
+ ### InfiniteBench
106
+
107
+ | Task Name | GPT-4 | YaRN-Mistral-7B | Kimi-Chat | Claude 2 | Yi-6B-200K | Yi-34B-200K | Chatglm3-6B-128K | EdgeRunner-Tactical-7B | Qwen2-7B-Instruct |
108
+ |-----------------|-------|------------------|-----------|----------|------------|-------------|------------------|------------------------|-------------------|
109
+ | Retrieve.PassKey| 100% | 92.71% | 98.14% | 97.80% | 100.00% | 100.00% | 92.20% | 100% | 100% |
110
+ | Retrieve.Number | 100% | 56.61% | 95.42% | 98.14% | 94.92% | 100.00% | 80.68% | 100% | 99.83% |
111
+ | Retrieve.KV | 89.00%| < 5% | 53.60% | 65.40% | < 5% | < 5% | < 5% | 2.2% | 1.8% |
112
+ | En.Sum | 14.73%| 9.09% | 17.96% | 14.50% | < 5% | < 5% | < 5% | 33.07% | 29.13% |
113
+ | En.QA | 22.44%| 9.55% | 16.52% | 11.97% | 9.20% | 12.17% | < 5% | 3.4% | 9.09% |
114
+ | En.MC | 67.25%| 27.95% | 72.49% | 62.88% | 36.68% | 38.43% | 10.48% | 66.81% | 66.37% |
115
+ | En.Dia | 8.50% | 7.50% | 11.50% | 46.50% | < 5% | < 5% | < 5% | 29% | 17% |
116
+ | Zh.QA | 25.96%| 16.98% | 17.93% | 9.64% | 15.07% | 13.61% | < 5% | 4.6% | 11.14% |
117
+ | Code.Debug | 37.06%| < 5% | 17.77% | < 5% | 9.14% | 13.96% | 7.36% | 22.08% | 24.61% |
118
+ | Code.Run | 23.25%| < 5% | < 5% | < 5% | < 5% | < 5% | < 5% | 0% | 0.5% |
119
+ | Math.Calc | < 5% | < 5% | < 5% | < 5% | < 5% | < 5% | < 5% | 0% | 0% |
120
+ | Math.Find | 60.00%| 17.14% | 12.57% | 32.29% | < 5% | 25.71% | 7.71% | 29.14% | 31.42% |
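As a reading aid for the InfiniteBench table, the sketch below tallies per-task wins, losses, and ties between EdgeRunner-Tactical-7B and its base model Qwen2-7B-Instruct. The numbers are copied from the two rightmost columns; the `tally` helper is illustrative only.

```python
# task -> (EdgeRunner-Tactical-7B, Qwen2-7B-Instruct), in percent, from the table above
scores = {
    "Retrieve.PassKey": (100.0, 100.0),
    "Retrieve.Number": (100.0, 99.83),
    "Retrieve.KV": (2.2, 1.8),
    "En.Sum": (33.07, 29.13),
    "En.QA": (3.4, 9.09),
    "En.MC": (66.81, 66.37),
    "En.Dia": (29.0, 17.0),
    "Zh.QA": (4.6, 11.14),
    "Code.Debug": (22.08, 24.61),
    "Code.Run": (0.0, 0.5),
    "Math.Calc": (0.0, 0.0),
    "Math.Find": (29.14, 31.42),
}


def tally(scores):
    """Split tasks into wins, losses, and ties for the first model of each pair."""
    wins = [task for task, (a, b) in scores.items() if a > b]
    losses = [task for task, (a, b) in scores.items() if a < b]
    ties = [task for task, (a, b) in scores.items() if a == b]
    return wins, losses, ties


wins, losses, ties = tally(scores)
print("wins:", wins)
print("losses:", losses)
print("ties:", ties)
```

By this count the two models split the tasks evenly, with EdgeRunner-Tactical-7B ahead mainly on summarization and dialogue and behind on the QA tasks.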
121
+
122
+ ### GSM@ZeroEval
123
+
124
+ | Model | Acc | No Answer | Reason Lens |
125
+ |-------------------------------------|--------|-----------|-------------|
126
+ | Llama-3.1-405B-Instruct-Turbo | 95.91 | 0.08 | 365.07 |
127
+ | claude-3-5-sonnet-20240620 | 95.6 | 0 | 465.19 |
128
+ | claude-3-opus-20240229 | 95.6 | 0 | 410.62 |
129
+ | gpt-4o-2024-05-13 | 95.38 | 0 | 479.98 |
130
+ | gpt-4o-mini-2024-07-18 | 94.24 | 0 | 463.71 |
131
+ | deepseek-chat | 93.93 | 0 | 495.52 |
132
+ | deepseek-coder | 93.78 | 0 | 566.89 |
133
+ | gemini-1.5-pro | 93.4 | 0 | 389.17 |
134
+ | Meta-Llama-3-70B-Instruct | 93.03 | 0 | 352.05 |
135
+ | Qwen2-72B-Instruct | 92.65 | 0 | 375.96 |
136
+ | claude-3-sonnet-20240229 | 91.51 | 0 | 762.69 |
137
+ | gemini-1.5-flash | 91.36 | 0 | 344.61 |
138
+ | gemma-2-27b-it@together | 90.22 | 0 | 364.68 |
139
+ | claude-3-haiku-20240307 | 88.78 | 0 | 587.65 |
140
+ | gemma-2-9b-it | 87.41 | 0 | 394.83 |
141
+ | reka-core-20240501 | 87.41 | 0.08 | 414.7 |
142
+ | Athene-70B | 86.66 | 0.3 | 253.53 |
143
+ | Yi-1.5-34B-Chat | 84.08 | 0.08 | 553.47 |
144
+ | Llama-3.1-8B-Instruct | 82.87 | 0.45 | 414.19 |
145
+ | Mistral-Nemo-Instruct-2407 | 82.79 | 0 | 349.81 |
146
+ | yi-large-preview | 82.64 | 0 | 514.25 |
147
+ | EdgeRunner-Tactical-7B | 81.12 | 0.08 | 615.89 |
148
+ | gpt-3.5-turbo-0125 | 80.36 | 0 | 350.97 |
149
+ | command-r-plus | 80.14 | 0.08 | 294.08 |
150
+ | Qwen2-7B-Instruct | 80.06 | 0 | 452.6 |
151
+ | yi-large | 80.06 | 0 | 479.87 |
152
+ | Meta-Llama-3-8B-Instruct | 78.47 | 0 | 429.39 |
153
+ | Yi-1.5-9B-Chat | 76.42 | 0.08 | 485.39 |
154
+ | Phi-3-mini-4k-instruct | 75.51 | 0 | 462.53 |
155
+ | reka-flash-20240226 | 74.68 | 0.45 | 460.06 |
156
+ | Meta-Llama-3.1-8B-Instruct | 72.33 | 0.38 | 483.41 |
157
+ | Mixtral-8x7B-Instruct-v0.1 | 70.13 | 2.27 | 361.12 |
158
+ | Llama-3-Instruct-8B-SimPO-v0.2 | 57.54 | 2.05 | 505.25 |
159
+ | command-r | 52.99 | 0 | 294.43 |
160
+ | Qwen2-1.5B-Instruct | 43.37 | 4.78 | 301.67 |
161
+
162
+
163
+ ### MMLU-REDUX@ZeroEval
164
+
165
+ | Model | Acc | No Answer | Reason Lens |
166
+ |------------------------------------|-------|-----------|-------------|
167
+ | gpt-4o-2024-05-13 | 88.01 | 0.14 | 629.79 |
168
+ | claude-3-5-sonnet-20240620 | 86 | 0.18 | 907.1 |
169
+ | Llama-3.1-405B-Instruct-Turbo | 85.64 | 0.76 | 449.71 |
170
+ | gpt-4-turbo-2024-04-09 | 85.31 | 0.04 | 631.38 |
171
+ | gemini-1.5-pro | 82.76 | 1.94 | 666.7 |
172
+ | claude-3-opus-20240229 | 82.54 | 0.58 | 500.35 |
173
+ | yi-large-preview | 82.15 | 0.14 | 982.6 |
174
+ | gpt-4-0314 | 81.64 | 0.04 | 397.22 |
175
+ | Qwen2-72B-Instruct | 81.61 | 0.29 | 486.41 |
176
+ | gpt-4o-mini-2024-07-18 | 81.5 | 0.07 | 526 |
177
+ | yi-large | 81.17 | 0 | 774.85 |
178
+ | deepseek-chat | 80.81 | 0.11 | 691.91 |
179
+ | deepseek-coder | 79.63 | 0.14 | 704.72 |
180
+ | Meta-Llama-3-70B-Instruct | 78.01 | 0.11 | 520.77 |
181
+ | gemini-1.5-flash | 77.36 | 1.26 | 583.45 |
182
+ | Athene-70B | 76.64 | 0.04 | 552.61 |
183
+ | reka-core-20240501 | 76.42 | 0.76 | 701.67 |
184
+ | gemma-2-27b-it@together | 75.67 | 0.61 | 446.51 |
185
+ | claude-3-sonnet-20240229 | 74.87 | 0.07 | 671.75 |
186
+ | gemma-2-9b-it@nvidia | 72.82 | 0.76 | 499 |
187
+ | Yi-1.5-34B-Chat | 72.79 | 1.01 | 620.1 |
188
+ | claude-3-haiku-20240307 | 72.32 | 0.04 | 644.59 |
189
+ | Phi-3-mini-4k-instruct | 70.34 | 0.43 | 677.09 |
190
+ | command-r-plus | 68.61 | 0 | 401.51 |
191
+ | gpt-3.5-turbo-0125 | 68.36 | 0.04 | 357.92 |
192
+ | EdgeRunner-Tactical-7B | 67.71 | 0.65 | 917.6 |
193
+ | Llama-3.1-8B-Instruct | 67.13 | 3.38 | 399.54 |
194
+ | Qwen2-7B-Instruct | 66.92 | 0.72 | 533.15 |
195
+ | Mistral-Nemo-Instruct-2407 | 66.88 | 0.47 | 464.19 |
196
+ | Yi-1.5-9B-Chat | 65.05 | 4.61 | 542.87 |
197
+ | Meta-Llama-3.1-8B-Instruct | 64.79 | 1.94 | 463.76 |
198
+ | reka-flash-20240226 | 64.72 | 0.32 | 659.25 |
199
+ | Mixtral-8x7B-Instruct-v0.1 | 63.17 | 5.51 | 324.31 |
200
+ | Meta-Llama-3-8B-Instruct | 61.66 | 0.97 | 600.81 |
201
+ | command-r | 61.12 | 0.04 | 382.23 |
202
+ | Llama-3-Instruct-8B-SimPO-v0.2 | 55.22 | 1.19 | 450.6 |
203
+ | Qwen2-1.5B-Instruct | 41.11 | 7.74 | 280.56 |
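To put the two ZeroEval tables in perspective, the sketch below compares EdgeRunner-Tactical-7B with Qwen2-7B-Instruct on accuracy and on the "Reason Lens" column. The figures are copied from the tables; the `compare` helper is illustrative, and "Reason Lens" is assumed to be an average reasoning length as reported by ZeroEval.

```python
# benchmark -> {model: (accuracy, reason_len)}, copied from the ZeroEval tables above
results = {
    "GSM": {
        "EdgeRunner-Tactical-7B": (81.12, 615.89),
        "Qwen2-7B-Instruct": (80.06, 452.6),
    },
    "MMLU-Redux": {
        "EdgeRunner-Tactical-7B": (67.71, 917.6),
        "Qwen2-7B-Instruct": (66.92, 533.15),
    },
}


def compare(bench, model="EdgeRunner-Tactical-7B", base="Qwen2-7B-Instruct"):
    """Return (accuracy gain in points, reasoning-length ratio) of model over base."""
    acc, length = results[bench][model]
    base_acc, base_length = results[bench][base]
    return round(acc - base_acc, 2), round(length / base_length, 2)


for bench in results:
    gain, ratio = compare(bench)
    print(f"{bench}: +{gain} accuracy points, {ratio}x reasoning length")
```

On both benchmarks the accuracy gain over the base model is about one point, while the reasoning output is noticeably longer.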
204
+
205
+ ### WildBench
206
+
207
+ | Model | WB_Elo | RewardScore_Avg | task_macro_reward.K=-1 | Length |
208
+ |-------------------------------------|---------|-----------------|------------------------|----------|
209
+ | gpt-4o-2024-05-13 | 1248.12 | 50.05 | 40.80 | 3723.52 |
210
+ | claude-3-5-sonnet-20240620 | 1229.76 | 46.16 | 37.63 | 2911.85 |
211
+ | gpt-4-turbo-2024-04-09 | 1225.29 | 46.19 | 37.17 | 3093.17 |
212
+ | gpt-4-0125-preview | 1211.44 | 41.24 | 30.20 | 3335.64 |
213
+ | gemini-1.5-pro | 1209.23 | 45.27 | 37.59 | 3247.97 |
214
+ | yi-large-preview | 1209.00 | 46.92 | 38.54 | 3512.68 |
215
+ | claude-3-opus-20240229 | 1206.56 | 37.03 | 22.35 | 2685.98 |
216
+ | Meta-Llama-3-70B-Instruct | 1197.72 | 35.15 | 22.54 | 3046.64 |
217
+ | Athene-70B | 1197.41 | 29.77 | 0.00 | 3175.14 |
218
+ | deepseek-coder-v2 | 1194.11 | 29.39 | 11.38 | 2795.31 |
219
+ | gpt-4o-mini-2024-07-18 | 1192.43 | 28.57 | 0.00 | 3648.13 |
220
+ | yi-large | 1191.88 | 33.35 | 17.77 | 3095.34 |
221
+ | gemini-1.5-flash | 1190.30 | 37.45 | 26.04 | 3654.40 |
222
+ | deepseek-v2-chat-0628 | 1188.07 | 27.00 | 0.00 | 3252.38 |
223
+ | gemma-2-9b-it-SimPO | 1184.67 | 26.64 | 0.00 | 4277.67 |
224
+ | gemma-2-9b-it-DPO | 1182.43 | 26.61 | 0.00 | 3982.63 |
225
+ | nemotron-4-340b-instruct | 1181.77 | 33.76 | 19.85 | 2754.01 |
226
+ | claude-3-sonnet-20240229 | 1179.81 | 28.09 | 10.70 | 2670.24 |
227
+ | deepseekv2-chat | 1178.76 | 30.41 | 12.60 | 2896.97 |
228
+ | gemma-2-27b-it@together | 1178.34 | 24.27 | 0.00 | 2924.55 |
229
+ | Qwen2-72B-Instruct | 1176.75 | 24.77 | 5.03 | 2856.45 |
230
+ | reka-core-20240501 | 1173.85 | 31.48 | 17.06 | 2592.59 |
231
+ | Mistral-Nemo-Instruct-2407 | 1165.29 | 22.19 | 0.00 | 3318.21 |
232
+ | Yi-1.5-34B-Chat | 1163.69 | 30.83 | 16.06 | 3523.56 |
233
+ | EdgeRunner-Tactical-7B | 1162.88 | 22.26 | 0.00 | 3754.66 |
234
+ | claude-3-haiku-20240307 | 1160.56 | 16.30 | -6.30 | 2601.03 |
235
+ | mistral-large-2402 | 1159.72 | 13.27 | -12.36 | 2514.98 |
236
+ | deepseek-v2-coder-0628 | 1155.97 | 22.83 | 0.00 | 2580.18 |
237
+ | gemma-2-9b-it | 1154.30 | 21.35 | 0.00 | 2802.89 |
238
+ | Llama-3-8B-Magpie-Align-v0.1 | 1154.13 | 28.72 | 18.14 | 3107.77 |
239
+ | command-r-plus | 1153.15 | 16.58 | -3.60 | 3293.81 |
240
+ | glm-4-9b-chat | 1152.68 | 20.71 | 2.33 | 3692.04 |
241
+ | Qwen1.5-72B-Chat-greedy | 1151.97 | 20.83 | 1.72 | 2392.36 |
242
+ | Yi-1.5-9B-Chat | 1151.43 | 21.80 | 4.93 | 3468.23 |
243
+ | Llama-3-Instruct-8B-SimPO | 1151.38 | 23.31 | 9.57 | 2541.93 |
244
+ | Llama-3-Instruct-8B-SimPO-v0.2 | 1150.81 | 18.58 | 0.00 | 2533.76 |
245
+ | SELM-Llama-3-8B-Instruct-iter-3 | 1148.03 | 17.89 | 0.53 | 2913.15 |
246
+ | Llama-3-Instruct-8B-SimPO-ExPO | 1147.24 | 21.39 | 7.77 | 2480.65 |
247
+ | Meta-Llama-3-8B-Instruct | 1140.76 | 6.72 | -15.76 | 2975.19 |
248
+ | Qwen2-7B-Instruct | 1137.66 | 16.20 | 0.00 | 3216.43 |
249
+ | Starling-LM-7B-beta-ExPO | 1137.58 | 11.28 | -9.01 | 2835.83 |
250
+ | Hermes-2-Theta-Llama-3-8B | 1135.99 | 3.18 | -23.28 | 2742.17 |
251
+ | Llama-3.1-8B-Instruct | 1135.42 | 16.38 | 0.00 | 3750.60 |