---
library_name: transformers
license: apache-2.0
language:
  - en
---

# EdgeRunner-Tactical-7B

## Introduction

EdgeRunner-Tactical-7B is a powerful and efficient language model for the edge. Our mission is to build Generative AI for the edge that is safe, secure, and transparent. To that end, the EdgeRunner team is proud to release EdgeRunner-Tactical-7B, the most powerful language model for its size to date.

EdgeRunner-Tactical-7B is a 7-billion-parameter language model that delivers strong performance while demonstrating the potential of running state-of-the-art (SOTA) models at the edge. It is the highest-scoring model in the 7B range, outperforming Gemini Pro, Mixtral-8x7B, and Meta-Llama-3-8B-Instruct. On the Arena-Hard benchmark, EdgeRunner-Tactical-7B also outperforms larger models, including GPT-4o mini and Mistral Large.

## Highlights

- 7 billion parameters
- SOTA performance for its size
- Initialized from Qwen2-Instruct
- Continuously trained from Qwen2-Instruct with Self-Play Preference Optimization (SPPO)
- Outperforms Mistral Large
- Outperforms Mixtral-8x7B
- Approaches Meta-Llama-3-70B
- Supports a context length of 128K tokens, making it ideal for tasks requiring many conversation turns or working with large amounts of text

## Quickstart

Below is a code snippet showing how to load the tokenizer and model, and how to generate content.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to run generation on

model = AutoModelForCausalLM.from_pretrained(
    "edgerunner-ai/EdgeRunner-Tactical-7B",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("edgerunner-ai/EdgeRunner-Tactical-7B")

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512,
)
# Strip the prompt tokens so only the newly generated text is decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
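The quickstart relies on the tokenizer's built-in chat template. Since EdgeRunner-Tactical-7B is initialized from Qwen2-Instruct, which uses a ChatML-style prompt format, the rendered prompt can also be built by hand — useful in non-Python edge runtimes. The sketch below mirrors what `apply_chat_template(..., add_generation_prompt=True)` typically produces for Qwen2-style models; the template shipped with the tokenizer remains authoritative.

```python
def build_chatml_prompt(messages):
    """Render a ChatML-style prompt string from a list of
    {"role": ..., "content": ...} messages, ending with an open
    assistant turn (the generation prompt)."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")  # generation prompt
    return "".join(parts)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give me a short introduction to large language model."},
]
print(build_chatml_prompt(messages))
```

Comparing this string against `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` is a quick way to verify a hand-rolled prompt builder before deploying it.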

## Example Outputs

**Create a Quantum Future:**

**Ask for a structured JSON output:**

## Evaluation

In this section, we report results for EdgeRunner-Tactical-7B on standard automatic benchmarks.

### Arena-Hard Benchmark

| Model | Score | 95% CI | Avg #Tokens |
|---|---|---|---|
| gpt-4-turbo-2024-04-09 | 82.6 | (-1.6, 2.1) | 662 |
| gpt-4-0125-preview | 78.0 | (-1.8, 2.1) | 619 |
| claude-3-opus-20240229 | 60.4 | (-2.8, 2.6) | 541 |
| gpt-4-0314 | 50.0 | (0.0, 0.0) | 423 |
| claude-3-haiku-20240307 | 41.5 | (-2.5, 2.9) | 505 |
| llama-3-70b-chat-hf | 41.1 | (-2.7, 1.7) | 583 |
| **EdgeRunner-Tactical-7B** | 38.2 | (-2.3, 2.7) | 719 |
| gpt-4-0613 | 37.9 | (-2.2, 2.6) | 354 |
| mistral-large-2402 | 37.7 | (-1.9, 2.0) | 400 |
| mixtral-8x22b-instruct-v0.1 | 36.4 | (-2.0, 2.0) | 430 |
| Qwen1.5-72B-Chat | 36.1 | (-2.3, 2.4) | 474 |
| command-r-plus | 33.1 | (-2.6, 2.0) | 541 |
| mistral-medium | 31.9 | (-2.1, 2.1) | 485 |
| gpt-3.5-turbo-0613 | 24.8 | (-2.2, 1.7) | 401 |
| dbrx-instruct | 24.6 | (-2.0, 2.4) | 415 |
| Qwen2-7B-Instruct | 23.5 | (-1.9, 2.0) | 605 |
| Mixtral-8x7B-Instruct-v0.1 | 23.4 | (-1.9, 1.9) | 457 |
| gpt-3.5-turbo-0125 | 23.3 | (-1.9, 2.0) | 329 |

### InfiniteBench

| Task Name | GPT-4 | YaRN-Mistral-7B | Kimi-Chat | Claude 2 | Yi-6B-200K | Yi-34B-200K | Chatglm3-6B-128K | EdgeRunner-Tactical-7B | Qwen2-7B-Instruct |
|---|---|---|---|---|---|---|---|---|---|
| Retrieve.PassKey | 100% | 92.71% | 98.14% | 97.80% | 100.00% | 100.00% | 92.20% | 100% | 100% |
| Retrieve.Number | 100% | 56.61% | 95.42% | 98.14% | 94.92% | 100.00% | 80.68% | 100% | 99.83% |
| Retrieve.KV | 89.00% | < 5% | 53.60% | 65.40% | < 5% | < 5% | < 5% | 2.2% | 1.8% |
| En.Sum | 14.73% | 9.09% | 17.96% | 14.50% | < 5% | < 5% | < 5% | 33.07% | 29.13% |
| En.QA | 22.44% | 9.55% | 16.52% | 11.97% | 9.20% | 12.17% | < 5% | 3.4% | 9.09% |
| En.MC | 67.25% | 27.95% | 72.49% | 62.88% | 36.68% | 38.43% | 10.48% | 66.81% | 66.37% |
| En.Dia | 8.50% | 7.50% | 11.50% | 46.50% | < 5% | < 5% | < 5% | 29% | 17% |
| Zh.QA | 25.96% | 16.98% | 17.93% | 9.64% | 15.07% | 13.61% | < 5% | 4.6% | 11.14% |
| Code.Debug | 37.06% | < 5% | 17.77% | < 5% | 9.14% | 13.96% | 7.36% | 22.08% | 24.61% |
| Code.Run | 23.25% | < 5% | < 5% | < 5% | < 5% | < 5% | < 5% | 0% | 0.5% |
| Math.Calc | < 5% | < 5% | < 5% | < 5% | < 5% | < 5% | < 5% | 0% | 0% |
| Math.Find | 60.00% | 17.14% | 12.57% | 32.29% | < 5% | 25.71% | 7.71% | 29.14% | 31.42% |

### GSM@ZeroEval

| Model | Acc | No Answer | Reason Lens |
|---|---|---|---|
| Llama-3.1-405B-Instruct-Turbo | 95.91 | 0.08 | 365.07 |
| claude-3-5-sonnet-20240620 | 95.6 | 0 | 465.19 |
| claude-3-opus-20240229 | 95.6 | 0 | 410.62 |
| gpt-4o-2024-05-13 | 95.38 | 0 | 479.98 |
| gpt-4o-mini-2024-07-18 | 94.24 | 0 | 463.71 |
| deepseek-chat | 93.93 | 0 | 495.52 |
| deepseek-coder | 93.78 | 0 | 566.89 |
| gemini-1.5-pro | 93.4 | 0 | 389.17 |
| Meta-Llama-3-70B-Instruct | 93.03 | 0 | 352.05 |
| Qwen2-72B-Instruct | 92.65 | 0 | 375.96 |
| claude-3-sonnet-20240229 | 91.51 | 0 | 762.69 |
| gemini-1.5-flash | 91.36 | 0 | 344.61 |
| gemma-2-27b-it@together | 90.22 | 0 | 364.68 |
| claude-3-haiku-20240307 | 88.78 | 0 | 587.65 |
| gemma-2-9b-it | 87.41 | 0 | 394.83 |
| reka-core-20240501 | 87.41 | 0.08 | 414.7 |
| Athene-70B | 86.66 | 0.3 | 253.53 |
| Yi-1.5-34B-Chat | 84.08 | 0.08 | 553.47 |
| Llama-3.1-8B-Instruct | 82.87 | 0.45 | 414.19 |
| Mistral-Nemo-Instruct-2407 | 82.79 | 0 | 349.81 |
| yi-large-preview | 82.64 | 0 | 514.25 |
| **EdgeRunner-Tactical-7B** | 81.12 | 0.08 | 615.89 |
| gpt-3.5-turbo-0125 | 80.36 | 0 | 350.97 |
| command-r-plus | 80.14 | 0.08 | 294.08 |
| Qwen2-7B-Instruct | 80.06 | 0 | 452.6 |
| yi-large | 80.06 | 0 | 479.87 |
| Meta-Llama-3-8B-Instruct | 78.47 | 0 | 429.39 |
| Yi-1.5-9B-Chat | 76.42 | 0.08 | 485.39 |
| Phi-3-mini-4k-instruct | 75.51 | 0 | 462.53 |
| reka-flash-20240226 | 74.68 | 0.45 | 460.06 |
| Meta-Llama-3.1-8B-Instruct | 72.33 | 0.38 | 483.41 |
| Mixtral-8x7B-Instruct-v0.1 | 70.13 | 2.27 | 361.12 |
| Llama-3-Instruct-8B-SimPO-v0.2 | 57.54 | 2.05 | 505.25 |
| command-r | 52.99 | 0 | 294.43 |
| Qwen2-1.5B-Instruct | 43.37 | 4.78 | 301.67 |

### MMLU-REDUX@ZeroEval

| Model | Acc | No Answer | Reason Lens |
|---|---|---|---|
| gpt-4o-2024-05-13 | 88.01 | 0.14 | 629.79 |
| claude-3-5-sonnet-20240620 | 86 | 0.18 | 907.1 |
| Llama-3.1-405B-Instruct-Turbo | 85.64 | 0.76 | 449.71 |
| gpt-4-turbo-2024-04-09 | 85.31 | 0.04 | 631.38 |
| gemini-1.5-pro | 82.76 | 1.94 | 666.7 |
| claude-3-opus-20240229 | 82.54 | 0.58 | 500.35 |
| yi-large-preview | 82.15 | 0.14 | 982.6 |
| gpt-4-0314 | 81.64 | 0.04 | 397.22 |
| Qwen2-72B-Instruct | 81.61 | 0.29 | 486.41 |
| gpt-4o-mini-2024-07-18 | 81.5 | 0.07 | 526 |
| yi-large | 81.17 | 0 | 774.85 |
| deepseek-chat | 80.81 | 0.11 | 691.91 |
| deepseek-coder | 79.63 | 0.14 | 704.72 |
| Meta-Llama-3-70B-Instruct | 78.01 | 0.11 | 520.77 |
| gemini-1.5-flash | 77.36 | 1.26 | 583.45 |
| Athene-70B | 76.64 | 0.04 | 552.61 |
| reka-core-20240501 | 76.42 | 0.76 | 701.67 |
| gemma-2-27b-it@together | 75.67 | 0.61 | 446.51 |
| claude-3-sonnet-20240229 | 74.87 | 0.07 | 671.75 |
| gemma-2-9b-it@nvidia | 72.82 | 0.76 | 499 |
| Yi-1.5-34B-Chat | 72.79 | 1.01 | 620.1 |
| claude-3-haiku-20240307 | 72.32 | 0.04 | 644.59 |
| Phi-3-mini-4k-instruct | 70.34 | 0.43 | 677.09 |
| command-r-plus | 68.61 | 0 | 401.51 |
| gpt-3.5-turbo-0125 | 68.36 | 0.04 | 357.92 |
| **EdgeRunner-Tactical-7B** | 67.71 | 0.65 | 917.6 |
| Llama-3.1-8B-Instruct | 67.13 | 3.38 | 399.54 |
| Qwen2-7B-Instruct | 66.92 | 0.72 | 533.15 |
| Mistral-Nemo-Instruct-2407 | 66.88 | 0.47 | 464.19 |
| Yi-1.5-9B-Chat | 65.05 | 4.61 | 542.87 |
| Meta-Llama-3.1-8B-Instruct | 64.79 | 1.94 | 463.76 |
| reka-flash-20240226 | 64.72 | 0.32 | 659.25 |
| Mixtral-8x7B-Instruct-v0.1 | 63.17 | 5.51 | 324.31 |
| Meta-Llama-3-8B-Instruct | 61.66 | 0.97 | 600.81 |
| command-r | 61.12 | 0.04 | 382.23 |
| Llama-3-Instruct-8B-SimPO-v0.2 | 55.22 | 1.19 | 450.6 |
| Qwen2-1.5B-Instruct | 41.11 | 7.74 | 280.56 |

### WildBench

| Model | WB_Elo | RewardScore_Avg | task_macro_reward.K=-1 | Length |
|---|---|---|---|---|
| gpt-4o-2024-05-13 | 1248.12 | 50.05 | 40.80 | 3723.52 |
| claude-3-5-sonnet-20240620 | 1229.76 | 46.16 | 37.63 | 2911.85 |
| gpt-4-turbo-2024-04-09 | 1225.29 | 46.19 | 37.17 | 3093.17 |
| gpt-4-0125-preview | 1211.44 | 41.24 | 30.20 | 3335.64 |
| gemini-1.5-pro | 1209.23 | 45.27 | 37.59 | 3247.97 |
| yi-large-preview | 1209.00 | 46.92 | 38.54 | 3512.68 |
| claude-3-opus-20240229 | 1206.56 | 37.03 | 22.35 | 2685.98 |
| Meta-Llama-3-70B-Instruct | 1197.72 | 35.15 | 22.54 | 3046.64 |
| Athene-70B | 1197.41 | 29.77 | 0.00 | 3175.14 |
| deepseek-coder-v2 | 1194.11 | 29.39 | 11.38 | 2795.31 |
| gpt-4o-mini-2024-07-18 | 1192.43 | 28.57 | 0.00 | 3648.13 |
| yi-large | 1191.88 | 33.35 | 17.77 | 3095.34 |
| gemini-1.5-flash | 1190.30 | 37.45 | 26.04 | 3654.40 |
| deepseek-v2-chat-0628 | 1188.07 | 27.00 | 0.00 | 3252.38 |
| gemma-2-9b-it-SimPO | 1184.67 | 26.64 | 0.00 | 4277.67 |
| gemma-2-9b-it-DPO | 1182.43 | 26.61 | 0.00 | 3982.63 |
| nemotron-4-340b-instruct | 1181.77 | 33.76 | 19.85 | 2754.01 |
| claude-3-sonnet-20240229 | 1179.81 | 28.09 | 10.70 | 2670.24 |
| deepseekv2-chat | 1178.76 | 30.41 | 12.60 | 2896.97 |
| gemma-2-27b-it@together | 1178.34 | 24.27 | 0.00 | 2924.55 |
| Qwen2-72B-Instruct | 1176.75 | 24.77 | 5.03 | 2856.45 |
| reka-core-20240501 | 1173.85 | 31.48 | 17.06 | 2592.59 |
| Mistral-Nemo-Instruct-2407 | 1165.29 | 22.19 | 0.00 | 3318.21 |
| Yi-1.5-34B-Chat | 1163.69 | 30.83 | 16.06 | 3523.56 |
| **EdgeRunner-Tactical-7B** | 1162.88 | 22.26 | 0.00 | 3754.66 |
| claude-3-haiku-20240307 | 1160.56 | 16.30 | -6.30 | 2601.03 |
| mistral-large-2402 | 1159.72 | 13.27 | -12.36 | 2514.98 |
| deepseek-v2-coder-0628 | 1155.97 | 22.83 | 0.00 | 2580.18 |
| gemma-2-9b-it | 1154.30 | 21.35 | 0.00 | 2802.89 |
| Llama-3-8B-Magpie-Align-v0.1 | 1154.13 | 28.72 | 18.14 | 3107.77 |
| command-r-plus | 1153.15 | 16.58 | -3.60 | 3293.81 |
| glm-4-9b-chat | 1152.68 | 20.71 | 2.33 | 3692.04 |
| Qwen1.5-72B-Chat-greedy | 1151.97 | 20.83 | 1.72 | 2392.36 |
| Yi-1.5-9B-Chat | 1151.43 | 21.80 | 4.93 | 3468.23 |
| Llama-3-Instruct-8B-SimPO | 1151.38 | 23.31 | 9.57 | 2541.93 |
| Llama-3-Instruct-8B-SimPO-v0.2 | 1150.81 | 18.58 | 0.00 | 2533.76 |
| SELM-Llama-3-8B-Instruct-iter-3 | 1148.03 | 17.89 | 0.53 | 2913.15 |
| Llama-3-Instruct-8B-SimPO-ExPO | 1147.24 | 21.39 | 7.77 | 2480.65 |
| Meta-Llama-3-8B-Instruct | 1140.76 | 6.72 | -15.76 | 2975.19 |
| Qwen2-7B-Instruct | 1137.66 | 16.20 | 0.00 | 3216.43 |
| Starling-LM-7B-beta-ExPO | 1137.58 | 11.28 | -9.01 | 2835.83 |
| Hermes-2-Theta-Llama-3-8B | 1135.99 | 3.18 | -23.28 | 2742.17 |
| Llama-3.1-8B-Instruct | 1135.42 | 16.38 | 0.00 | 3750.60 |