Update README.md
README.md (changed)
@@ -87,18 +87,19 @@ MT-Bench is a benchmark made up of 80 high-quality multi-turn questions. These q
 
 Arena-Hard is an evaluation tool for instruction-tuned LLMs containing 500 challenging user queries. It prompts GPT-4-1106-preview as a judge to compare the models' responses against a baseline model (default: GPT-4-0314).
 
-| Model-name | Score |
-| claude-3-
+| Model-name                     | Score  |                     |
+|--------------------------------|--------|---------------------|
+| gpt-4-0125-preview             | 78.0   | 95% CI: (-1.8, 2.2) |
+| claude-3-opus-20240229         | 60.4   | 95% CI: (-2.6, 2.1) |
+| gpt-4-0314                     | 50.0   | 95% CI: (0.0, 0.0)  |
+| **tenyx/Llama3-TenyxChat-70B** | 49.0   | 95% CI: (-3.0, 2.4) |
+| meta-llama/Meta-Llama-3-70B-In | 47.3   | 95% CI: (-1.7, 2.6) |
+| claude-3-sonnet-20240229       | 46.8   | 95% CI: (-2.7, 2.3) |
+| claude-3-haiku-20240307        | 41.5   | 95% CI: (-2.4, 2.5) |
+| gpt-4-0613                     | 37.9   | 95% CI: (-2.1, 2.2) |
+| mistral-large-2402             | 37.7   | 95% CI: (-2.9, 2.8) |
+| Qwen1.5-72B-Chat               | 36.1   | 95% CI: (-2.1, 2.4) |
+| command-r-plus                 | 33.1   | 95% CI: (-2.0, 1.9) |
 
 # Limitations
 
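For context on how scores like those in the added table are produced, below is a minimal sketch of the pairwise LLM-as-judge comparison that Arena-Hard performs, assuming an OpenAI-compatible Python client. The prompt wording, verdict format, and the `judge_pair` helper are illustrative assumptions, not Arena-Hard's actual templates or scoring pipeline; the real benchmark aggregates many such judgments across all 500 queries into the scores and confidence intervals shown above.

```python
# Sketch of a pairwise LLM-as-judge comparison (not Arena-Hard's real
# prompt or scoring code; names and prompt text are assumptions).
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

JUDGE_MODEL = "gpt-4-1106-preview"  # judge model named in the README
BASELINE = "gpt-4-0314"             # default baseline named in the README


def judge_pair(query: str, answer_a: str, answer_b: str) -> str:
    """Ask the judge which answer to `query` is better: 'A', 'B', or 'TIE'."""
    prompt = (
        "You are an impartial judge. Compare the two assistant answers to "
        "the user query below and reply with exactly one token: A, B, or TIE.\n\n"
        f"[Query]\n{query}\n\n[Answer A]\n{answer_a}\n\n[Answer B]\n{answer_b}"
    )
    resp = client.chat.completions.create(
        model=JUDGE_MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep verdicts as deterministic as possible
    )
    return resp.choices[0].message.content.strip()
```

In practice, judges of this kind are usually run twice per query with the answer order swapped to reduce position bias, and the per-query verdicts are then aggregated into a single leaderboard score against the baseline.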