Update README.md
README.md
CHANGED
|
@@ -49,18 +49,99 @@ Zephyr is a series of language models that are trained to act as helpful assista

 ## Performance

-| Model |MT Bench
 |-----------------------------------------------------------------------|------:|------:|
 |[zephyr-7b-gemma](https://huggingface.co/HuggingFaceH4/zephyr-7b-gemma)| 7.81 | 28.76|
 |[zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) | 7.34 | 43.81|
-|[gemma-7b-it](https://huggingface.co/google/gemma-7b-it) | 6.38 | 38.01|


-
 |-----------------------------------------------------------------------|------:|------:|---------:|-------:|------:|
-|[zephyr-7b-gemma](https://huggingface.co/HuggingFaceH4/zephyr-7b-gemma)| 34.22| 66.37| 52.19| 37.10| 47.47|
 |[zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) | 37.52| 71.77| 55.26| 39.77| 51.08|
-|[

 ## Intended uses & limitations

@@ -70,8 +151,7 @@ We then further aligned the model with [🤗 TRL's](https://github.com/huggingfa
 Here's how you can run the model using the `pipeline()` function from 🤗 Transformers:

 ```python
-#
-# pip install git+https://github.com/huggingface/transformers.git
 # pip install accelerate

 import torch

 ## Performance

+| Model |MT Bench⬇️|IFEval|
 |-----------------------------------------------------------------------|------:|------:|
 |[zephyr-7b-gemma](https://huggingface.co/HuggingFaceH4/zephyr-7b-gemma)| 7.81 | 28.76|
 |[zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) | 7.34 | 43.81|
+|[google/gemma-7b-it](https://huggingface.co/google/gemma-7b-it) | 6.38 | 38.01|


+
+| Model |AGIEval|GPT4All|TruthfulQA|BigBench|Average ⬇️|
 |-----------------------------------------------------------------------|------:|------:|---------:|-------:|------:|
 |[zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) | 37.52| 71.77| 55.26| 39.77| 51.08|
+|[zephyr-7b-gemma](https://huggingface.co/HuggingFaceH4/zephyr-7b-gemma)| 34.22| 66.37| 52.19| 37.10| 47.47|
+|[mlabonne/Gemmalpaca-7B](https://huggingface.co/mlabonne/Gemmalpaca-7B)| 21.6 | 40.87| 44.85 | 30.49| 34.45|
+|[google/gemma-7b-it](https://huggingface.co/google/gemma-7b-it) | 21.33| 40.84| 41.70| 30.25| 33.53|
+
+
+<details><summary>Details of AGIEval, GPT4All, TruthfulQA, BigBench </summary>
+
+### AGIEval
+| Task |Version| Metric |Value| |Stderr|
+|------------------------------|------:|--------|----:|---|-----:|
+|agieval_aqua_rat | 0|acc |21.65|± | 2.59|
+| | |acc_norm|25.20|± | 2.73|
+|agieval_logiqa_en | 0|acc |34.72|± | 1.87|
+| | |acc_norm|35.94|± | 1.88|
+|agieval_lsat_ar | 0|acc |19.57|± | 2.62|
+| | |acc_norm|21.74|± | 2.73|
+|agieval_lsat_lr | 0|acc |30.59|± | 2.04|
+| | |acc_norm|32.55|± | 2.08|
+|agieval_lsat_rc | 0|acc |49.07|± | 3.05|
+| | |acc_norm|42.75|± | 3.02|
+|agieval_sat_en | 0|acc |54.85|± | 3.48|
+| | |acc_norm|53.40|± | 3.48|
+|agieval_sat_en_without_passage| 0|acc |37.38|± | 3.38|
+| | |acc_norm|33.98|± | 3.31|
+|agieval_sat_math | 0|acc |30.91|± | 3.12|
+| | |acc_norm|28.18|± | 3.04|
+
+Average: 34.22%
+
+### GPT4All
+| Task |Version| Metric |Value| |Stderr|
+|-------------|------:|--------|----:|---|-----:|
+|arc_challenge| 0|acc |49.15|± | 1.46|
+| | |acc_norm|52.47|± | 1.46|
+|arc_easy | 0|acc |77.44|± | 0.86|
+| | |acc_norm|74.75|± | 0.89|
+|boolq | 1|acc |79.69|± | 0.70|
+|hellaswag | 0|acc |60.59|± | 0.49|
+| | |acc_norm|78.00|± | 0.41|
+|openbookqa | 0|acc |29.20|± | 2.04|
+| | |acc_norm|37.80|± | 2.17|
+|piqa | 0|acc |76.82|± | 0.98|
+| | |acc_norm|77.80|± | 0.97|
+|winogrande | 0|acc |64.09|± | 1.35|
+
+Average: 66.37%
+
+### TruthfulQA
+| Task |Version|Metric|Value| |Stderr|
+|-------------|------:|------|----:|---|-----:|
+|truthfulqa_mc| 1|mc1 |35.74|± | 1.68|
+| | |mc2 |52.19|± | 1.59|
+
+Average: 52.19%
+
+### Bigbench
+| Task |Version| Metric |Value| |Stderr|
+|------------------------------------------------|------:|---------------------|----:|---|-----:|
+|bigbench_causal_judgement | 0|multiple_choice_grade|53.68|± | 3.63|
+|bigbench_date_understanding | 0|multiple_choice_grade|59.89|± | 2.55|
+|bigbench_disambiguation_qa | 0|multiple_choice_grade|30.23|± | 2.86|
+|bigbench_geometric_shapes | 0|multiple_choice_grade|11.42|± | 1.68|
+| | |exact_str_match | 0.00|± | 0.00|
+|bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|28.40|± | 2.02|
+|bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|19.14|± | 1.49|
+|bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|44.67|± | 2.88|
+|bigbench_movie_recommendation | 0|multiple_choice_grade|26.80|± | 1.98|
+|bigbench_navigate | 0|multiple_choice_grade|50.00|± | 1.58|
+|bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|52.75|± | 1.12|
+|bigbench_ruin_names | 0|multiple_choice_grade|33.04|± | 2.22|
+|bigbench_salient_translation_error_detection | 0|multiple_choice_grade|33.37|± | 1.49|
+|bigbench_snarks | 0|multiple_choice_grade|48.62|± | 3.73|
+|bigbench_sports_understanding | 0|multiple_choice_grade|58.11|± | 1.57|
+|bigbench_temporal_sequences | 0|multiple_choice_grade|37.20|± | 1.53|
+|bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|20.08|± | 1.13|
+|bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|15.77|± | 0.87|
+|bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|44.67|± | 2.88|
+
+Average: 37.1%
+
+</details>
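
The suite averages in the detail tables appear to follow from the per-task rows. The reported figures match if each task contributes `acc_norm` where it is listed and `acc` otherwise, TruthfulQA contributes `mc2`, and BigBench contributes `multiple_choice_grade` only. The README does not state this rule, so treat it as an inference from the numbers. A minimal Python sketch under that assumption, with scores copied from the zephyr-7b-gemma rows:

```python
# Per-task scores (percent) copied from the zephyr-7b-gemma detail tables.
# Assumption (not stated in the README): use acc_norm where a task reports it,
# acc otherwise; mc2 for TruthfulQA; multiple_choice_grade for BigBench.
suite_scores = {
    "AGIEval": [25.20, 35.94, 21.74, 32.55, 42.75, 53.40, 33.98, 28.18],
    "GPT4All": [52.47, 74.75, 79.69, 78.00, 37.80, 77.80, 64.09],
    "TruthfulQA": [52.19],
    "BigBench": [
        53.68, 59.89, 30.23, 11.42, 28.40, 19.14, 44.67, 26.80, 50.00,
        52.75, 33.04, 33.37, 48.62, 58.11, 37.20, 20.08, 15.77, 44.67,
    ],
}


def mean(scores):
    """Arithmetic mean of a non-empty list of scores."""
    return sum(scores) / len(scores)


# One average per suite, rounded to two decimals as in the tables.
suite_averages = {
    name: round(mean(scores), 2) for name, scores in suite_scores.items()
}

# The "Average" column is the mean of the four suite averages.
overall_average = round(mean(list(suite_averages.values())), 2)

for name, value in suite_averages.items():
    print(f"{name}: {value:.2f}")
print(f"Average: {overall_average:.2f}")
```

Run as written, this reproduces the values in the zephyr-7b-gemma summary row: 34.22, 66.37, 52.19, 37.10, and an overall average of 47.47.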
+

 ## Intended uses & limitations

|
|
|
 Here's how you can run the model using the `pipeline()` function from 🤗 Transformers:

 ```python
+# pip install transformers>=4.38.2
 # pip install accelerate

 import torch