Update README.md
README.md
CHANGED
@@ -11,38 +11,42 @@ tags:
 - merge

 ---
-#
-* [marcuscedricridia/Hush-Qwen2.5-7B-della3](https://huggingface.co/marcuscedricridia/Hush-Qwen2.5-7B-della3)
-* [marcuscedricridia/Hush-Qwen2.5-7B-della4](https://huggingface.co/marcuscedricridia/Hush-Qwen2.5-7B-della4)
-###
-dtype: bfloat16
-tokenizer_source: base
-int8_mask: true
-normalize: true
-name: Hush-Qwen2.5-7B-Preview
-```
+# Model Card: Hush-Qwen2.5-7B-Preview

+## Model Details

+- **Model Name:** Hush-Qwen2.5-7B-Preview
+- **Creator:** marcuscedricridia
+- **Merge Technique:** YoYo v3
+- **Primary Focus:** General performance with a focus on improving benchmarks, especially IFEVAL.

+## Performance Highlights

+Hush-Qwen2.5-7B-Preview was created with the YoYo v3 merge technique and scores **79.62%** on IFEVAL, one of the highest results among 7B models. That makes it the **second-best** 7B model on this benchmark; the leading model is currently unavailable, so this one may be in **first place** by default!

+### Strengths
+- **High IFEVAL Score:** 79.62%, among the best for 7B models.
+- **Well-rounded performance:** Decent scores across various benchmarks.

+### Weaknesses
+- **Low MATH Score:** 37.54%, significantly lower than our past models (which scored at least 45%). Raising it would make the model substantially better overall.

+## Benchmark Results
+| Category | Score (%) |
+|----------|-----------|
+| Average  | 35.13     |
+| IFEVAL   | 79.62     |
+| BBH      | 35.33     |
+| MATH     | 37.54     |
+| GPQA     | 8.17      |
+| MUSR     | 12.73     |
+| MMLU     | 37.38     |

+## Next Steps
+
+- **Fine-tune on math:** Raising the math score is the priority for a well-balanced model.
+- **Explore YoYo v4:** One option is to merge this model with another that is strong in math using the YoYo v4 technique. However, YoYo v4 lacks proper documentation, which makes it hard to implement.
+- **Develop a math-strong model:** An alternative is to build a new model that performs decently on all benchmarks but excels in math, then merge it with this one.
+
+## Conclusion
+Hush-Qwen2.5-7B-Preview is a strong contender on IFEVAL, with one of the highest scores among 7B models. Improving the math benchmark is the key priority for future iterations; either fine-tuning or a newer merge technique such as YoYo v4 could close that gap.
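
The "Average" row in the Benchmark Results table of the diff above can be checked against the six individual scores. The sketch below assumes the average is a plain unweighted mean, which the model card does not state; the helper names `mean_score` and `average_with` are illustrative only. The second call is a hypothetical what-if that reuses the 45% MATH figure quoted for past models in the Weaknesses section, not a measured result.

```python
# Scores copied from the Benchmark Results table in the model card.
scores = {
    "IFEVAL": 79.62,
    "BBH": 35.33,
    "MATH": 37.54,
    "GPQA": 8.17,
    "MUSR": 12.73,
    "MMLU": 37.38,
}


def mean_score(values):
    """Unweighted arithmetic mean (assumed to be how "Average" is computed)."""
    values = list(values)
    return sum(values) / len(values)


def average_with(table, **overrides):
    """Average after replacing some scores, for hypothetical what-if checks."""
    updated = {**table, **overrides}
    return mean_score(updated.values())


average = round(mean_score(scores.values()), 2)
print(f"Average: {average:.2f}")  # Average: 35.13

# Hypothetical: MATH at the 45% level attributed to past models.
what_if = round(average_with(scores, MATH=45.0), 2)
print(f"Average with MATH at 45%: {what_if:.2f}")  # 36.37
```

Under that assumption the reported 35.13 is reproduced, and lifting MATH to 45% alone would move the average by a little over one point.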