---
base_model:
- marcuscedricridia/Hush-Qwen2.5-7B-della1
- marcuscedricridia/Hush-Qwen2.5-7B-della2
- marcuscedricridia/Hush-Qwen2.5-7B-RP-1M
- marcuscedricridia/Hush-Qwen2.5-7B-della3
- marcuscedricridia/Hush-Qwen2.5-7B-della4
library_name: transformers
tags:
- mergekit
- merge

---
## Performance Highlights

Hush-Qwen2.5-7B-Preview was created using the YoYo v3 merge technique and achieves an IFEVAL score of **79.62%**, a new high for our 7B models. That places it **second-best** among 7B models overall, and since the leading model is currently unavailable, it may hold **first place** by default.

### Strengths
- **High IFEVAL Score:** 79.62%, among the best for 7B models.
- **Well-rounded performance:** Decent scores across various benchmarks.

### Weaknesses
- **Low MATH Score:** 37.54%, significantly lower than our past models (which scored at least 45%). Improving this would make the model substantially better overall.

## Benchmark Results
| Category  | Score (%) |
|-----------|----------|
| Average   | 35.13    |
| IFEVAL    | 79.62    |
| BBH       | 35.33    |
| MATH      | 37.54    |
| GPQA      | 8.17     |
| MUSR      | 12.73    |
| MMLU      | 37.38    |
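
The Average row in the table is simply the unweighted arithmetic mean of the six benchmark scores, which can be verified in a couple of lines:

```python
# Scores copied from the benchmark table above
scores = {
    "IFEVAL": 79.62,
    "BBH": 35.33,
    "MATH": 37.54,
    "GPQA": 8.17,
    "MUSR": 12.73,
    "MMLU": 37.38,
}

# Unweighted mean across the six benchmarks
average = sum(scores.values()) / len(scores)
print(round(average, 2))  # → 35.13, matching the reported Average
```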

## Next Steps

- **Finetune on Math:** Bringing up the math score is a priority to create a well-balanced model.
- **Explore YoYo v4:** The next step could be merging this model with another one that is strong in math using the YoYo v4 technique. However, YoYo v4 lacks proper documentation, making it a challenge to implement.
- **Develop a Math-Strong Model:** An alternative approach is to build a new model that performs decently in all benchmarks but excels in math, then merge it with this one.
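
For the merge-based route, a mergekit config along these lines could serve as a starting point. This is a sketch only: the donor model name is a placeholder, the repo id for this model is assumed, and the weights and DELLA parameters are illustrative guesses, not the actual YoYo recipe (which, as noted, is not documented).

```yaml
# Hypothetical mergekit config: blending this model with a math-strong
# donor using the DELLA merge method. All names and values below are
# illustrative assumptions.
models:
  - model: marcuscedricridia/Hush-Qwen2.5-7B-Preview  # assumed repo id
    parameters:
      weight: 0.6
  - model: some-org/math-strong-qwen2.5-7b            # placeholder donor
    parameters:
      weight: 0.4
merge_method: della
base_model: Qwen/Qwen2.5-7B-Instruct                  # assumed common base
parameters:
  density: 0.5      # fraction of delta parameters kept
  epsilon: 0.05     # magnitude-based drop-probability window
dtype: bfloat16
```

Running `mergekit-yaml config.yml ./output-model` would then produce the merged checkpoint for benchmarking.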

## Conclusion
Hush-Qwen2.5-7B-Preview is a strong contender in the IFEVAL category, achieving one of the highest scores among 7B models. However, improving the math benchmark is a key priority for future iterations. By either finetuning or leveraging new merge techniques like YoYo v4, we can push the model to new heights.