---
base_model:
- marcuscedricridia/Hush-Qwen2.5-7B-della1
- marcuscedricridia/Hush-Qwen2.5-7B-della2
- marcuscedricridia/Hush-Qwen2.5-7B-RP-1M
- marcuscedricridia/Hush-Qwen2.5-7B-della3
- marcuscedricridia/Hush-Qwen2.5-7B-della4
library_name: transformers
tags:
- mergekit
- merge
---
## Performance Highlights
Hush-Qwen2.5-7B-Preview was created using the YoYo v3 merge technique and scores **79.62%** on IFEVAL, the **second-best** result among 7B models. Since the current leader is unavailable, we might effectively hold **first place** by default!
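The exact YoYo v3 recipe is not documented here. As a rough illustration only, a mergekit configuration combining the listed base models with the `della` merge method (suggested by the model names) might look like the sketch below; the weights, densities, and choice of base model are placeholder assumptions, not the actual recipe:

```yaml
# Hypothetical mergekit config sketch -- NOT the actual YoYo v3 recipe.
# Weight/density values are illustrative placeholders.
models:
  - model: marcuscedricridia/Hush-Qwen2.5-7B-della1
    parameters:
      weight: 0.25
      density: 0.6
  - model: marcuscedricridia/Hush-Qwen2.5-7B-della2
    parameters:
      weight: 0.25
      density: 0.6
  - model: marcuscedricridia/Hush-Qwen2.5-7B-della3
    parameters:
      weight: 0.25
      density: 0.6
  - model: marcuscedricridia/Hush-Qwen2.5-7B-della4
    parameters:
      weight: 0.25
      density: 0.6
merge_method: della
base_model: marcuscedricridia/Hush-Qwen2.5-7B-RP-1M  # assumed base; unverified
dtype: bfloat16
```

A config like this would be run with `mergekit-yaml config.yml ./output-model`; consult the mergekit documentation for the real parameter semantics of `della`.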
### Strengths
- **High IFEVAL Score:** 79.62%, among the best for 7B models.
- **Well-rounded performance:** Decent scores across various benchmarks.
### Weaknesses
- **Low MATH Score:** 37.54%, significantly lower than our past models (which scored at least 45%). Improving this would make the model substantially better overall.
## Benchmark Results
| Category | Score (%) |
|-----------|----------|
| Average | 35.13 |
| IFEVAL | 79.62 |
| BBH | 35.33 |
| MATH | 37.54 |
| GPQA | 8.17 |
| MUSR | 12.73 |
| MMLU | 37.38 |
## Next Steps
- **Finetune on Math:** Bringing up the math score is a priority to create a well-balanced model.
- **Explore YoYo v4:** The next step could be merging this model with another one that is strong in math using the YoYo v4 technique. However, YoYo v4 lacks proper documentation, making it a challenge to implement.
- **Develop a Math-Strong Model:** An alternative approach is to build a new model that performs decently in all benchmarks but excels in math, then merge it with this one.
## Conclusion
Hush-Qwen2.5-7B-Preview is a strong contender in the IFEVAL category, achieving one of the highest scores among 7B models. However, improving the math benchmark is a key priority for future iterations. By either finetuning or leveraging new merge techniques like YoYo v4, we can push the model to new heights.