---
base_model:
- marcuscedricridia/Hush-Qwen2.5-7B-della1
- marcuscedricridia/Hush-Qwen2.5-7B-della2
- marcuscedricridia/Hush-Qwen2.5-7B-RP-1M
- marcuscedricridia/Hush-Qwen2.5-7B-della3
- marcuscedricridia/Hush-Qwen2.5-7B-della4
library_name: transformers
tags:
- mergekit
- merge
---

## Performance Highlights

Hush-Qwen2.5-7B-Preview was created with the YoYo v3 merge technique and achieves one of the highest IFEVAL scores recorded for 7B models: **79.62%**. That places it **second-best** in the category, and since the leading model is currently unavailable, it may hold **first place** by default!

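The exact YoYo v3 recipe is not published, but since the parent checkpoints are della merges, a mergekit configuration in that spirit might look like the sketch below. The weights, densities, and choice of base model are illustrative assumptions only, not the actual recipe:

```yaml
# Hypothetical sketch -- NOT the actual YoYo v3 recipe, which is undocumented.
# Weights, densities, and the base model are illustrative assumptions.
merge_method: della
base_model: Qwen/Qwen2.5-7B-Instruct
models:
  - model: marcuscedricridia/Hush-Qwen2.5-7B-della1
    parameters:
      weight: 0.3
      density: 0.5
  - model: marcuscedricridia/Hush-Qwen2.5-7B-RP-1M
    parameters:
      weight: 0.2
      density: 0.5
dtype: bfloat16
tokenizer_source: base
```

Running `mergekit-yaml` on a config like this produces a merged checkpoint; the real recipe presumably tunes these parameters across all five parent models.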
### Strengths

- **High IFEVAL Score:** 79.62%, among the best for 7B models.
- **Well-rounded performance:** Decent scores across various benchmarks.

### Weaknesses

- **Low MATH Score:** 37.54%, significantly lower than our past models (which scored at least 45%). Improving this would make the model substantially better overall.

## Benchmark Results

| Category | Score (%) |
|----------|-----------|
| Average  | 35.13 |
| IFEVAL   | 79.62 |
| BBH      | 35.33 |
| MATH     | 37.54 |
| GPQA     | 8.17 |
| MUSR     | 12.73 |
| MMLU     | 37.38 |

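As a sanity check, the reported Average is the arithmetic mean of the six benchmark categories. A small script confirms it from the scores in the table above, and also estimates where the average would land if MATH recovered to the ~45% level of our past models (the 45.0 figure is that stated floor, not a measured result):

```python
# Benchmark scores from the table above (percent).
scores = {
    "IFEVAL": 79.62,
    "BBH": 35.33,
    "MATH": 37.54,
    "GPQA": 8.17,
    "MUSR": 12.73,
    "MMLU": 37.38,
}

# The reported Average is the arithmetic mean of the six categories.
average = sum(scores.values()) / len(scores)
print(round(average, 2))  # 35.13

# What-if: MATH back at the >=45% floor of earlier models.
what_if = dict(scores, MATH=45.0)
projected = sum(what_if.values()) / len(what_if)
print(round(projected, 2))  # 36.37
```

So closing the MATH gap alone would lift the overall average by more than a full point.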
## Next Steps

- **Finetune on Math:** Raising the math score is a priority for creating a well-balanced model.
- **Explore YoYo v4:** A next step could be merging this model with a math-strong model using the YoYo v4 technique. However, YoYo v4 lacks proper documentation, making it a challenge to implement.
- **Develop a Math-Strong Model:** An alternative is to build a new model that performs decently across all benchmarks but excels in math, then merge it with this one.

## Conclusion

Hush-Qwen2.5-7B-Preview is a strong contender on IFEVAL, achieving one of the highest scores among 7B models. Improving the math benchmark, however, is a key priority for future iterations. By either finetuning or leveraging newer merge techniques such as YoYo v4, we can push the model to new heights.