---
base_model:
- marcuscedricridia/Hush-Qwen2.5-7B-della1
- marcuscedricridia/Hush-Qwen2.5-7B-della2
- marcuscedricridia/Hush-Qwen2.5-7B-RP-1M
- marcuscedricridia/Hush-Qwen2.5-7B-della3
- marcuscedricridia/Hush-Qwen2.5-7B-della4
library_name: transformers
tags:
- mergekit
- merge
---

## Performance Highlights

Hush-Qwen2.5-7B-Preview was created using the YoYo v3 merge technique. It scores **79.62%** on IFEVAL, which makes it the **second-best** 7B model in that category. The leading model is currently unavailable, so this model may be in **first place** by default.

### Strengths

- **High IFEVAL Score:** 79.62%, among the best for 7B models.
- **Well-rounded performance:** Decent scores across the other benchmarks.

### Weaknesses

- **Low MATH Score:** 37.54%, which is significantly lower than our past models (which scored at least 45%). Improving this would make the model substantially better overall.

## Benchmark Results

| Category | Score (%) |
|----------|-----------|
| Average  | 35.13     |
| IFEVAL   | 79.62     |
| BBH      | 35.33     |
| MATH     | 37.54     |
| GPQA     | 8.17      |
| MUSR     | 12.73     |
| MMLU     | 37.38     |

## Next Steps

- **Finetune on Math:** Raising the math score is the priority for a well-balanced model.
- **Explore YoYo v4:** One option is to merge this model with another that is strong in math using the YoYo v4 technique. However, YoYo v4 lacks proper documentation, which makes it hard to implement.
- **Develop a Math-Strong Model:** An alternative is to build a new model that performs decently on all benchmarks but excels in math, then merge it with this one.

## Conclusion

Hush-Qwen2.5-7B-Preview is a strong contender on IFEVAL, with one of the highest scores among 7B models. Its math score is the main gap, and closing it is the key priority for future iterations, either by finetuning or by using newer merge techniques such as YoYo v4.
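
## Appendix: Reproducing the Average Score

The Average row in the benchmark table appears to be the unweighted mean of the six individual scores. That is an assumption, since the card does not state how the average is computed, but the arithmetic matches. The sketch below is a minimal check; the helper name `unweighted_average` is hypothetical and only used for illustration.

```python
# Scores copied from the Benchmark Results table (percent).
scores = {
    "IFEVAL": 79.62,
    "BBH": 35.33,
    "MATH": 37.54,
    "GPQA": 8.17,
    "MUSR": 12.73,
    "MMLU": 37.38,
}


def unweighted_average(values):
    """Return the arithmetic mean of the scores, rounded to 2 decimals.

    Assumption: every benchmark carries equal weight.
    """
    values = list(values)
    return round(sum(values) / len(values), 2)


average = unweighted_average(scores.values())
print(f"Average: {average:.2f}")  # prints "Average: 35.13"
```

The same helper can be used to estimate how much a higher MATH score would move the average, for example by replacing `scores["MATH"]` with a target value before calling it.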