marcuscedricridia/Hush-Qwen2.5-7B-Preview

@@ -11,38 +11,42 @@ tags:
 - merge
 ---
-# merge
-This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
-## Merge Details
-### Merge Method
-This model was merged using the [Model Stock](https://arxiv.org/abs/2403.19522) merge method using [marcuscedricridia/Hush-Qwen2.5-7B-RP-1M](https://huggingface.co/marcuscedricridia/Hush-Qwen2.5-7B-RP-1M) as a base.
-### Models Merged
-The following models were included in the merge:
-* [marcuscedricridia/Hush-Qwen2.5-7B-della1](https://huggingface.co/marcuscedricridia/Hush-Qwen2.5-7B-della1)
-* [marcuscedricridia/Hush-Qwen2.5-7B-della2](https://huggingface.co/marcuscedricridia/Hush-Qwen2.5-7B-della2)
-* [marcuscedricridia/Hush-Qwen2.5-7B-della3](https://huggingface.co/marcuscedricridia/Hush-Qwen2.5-7B-della3)
-* [marcuscedricridia/Hush-Qwen2.5-7B-della4](https://huggingface.co/marcuscedricridia/Hush-Qwen2.5-7B-della4)
-### Configuration
-The following YAML configuration was used to produce this model:
-```yaml
-merge_method: model_stock
-base_model: marcuscedricridia/Hush-Qwen2.5-7B-RP-1M
-models:
-  - model: marcuscedricridia/Hush-Qwen2.5-7B-della1
-  - model: marcuscedricridia/Hush-Qwen2.5-7B-della2
-  - model: marcuscedricridia/Hush-Qwen2.5-7B-della3
-  - model: marcuscedricridia/Hush-Qwen2.5-7B-della4
-dtype: bfloat16
-tokenizer_source: base
-int8_mask: true
-normalize: true
-name: Hush-Qwen2.5-7B-Preview
-```

+# Model Card: Hush-Qwen2.5-7B-Preview
+## Model Details
+- **Model Name:** Hush-Qwen2.5-7B-Preview
+- **Creator:** marcuscedricridia
+- **Merge Technique:** YoYo v3
+- **Primary Focus:** General performance, with an emphasis on benchmark scores, especially IFEVAL.
+## Performance Highlights
+Hush-Qwen2.5-7B-Preview was created using the YoYo v3 merge technique and scores **79.62%** on IFEVAL. This makes it the **second-best** 7B model on that benchmark. Because the leading model is currently unavailable, it may be in **first place** by default among available models.
+### Strengths
+- **High IFEVAL Score:** 79.62%, among the best for 7B models.
+- **Well-rounded performance:** Decent scores across various benchmarks.
+### Weaknesses
+- **Low MATH Score:** 37.54%, significantly lower than our past models (which scored at least 45%). Improving this would make the model substantially better overall.
+## Benchmark Results
+| Category  | Score (%) |
+|-----------|----------|
+| Average   | 35.13    |
+| IFEVAL    | 79.62    |
+| BBH       | 35.33    |
+| MATH      | 37.54    |
+| GPQA      | 8.17     |
+| MUSR      | 12.73    |
+| MMLU      | 37.38    |
+## Next Steps
+- **Finetune on Math:** Bringing up the math score is a priority to create a well-balanced model.
+- **Explore YoYo v4:** The next step could be merging this model with another one that is strong in math using the YoYo v4 technique. However, YoYo v4 lacks proper documentation, making it a challenge to implement.
+- **Develop a Math-Strong Model:** An alternative approach is to build a new model that performs decently in all benchmarks but excels in math, then merge it with this one.
+## Conclusion
+Hush-Qwen2.5-7B-Preview is a strong contender in the IFEVAL category, achieving one of the highest scores among 7B models. However, improving the math benchmark is a key priority for future iterations. By either finetuning or leveraging new merge techniques like YoYo v4, we can push the model to new heights.
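
The Average row in the Benchmark Results table can be sanity-checked with a few lines of Python. This is a minimal sketch, assuming the average is the unweighted mean of the six listed category scores; the model card does not state how it was computed, and the names `scores` and `mean_score` are illustrative only.

```python
# Scores copied from the Benchmark Results table (percent).
scores = {
    "IFEVAL": 79.62,
    "BBH": 35.33,
    "MATH": 37.54,
    "GPQA": 8.17,
    "MUSR": 12.73,
    "MMLU": 37.38,
}


def mean_score(values):
    """Unweighted arithmetic mean, rounded to two decimals.

    Assumption: every category carries equal weight in the average.
    """
    values = list(values)
    return round(sum(values) / len(values), 2)


average = mean_score(scores.values())
print(f"Average: {average}")
```

Under that assumption the result matches the 35.13 reported in the table, which also suggests that 37.54 is the MATH figure the average was computed from.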