---
base_model:
  - marcuscedricridia/Hush-Qwen2.5-7B-della1
  - marcuscedricridia/Hush-Qwen2.5-7B-della2
  - marcuscedricridia/Hush-Qwen2.5-7B-RP-1M
  - marcuscedricridia/Hush-Qwen2.5-7B-della3
  - marcuscedricridia/Hush-Qwen2.5-7B-della4
library_name: transformers
tags:
  - mergekit
  - merge
---

## Performance Highlights

Hush-Qwen2.5-7B-Preview was created using the YoYo v3 merge technique and scores 79.62% on IFEVAL, one of the highest results recorded for 7B models. That places it second in the category, though the current leader is unavailable, so we may hold first place by default!

## Strengths

  • High IFEVAL Score: 79.62%, among the best for 7B models.
  • Well-rounded performance: Decent scores across various benchmarks.

## Weaknesses

  • Low MATH score: 37.54%, significantly lower than our past models (which scored at least 45%). Improving this would make the model substantially better overall.

## Benchmark Results

| Category | Score (%) |
|----------|-----------|
| Average  | 35.13     |
| IFEVAL   | 79.62     |
| BBH      | 35.33     |
| MATH     | 37.54     |
| GPQA     | 8.17      |
| MUSR     | 12.73     |
| MMLU     | 37.38     |
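As a quick sanity check, the reported Average appears to be the unweighted mean of the six benchmark scores in the table above:

```python
# Benchmark scores from the table above
scores = {
    "IFEVAL": 79.62,
    "BBH": 35.33,
    "MATH": 37.54,
    "GPQA": 8.17,
    "MUSR": 12.73,
    "MMLU": 37.38,
}

# Unweighted mean, rounded to two decimals, matches the reported Average
average = round(sum(scores.values()) / len(scores), 2)
print(average)  # 35.13
```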

## Next Steps

  • Finetune on Math: Bringing up the math score is a priority to create a well-balanced model.
  • Explore YoYo v4: The next step could be merging this model with another one that is strong in math using the YoYo v4 technique. However, YoYo v4 lacks proper documentation, making it a challenge to implement.
  • Develop a Math-Strong Model: An alternative approach is to build a new model that performs decently in all benchmarks but excels in math, then merge it with this one.
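For reference, a della-style merge in mergekit is driven by a YAML config along these lines. The model choices, density, and weight values below are illustrative placeholders, not the actual recipe behind Hush-Qwen2.5-7B-Preview:

```yaml
# Illustrative mergekit config for a della merge
# (parameter values are placeholders, not this model's recipe)
merge_method: della
base_model: Qwen/Qwen2.5-7B-Instruct
models:
  - model: marcuscedricridia/Hush-Qwen2.5-7B-della1
    parameters:
      density: 0.6
      weight: 0.5
  - model: marcuscedricridia/Hush-Qwen2.5-7B-RP-1M
    parameters:
      density: 0.6
      weight: 0.5
dtype: bfloat16
```

A future math-focused merge would slot a math-strong model into the `models` list and tune the weights accordingly.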

## Conclusion

Hush-Qwen2.5-7B-Preview is a strong contender in the IFEVAL category, achieving one of the highest scores among 7B models. However, improving the math benchmark is a key priority for future iterations. By either finetuning or leveraging new merge techniques like YoYo v4, we can push the model to new heights.