YOYO-AI/Qwen2.5-14B-1M-YOYO-V3

@@ -13,6 +13,101 @@ base_model:
 pipeline_tag: text-generation
 tags:
 - merge
 ---
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64e174e202fa032de4143324/CfIE4_oZgpNsNZyurjO7D.png)
@@ -46,7 +141,18 @@ name: Qwen2.5-14B-YOYO-latest-V2
 Although the uncontrollable output issue has been addressed, the model still lacks stability.
 Through experimentation, I found that first merging **"high-divergence"** models (significantly different from the base) into **"low-divergence"** models (closer to the base) using the [DELLA](https://arxiv.org/abs/2406.11617) method, and then applying the [Model Stock](https://arxiv.org/abs/2403.19522) method, produces a model that is both more stable and better performing.
 ## Key models used:
 *1. Low-divergence, high-performance models:*
@@ -191,4 +297,4 @@ int8_mask: true
 normalize: true
 name: Qwen2.5-14B-1M-YOYO-V3
 ```
-I hope this helps!

 pipeline_tag: text-generation
 tags:
 - merge
+model-index:
+- name: Qwen2.5-14B-1M-YOYO-V3
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: HuggingFaceH4/ifeval
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 83.98
+      name: strict accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=YOYO-AI/Qwen2.5-14B-1M-YOYO-V3
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: BBH
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 49.47
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=YOYO-AI/Qwen2.5-14B-1M-YOYO-V3
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: hendrycks/competition_math
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 53.55
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=YOYO-AI/Qwen2.5-14B-1M-YOYO-V3
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 10.51
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=YOYO-AI/Qwen2.5-14B-1M-YOYO-V3
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 11.10
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=YOYO-AI/Qwen2.5-14B-1M-YOYO-V3
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 46.74
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=YOYO-AI/Qwen2.5-14B-1M-YOYO-V3
+      name: Open LLM Leaderboard
 ---
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64e174e202fa032de4143324/CfIE4_oZgpNsNZyurjO7D.png)
 Although the uncontrollable output issue has been addressed, the model still lacks stability.
 Through experimentation, I found that first merging **"high-divergence"** models (significantly different from the base) into **"low-divergence"** models (closer to the base) using the [DELLA](https://arxiv.org/abs/2406.11617) method, and then applying the [Model Stock](https://arxiv.org/abs/2403.19522) method, produces a model that is both more stable and better performing.
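
The two-stage recipe described above can be sketched as a pair of mergekit-style configurations. This is only an illustration of the structure, not the configuration actually used for this model: the model names are hypothetical placeholders, the `density` and `weight` values are made up, and treating the low-divergence model as `base_model` in the DELLA stage is an assumption about how "merging into" is expressed.

```python
import json

# Hypothetical placeholder names; the real models are listed under "Key models used".
BASE = "base-model"
LOW_DIVERGENCE = ["low-div-model-a", "low-div-model-b"]
HIGH_DIVERGENCE = "high-div-model"


def della_stage(low_div_model, high_div_model):
    """Stage 1: merge one high-divergence model into one low-divergence model."""
    return {
        "merge_method": "della",
        # Assumption: the low-divergence model acts as the base of this stage.
        "base_model": low_div_model,
        "models": [
            {
                "model": high_div_model,
                # Illustrative values only, not the author's settings.
                "parameters": {"density": 1.0, "weight": 1.0},
            }
        ],
        "dtype": "bfloat16",
        "parameters": {"int8_mask": True, "normalize": True},
    }


def model_stock_stage(base_model, stage1_outputs):
    """Stage 2: combine the stage-1 results with Model Stock."""
    return {
        "merge_method": "model_stock",
        "base_model": base_model,
        "models": [{"model": path} for path in stage1_outputs],
        "dtype": "bfloat16",
    }


stage1 = [della_stage(m, HIGH_DIVERGENCE) for m in LOW_DIVERGENCE]
# Hypothetical output directories for the stage-1 merges.
stage1_paths = [f"./stage1-{i}" for i in range(len(stage1))]
stage2 = model_stock_stage(BASE, stage1_paths)

# Printed as JSON purely for inspection.
print(json.dumps({"stage1": stage1, "stage2": stage2}, indent=2))
```

Each stage-1 dictionary would be written out as its own configuration file and merged first; the stage-2 configuration then points at the resulting directories.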
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Replete-AI__Replete-LLM-V2.5-Qwen-14b)
+|      Metric       |Value|
+|-------------------|----:|
+|Avg.               |42.56|
+|IFEval (0-Shot)    |83.98|
+|BBH (3-Shot)       |49.47|
+|MATH Lvl 5 (4-Shot)|53.55|
+|GPQA (0-shot)      |10.51|
+|MuSR (0-shot)      |11.10|
+|MMLU-PRO (5-shot)  |46.74|
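
As a quick sanity check, the `Avg.` row appears to be the unweighted mean of the six benchmark scores in the table. The short sketch below recomputes it from the values shown above; the assumption that the leaderboard uses a plain arithmetic mean is mine, though the numbers agree with it.

```python
# Scores copied from the table above.
scores = {
    "IFEval (0-Shot)": 83.98,
    "BBH (3-Shot)": 49.47,
    "MATH Lvl 5 (4-Shot)": 53.55,
    "GPQA (0-shot)": 10.51,
    "MuSR (0-shot)": 11.10,
    "MMLU-PRO (5-shot)": 46.74,
}

# Unweighted arithmetic mean over the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # prints 42.56
```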
 ## Key models used:
 *1. Low-divergence, high-performance models:*
 normalize: true
 name: Qwen2.5-14B-1M-YOYO-V3
 ```
+I hope this helps!