---
base_model:
- grimjim/SauerHuatuoSkywork-o1-Llama-3.1-8B
library_name: transformers
pipeline_tag: text-generation
quantized_by: grimjim
license: llama3.1
model-index:
- name: SauerHuatuoSkywork-o1-Llama-3.1-8B
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: wis-k/instruction-following-eval
      split: train
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 52.19
      name: averaged accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=grimjim%2FSauerHuatuoSkywork-o1-Llama-3.1-8B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: SaylorTwift/bbh
      split: test
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 32.09
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=grimjim%2FSauerHuatuoSkywork-o1-Llama-3.1-8B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: lighteval/MATH-Hard
      split: test
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 16.99
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=grimjim%2FSauerHuatuoSkywork-o1-Llama-3.1-8B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      split: train
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 9.51
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=grimjim%2FSauerHuatuoSkywork-o1-Llama-3.1-8B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 15.79
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=grimjim%2FSauerHuatuoSkywork-o1-Llama-3.1-8B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 33.23
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=grimjim%2FSauerHuatuoSkywork-o1-Llama-3.1-8B
      name: Open LLM Leaderboard
---

# SauerHuatuoSkywork-o1-Llama-3.1-8B-GGUF

This repo contains GGUF quants of a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

The merge is an experiment in hybridizing a relatively high-scoring Llama 3.1 8B model with o1 reasoning capabilities. Although IFEval benched lower than for the SauerkrautLM model alone, every other benchmark improved with the addition of the o1 merge at low weight.

Made with Llama.

## Merge Details

### Merge Method

This model was merged using the SLERP merge method.
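For intuition, SLERP (spherical linear interpolation) blends each pair of corresponding weight tensors along an arc rather than a straight line, which preserves the geometry of the parameter space better than plain linear averaging. The NumPy sketch below illustrates the formula only; it is a simplified illustration, not mergekit's actual implementation, which additionally supports per-layer `t` schedules and handles dtype and numerical edge cases.

```python
import numpy as np

def slerp(t: float, v0: np.ndarray, v1: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two weight tensors.

    t = 0 returns v0 (the base model); t = 1 returns v1.
    """
    # Measure the angle between the two parameter vectors on the unit sphere.
    a = v0.ravel() / (np.linalg.norm(v0) + eps)
    b = v1.ravel() / (np.linalg.norm(v1) + eps)
    omega = np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
    if omega < eps:
        # Nearly colinear tensors: fall back to ordinary linear interpolation.
        return (1.0 - t) * v0 + t * v1
    so = np.sin(omega)
    return (np.sin((1.0 - t) * omega) / so) * v0 + (np.sin(t * omega) / so) * v1
```

In mergekit's SLERP, `t = 0` returns the base model and `t = 1` returns the other model, so the `t: 0.96` used here lands close to SauerkrautLM while folding in a small contribution from the o1 base.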
### Models Merged

The following models were included in the merge:
* [grimjim/HuatuoSkywork-o1-Llama-3.1-8B](https://huggingface.co/grimjim/HuatuoSkywork-o1-Llama-3.1-8B)
* [VAGOsolutions/Llama-3.1-SauerkrautLM-8b-Instruct](https://huggingface.co/VAGOsolutions/Llama-3.1-SauerkrautLM-8b-Instruct)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
models:
  - model: grimjim/HuatuoSkywork-o1-Llama-3.1-8B
  - model: VAGOsolutions/Llama-3.1-SauerkrautLM-8b-Instruct
merge_method: slerp
base_model: grimjim/HuatuoSkywork-o1-Llama-3.1-8B
parameters:
  t:
    - value: 0.96
dtype: bfloat16
```

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/grimjim__SauerHuatuoSkywork-o1-Llama-3.1-8B-details)!
Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=grimjim%2FSauerHuatuoSkywork-o1-Llama-3.1-8B&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!

| Metric             | Value (%) |
|--------------------|----------:|
| **Average**        |     26.63 |
| IFEval (0-Shot)    |     52.19 |
| BBH (3-Shot)       |     32.09 |
| MATH Lvl 5 (4-Shot)|     16.99 |
| GPQA (0-shot)      |      9.51 |
| MuSR (0-shot)      |     15.79 |
| MMLU-PRO (5-shot)  |     33.23 |
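To try one of the quants locally, a minimal llama-cpp-python sketch follows. The quant filename is a placeholder assumption; substitute whichever GGUF file you download from this repo, and adjust the context size to your hardware.

```python
from llama_cpp import Llama

# Path is hypothetical; point it at whichever GGUF quant you downloaded.
llm = Llama(
    model_path="SauerHuatuoSkywork-o1-Llama-3.1-8B.Q8_0.gguf",
    n_ctx=8192,  # context window; raise it if you have the memory
)

response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Briefly explain spherical linear interpolation."},
    ],
    max_tokens=256,
    temperature=0.7,
)
print(response["choices"][0]["message"]["content"])
```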