---
datasets:
- agentlans/crash-course
base_model:
- google/gemma-2-9b-it
- FuseAI/FuseChat-Gemma-2-9B-Instruct
- jsgreenawalt/gemma-2-9B-it-advanced-v2.1
tags:
- gemma2
language:
- en
pipeline_tag: text-generation
license: gemma
model-index:
- name: Gemma2-9B-AdvancedFuse
results:
- task:
type: text-generation
name: Text Generation
dataset:
name: IFEval (0-Shot)
type: wis-k/instruction-following-eval
split: train
args:
num_few_shot: 0
metrics:
- type: inst_level_strict_acc and prompt_level_strict_acc
value: 15.43
name: averaged accuracy
source:
url: >-
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FGemma2-9B-AdvancedFuse
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: BBH (3-Shot)
type: SaylorTwift/bbh
split: test
args:
num_few_shot: 3
metrics:
- type: acc_norm
value: 40.52
name: normalized accuracy
source:
url: >-
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FGemma2-9B-AdvancedFuse
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MATH Lvl 5 (4-Shot)
type: lighteval/MATH-Hard
split: test
args:
num_few_shot: 4
metrics:
- type: exact_match
value: 7.55
name: exact match
source:
url: >-
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FGemma2-9B-AdvancedFuse
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: GPQA (0-shot)
type: Idavidrein/gpqa
split: train
args:
num_few_shot: 0
metrics:
- type: acc_norm
value: 11.3
name: acc_norm
source:
url: >-
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FGemma2-9B-AdvancedFuse
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MuSR (0-shot)
type: TAUR-Lab/MuSR
args:
num_few_shot: 0
metrics:
- type: acc_norm
value: 11.99
name: acc_norm
source:
url: >-
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FGemma2-9B-AdvancedFuse
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MMLU-PRO (5-shot)
type: TIGER-Lab/MMLU-Pro
config: main
split: test
args:
num_few_shot: 5
metrics:
- type: acc
value: 33.34
name: accuracy
source:
url: >-
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FGemma2-9B-AdvancedFuse
name: Open LLM Leaderboard
---
# Gemma2-9B-AdvancedFuse

Gemma2-9B-AdvancedFuse is an experimental, open-source 9-billion-parameter large language model (LLM). It combines the strengths of FuseAI/FuseChat-Gemma-2-9B-Instruct and jsgreenawalt/gemma-2-9B-it-advanced-v2.1 through additive linear merging, and the merged model was then fine-tuned on a 12K-row dataset from agentlans/crash-course to improve chat and instruction-following performance, including math and multilingual prompts.
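An additive linear merge like the one described above could be expressed as a [mergekit](https://github.com/arcee-ai/mergekit) configuration along these lines. This is only a sketch: the equal 0.5 weights and the `bfloat16` dtype are illustrative assumptions, not the published recipe.

```yaml
# Hypothetical mergekit config for a linear merge of the two source models.
# The weights and dtype are assumptions for illustration only.
merge_method: linear
models:
  - model: FuseAI/FuseChat-Gemma-2-9B-Instruct
    parameters:
      weight: 0.5
  - model: jsgreenawalt/gemma-2-9B-it-advanced-v2.1
    parameters:
      weight: 0.5
dtype: bfloat16
```

With a config like this saved as `merge.yaml`, running `mergekit-yaml merge.yaml ./output` would produce the merged checkpoint that fine-tuning then starts from.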
## Capabilities

- **Text generation:** Produces coherent emails, summaries, and notes. This model card was itself primarily generated by the model.
- **Instruction following:** Understands and executes instructions well in conversational settings.
- **Roleplaying:** Can sustain third-person narrative roleplay, though it may fall back on common GPT expressions and clichés.
## Limitations

As with most large language models:

- **Factual errors:** May generate incorrect or outdated information due to biases in its training data.
- **Mathematical operations:** Struggles with calculations that require symbolic reasoning, despite the math content in its fine-tuning data.
- **Unsafe input:** May produce unsafe, biased, or malicious content when given inappropriate input. Careful prompt engineering is recommended.
## Model Usage Guidelines

- Use clear, specific instructions for best results.
- Verify generated outputs for factual accuracy whenever critical information is involved.
- Avoid inputs that could elicit harmful or unethical responses.
- Use human review, especially in high-stakes applications.
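Following the guidelines above, a minimal inference sketch with the Hugging Face `transformers` chat pipeline might look like this. It assumes `transformers` and `torch` are installed and that enough memory is available for a 9B-parameter model; the prompt is an arbitrary example.

```python
# Hypothetical usage sketch: load the model and run one chat turn.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="agentlans/Gemma2-9B-AdvancedFuse",
    torch_dtype="auto",   # pick an appropriate precision automatically
    device_map="auto",    # place layers on available GPU(s)/CPU
)

# Clear, specific instruction, per the usage guidelines above.
messages = [
    {"role": "user", "content": "Write a short, polite email declining a meeting."},
]

result = generator(messages, max_new_tokens=256)
# The pipeline appends the model's reply as the final message.
print(result[0]["generated_text"][-1]["content"])
```

Remember to review the output before using it anywhere accuracy matters.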
## Open LLM Leaderboard Evaluation Results

Detailed results are available on the [Open LLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FGemma2-9B-AdvancedFuse). Summarized results appear in the table below.
| Metric              | Value (%) |
|---------------------|----------:|
| Average             | 20.02     |
| IFEval (0-shot)     | 15.43     |
| BBH (3-shot)        | 40.52     |
| MATH Lvl 5 (4-shot) | 7.55      |
| GPQA (0-shot)       | 11.30     |
| MuSR (0-shot)       | 11.99     |
| MMLU-PRO (5-shot)   | 33.34     |