base_model:
  - Orion-zhen/phi-4-abliterated

vllm (pretrained=/root/autodl-tmp/phi-4-abliterated,add_bos_token=true,tensor_parallel_size=2,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

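| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.932 | ± 0.016 |
| | | strict-match | 5 | exact_match | 0.932 | ± 0.016 |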
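
The Stderr values in these tables appear consistent with a binomial standard error computed from the reported accuracy and the `limit` (number of evaluated samples), using the sample (n - 1) variance. The sketch below is an assumption about how the numbers relate, not a statement of how the harness computes them; `binomial_stderr` is a hypothetical helper name.

```python
import math


def binomial_stderr(accuracy: float, n: int) -> float:
    """Standard error of the mean of n 0/1 scores, using the sample (n - 1) variance.

    Assumption: each evaluated question contributes a score of 0 or 1.
    """
    return math.sqrt(accuracy * (1.0 - accuracy) / (n - 1))


# (accuracy, limit) pairs copied from the gsm8k tables in this README.
reported = [(0.932, 250), (0.922, 500), (0.92, 250), (0.918, 500)]

for acc, n in reported:
    print(f"acc={acc} limit={n} stderr={binomial_stderr(acc, n):.4f}")
```

Doubling `limit` from 250 to 500 shrinks the standard error by roughly a factor of the square root of 2, which matches the drop from about 0.016 to about 0.012 seen across the paired runs.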

vllm (pretrained=/root/autodl-tmp/phi-4-abliterated,add_bos_token=true,tensor_parallel_size=2,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

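| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.922 | ± 0.012 |
| | | strict-match | 5 | exact_match | 0.922 | ± 0.012 |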

vllm (pretrained=/root/autodl-tmp/phi-4-abliterated-85,add_bos_token=true,tensor_parallel_size=2,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

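| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.92 | ± 0.0172 |
| | | strict-match | 5 | exact_match | 0.92 | ± 0.0172 |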

vllm (pretrained=/root/autodl-tmp/phi-4-abliterated-85,add_bos_token=true,tensor_parallel_size=2,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

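| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.918 | ± 0.0123 |
| | | strict-match | 5 | exact_match | 0.918 | ± 0.0123 |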

vllm (pretrained=/root/autodl-tmp/phi-4-abliterated-8625,add_bos_token=true,tensor_parallel_size=2,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

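| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.932 | ± 0.016 |
| | | strict-match | 5 | exact_match | 0.932 | ± 0.016 |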

vllm (pretrained=/root/autodl-tmp/phi-4-abliterated-8625,add_bos_token=true,tensor_parallel_size=2,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

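| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.934 | ± 0.0111 |
| | | strict-match | 5 | exact_match | 0.934 | ± 0.0111 |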

vllm (pretrained=/root/autodl-tmp/phi-4-abliterated-875,add_bos_token=true,tensor_parallel_size=2,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

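| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.924 | ± 0.0168 |
| | | strict-match | 5 | exact_match | 0.924 | ± 0.0168 |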

vllm (pretrained=/root/autodl-tmp/phi-4-abliterated-875,add_bos_token=true,tensor_parallel_size=2,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

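| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.916 | ± 0.0124 |
| | | strict-match | 5 | exact_match | 0.916 | ± 0.0124 |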

vllm (pretrained=/root/autodl-tmp/phi-4,add_bos_token=true,max_model_len=2048), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

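| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.928 | ± 0.0164 |
| | | strict-match | 5 | exact_match | 0.928 | ± 0.0164 |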

vllm (pretrained=/root/autodl-tmp/phi-4,add_bos_token=true,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

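| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.93 | ± 0.0114 |
| | | strict-match | 5 | exact_match | 0.93 | ± 0.0114 |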

vllm (pretrained=/root/autodl-tmp/phi-4,add_bos_token=true,max_model_len=800,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto

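| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|--------|--------:|--------|-------:|--------|------:|-------:|
| mmlu | 2 | none | | acc | 0.7649 | ± 0.0137 |
| - humanities | 2 | none | | acc | 0.8103 | ± 0.0256 |
| - other | 2 | none | | acc | 0.7487 | ± 0.0287 |
| - social sciences | 2 | none | | acc | 0.8167 | ± 0.0280 |
| - stem | 2 | none | | acc | 0.7123 | ± 0.0260 |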

vllm (pretrained=/root/autodl-tmp/phi-4-abliterated-W8A8,add_bos_token=true,max_model_len=2048), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

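| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.928 | ± 0.0164 |
| | | strict-match | 5 | exact_match | 0.928 | ± 0.0164 |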

vllm (pretrained=/root/autodl-tmp/phi-4-abliterated-W8A8,add_bos_token=true,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

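| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.916 | ± 0.0124 |
| | | strict-match | 5 | exact_match | 0.916 | ± 0.0124 |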
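
As a rough way to read the gap between two runs, the sketch below computes a z-score for the difference between the `phi-4` and `phi-4-abliterated-W8A8` gsm8k scores at `limit: 500`, using the values and standard errors reported in this README. It assumes the two runs are independent, which is a simplification since both were scored on the same questions; `z_score` is a hypothetical helper, not part of any evaluation tool.

```python
import math


def z_score(acc_a: float, se_a: float, acc_b: float, se_b: float) -> float:
    """Difference of two accuracies divided by the combined standard error.

    Assumption: the two estimates are treated as independent.
    """
    return (acc_a - acc_b) / math.sqrt(se_a**2 + se_b**2)


# gsm8k, limit 500: phi-4 (0.93 ± 0.0114) vs phi-4-abliterated-W8A8 (0.916 ± 0.0124).
z = z_score(0.93, 0.0114, 0.916, 0.0124)
print(f"z = {z:.2f}")
print("within ~2 standard errors" if abs(z) < 1.96 else "outside ~2 standard errors")
```

Under this simplified reading, the 1.4-point gap falls inside the combined uncertainty of the two runs, so it should not be taken as clear evidence of a difference at this sample size.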

vllm (pretrained=/root/autodl-tmp/phi-4-abliterated-W8A8,add_bos_token=true,max_model_len=800,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto

Groups Version Filter n-shot Metric Value Stderr
mmlu 2 none acc 0.7696 ± 0.0136
- humanities 2 none acc 0.8000 ± 0.0261
- other 2 none acc 0.7692 ± 0.0280
- social sciences 2 none acc 0.8389 ± 0.0265
- stem 2 none acc 0.7053 ± 0.0265