Details of Ability Loss

#1
by noneUsername - opened

Original model:

vllm (pretrained=/root/autodl-tmp/Cydonia-24B-v2,add_bos_token=true,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

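| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match ↑ | 0.912 | ± 0.018 |
| | | strict-match | 5 | exact_match ↑ | 0.912 | ± 0.018 |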

vllm (pretrained=/root/autodl-tmp/Cydonia-24B-v2,add_bos_token=true,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

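| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match ↑ | 0.904 | ± 0.0132 |
| | | strict-match | 5 | exact_match ↑ | 0.894 | ± 0.0138 |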
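The Stderr values for GSM8K are consistent with the standard error of a mean of per-question 0/1 scores, using the sample (n - 1) variance. The sketch below assumes that formula (the function name `binomial_stderr` is hypothetical, not an lm-eval API); with the accuracies and limits above it reproduces the printed Stderr column to four decimals.

```python
import math


def binomial_stderr(accuracy: float, n: int) -> float:
    """Standard error of the mean of n scores that are each 0 or 1.

    Assumption: sample variance with an (n - 1) denominator, i.e.
    sqrt(p * (1 - p) / (n - 1)).
    """
    # Sample variance of n 0/1 scores whose mean is `accuracy`
    variance = accuracy * (1.0 - accuracy) * n / (n - 1)
    return math.sqrt(variance / n)


# (accuracy, number of GSM8K questions) from the original model's tables
for acc, n in [(0.912, 250), (0.904, 500), (0.894, 500)]:
    print(f"{acc:.3f} on {n} questions -> stderr {binomial_stderr(acc, n):.4f}")
```

Because the standard error shrinks roughly with 1/sqrt(n), doubling the limit from 250 to 500 cuts it by about 30%, which is why the limit 500 runs are the more informative ones.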

vllm (pretrained=/root/autodl-tmp/Cydonia-24B-v2,add_bos_token=true,max_model_len=700,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

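| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|--------|--------:|--------|-------:|--------|------:|-------:|
| mmlu | 2 | none | | acc ↑ | 0.7942 | ± 0.0131 |
| - humanities | 2 | none | | acc ↑ | 0.8205 | ± 0.0257 |
| - other | 2 | none | | acc ↑ | 0.8103 | ± 0.0271 |
| - social sciences | 2 | none | | acc ↑ | 0.8500 | ± 0.0257 |
| - stem | 2 | none | | acc ↑ | 0.7298 | ± 0.0249 |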

Final W8A8 quantized model:

vllm (pretrained=/root/autodl-tmp/87-1536,add_bos_token=true,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

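| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match ↑ | 0.880 | ± 0.0206 |
| | | strict-match | 5 | exact_match ↑ | 0.868 | ± 0.0215 |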

vllm (pretrained=/root/autodl-tmp/87-1536,add_bos_token=true,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

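| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match ↑ | 0.880 | ± 0.0145 |
| | | strict-match | 5 | exact_match ↑ | 0.854 | ± 0.0158 |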

vllm (pretrained=/root/autodl-tmp/87-1536,add_bos_token=true,max_model_len=700,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

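| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|--------|--------:|--------|-------:|--------|------:|-------:|
| mmlu | 2 | none | | acc ↑ | 0.7836 | ± 0.0131 |
| - humanities | 2 | none | | acc ↑ | 0.8359 | ± 0.0249 |
| - other | 2 | none | | acc ↑ | 0.7795 | ± 0.0279 |
| - social sciences | 2 | none | | acc ↑ | 0.8444 | ± 0.0259 |
| - stem | 2 | none | | acc ↑ | 0.7123 | ± 0.0251 |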

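Drop from the original to the W8A8 model (absolute, then relative to the original score):
GSM8K strict-match, limit 250: 0.912 -> 0.868: ↓0.044 (4.82%)
GSM8K strict-match, limit 500: 0.894 -> 0.854: ↓0.040 (4.47%)
MMLU overall: 0.7942 -> 0.7836: ↓0.0106 (1.33%)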
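The three drops are GSM8K strict-match at limit 250 (0.912 -> 0.868), GSM8K strict-match at limit 500 (0.894 -> 0.854), and overall MMLU (0.7942 -> 0.7836); each percentage is the drop divided by the original score. A minimal sketch that reproduces them, and also expresses each drop in units of the two runs' combined standard error, is below. The helper `compare` is hypothetical, and combining stderrs as sqrt(se1² + se2²) assumes independent samples; both models answer the same questions, so this likely overstates the noise and is a rough, conservative gauge rather than a proper paired test.

```python
import math


def compare(before: float, after: float, se_before: float, se_after: float):
    """Return (absolute drop, relative drop in %, drop / combined stderr)."""
    drop = before - after
    relative_pct = drop / before * 100.0
    # Independent-sample approximation; ignores that both runs share questions
    combined_se = math.sqrt(se_before**2 + se_after**2)
    return drop, relative_pct, drop / combined_se


# (original score, W8A8 score, original stderr, W8A8 stderr) from the tables
rows = {
    "GSM8K strict-match, limit 250": (0.912, 0.868, 0.018, 0.0215),
    "GSM8K strict-match, limit 500": (0.894, 0.854, 0.0138, 0.0158),
    "MMLU overall": (0.7942, 0.7836, 0.0131, 0.0131),
}
for name, (b, a, sb, sa) in rows.items():
    drop, rel, z = compare(b, a, sb, sa)
    print(f"{name}: {b} -> {a}: drop {drop:.4f} ({rel:.2f}%), {z:.2f} combined stderr")
```

Under this approximation the GSM8K drops come out at roughly 1.6 and 1.9 combined standard errors and the MMLU drop at roughly 0.6, so with these sample sizes the losses are suggestive rather than conclusive.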
