Selected variant: 70-1024-df10-u2k

vllm (pretrained=/root/autodl-tmp/Cydonia-24B-v2,add_bos_token=true,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|---------|--------|--------|--------|-------|--------|
| gsm8k | 3 | flexible-extract | 5 | exact_match ↑ | 0.912 | ± 0.018 |
|       |   | strict-match     | 5 | exact_match ↑ | 0.912 | ± 0.018 |
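Each header line above records the harness configuration for the run that follows it. Assuming these headers come from lm-evaluation-harness's CLI (which prints them in exactly this form), the first run could be reproduced with a command along these lines; the flags mirror the header, but this invocation has not been verified against this exact environment:

```shell
lm_eval --model vllm \
  --model_args pretrained=/root/autodl-tmp/Cydonia-24B-v2,add_bos_token=true,max_model_len=2048,dtype=bfloat16 \
  --tasks gsm8k \
  --num_fewshot 5 \
  --limit 250 \
  --batch_size auto
```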

vllm (pretrained=/root/autodl-tmp/Cydonia-24B-v2,add_bos_token=true,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|---------|--------|--------|--------|-------|--------|
| gsm8k | 3 | flexible-extract | 5 | exact_match ↑ | 0.904 | ± 0.0132 |
|       |   | strict-match     | 5 | exact_match ↑ | 0.894 | ± 0.0138 |
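The Stderr column is consistent with the standard error of the mean of a 0/1 metric over the `limit` samples. A minimal sketch, assuming the harness reports sqrt(p·(1−p)/n) (or the nearly identical sample-based estimate) for `exact_match`:

```python
import math

def exact_match_se(p: float, n: int) -> float:
    """Standard error of the mean of a 0/1 metric: sqrt(p*(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

# These reproduce the reported Stderr values in the tables above:
print(round(exact_match_se(0.912, 250), 3))  # → 0.018  (limit 250 run)
print(round(exact_match_se(0.904, 500), 4))  # → 0.0132 (limit 500 run)
```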

vllm (pretrained=/root/autodl-tmp/Cydonia-24B-v2,add_bos_token=true,max_model_len=700,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|--------|---------|--------|--------|--------|-------|--------|
| mmlu              | 2 | none | | acc ↑ | 0.7942 | ± 0.0131 |
| - humanities      | 2 | none | | acc ↑ | 0.8205 | ± 0.0257 |
| - other           | 2 | none | | acc ↑ | 0.8103 | ± 0.0271 |
| - social sciences | 2 | none | | acc ↑ | 0.8500 | ± 0.0257 |
| - stem            | 2 | none | | acc ↑ | 0.7298 | ± 0.0249 |

vllm (pretrained=/root/autodl-tmp/70-512-df10,add_bos_token=true,max_model_len=2048), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: 1

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|---------|--------|--------|--------|-------|--------|
| gsm8k | 3 | flexible-extract | 5 | exact_match ↑ | 0.892 | ± 0.0197 |
|       |   | strict-match     | 5 | exact_match ↑ | 0.880 | ± 0.0206 |

vllm (pretrained=/root/autodl-tmp/70-512-df10,add_bos_token=true,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|---------|--------|--------|--------|-------|--------|
| gsm8k | 3 | flexible-extract | 5 | exact_match ↑ | 0.894 | ± 0.0138 |
|       |   | strict-match     | 5 | exact_match ↑ | 0.876 | ± 0.0148 |

vllm (pretrained=/root/autodl-tmp/70-512-df10,add_bos_token=true,max_model_len=800,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|--------|---------|--------|--------|--------|-------|--------|
| mmlu              | 2 | none | | acc ↑ | 0.7895 | ± 0.0132 |
| - humanities      | 2 | none | | acc ↑ | 0.8103 | ± 0.0257 |
| - other           | 2 | none | | acc ↑ | 0.7949 | ± 0.0280 |
| - social sciences | 2 | none | | acc ↑ | 0.8722 | ± 0.0242 |
| - stem            | 2 | none | | acc ↑ | 0.7193 | ± 0.0258 |

vllm (pretrained=/root/autodl-tmp/70-1536-df10,add_bos_token=true,max_model_len=2048), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|---------|--------|--------|--------|-------|--------|
| gsm8k | 3 | flexible-extract | 5 | exact_match ↑ | 0.888 | ± 0.0200 |
|       |   | strict-match     | 5 | exact_match ↑ | 0.876 | ± 0.0209 |

vllm (pretrained=/root/autodl-tmp/70-1536-df10,add_bos_token=true,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|---------|--------|--------|--------|-------|--------|
| gsm8k | 3 | flexible-extract | 5 | exact_match ↑ | 0.884 | ± 0.0143 |
|       |   | strict-match     | 5 | exact_match ↑ | 0.856 | ± 0.0157 |

vllm (pretrained=/root/autodl-tmp/70-1536-df10,add_bos_token=true,max_model_len=800,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto

| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|--------|---------|--------|--------|--------|-------|--------|
| mmlu              | 2 | none | | acc ↑ | 0.7906 | ± 0.0133 |
| - humanities      | 2 | none | | acc ↑ | 0.8154 | ± 0.0266 |
| - other           | 2 | none | | acc ↑ | 0.7949 | ± 0.0282 |
| - social sciences | 2 | none | | acc ↑ | 0.8611 | ± 0.0248 |
| - stem            | 2 | none | | acc ↑ | 0.7263 | ± 0.0251 |

vllm (pretrained=/root/autodl-tmp/70-1024-df10-u2k,add_bos_token=true,max_model_len=2048), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|---------|--------|--------|--------|-------|--------|
| gsm8k | 3 | flexible-extract | 5 | exact_match ↑ | 0.884 | ± 0.0203 |
|       |   | strict-match     | 5 | exact_match ↑ | 0.864 | ± 0.0217 |

vllm (pretrained=/root/autodl-tmp/70-1024-df10-u2k,add_bos_token=true,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|---------|--------|--------|--------|-------|--------|
| gsm8k | 3 | flexible-extract | 5 | exact_match ↑ | 0.886 | ± 0.0142 |
|       |   | strict-match     | 5 | exact_match ↑ | 0.858 | ± 0.0156 |

vllm (pretrained=/root/autodl-tmp/70-1024-df10-u2k,add_bos_token=true,max_model_len=800,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto

| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|--------|---------|--------|--------|--------|-------|--------|
| mmlu              | 2 | none | | acc ↑ | 0.7965 | ± 0.0130 |
| - humanities      | 2 | none | | acc ↑ | 0.8256 | ± 0.0259 |
| - other           | 2 | none | | acc ↑ | 0.8000 | ± 0.0268 |
| - social sciences | 2 | none | | acc ↑ | 0.8778 | ± 0.0243 |
| - stem            | 2 | none | | acc ↑ | 0.7228 | ± 0.0253 |
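Pulling the flexible-extract, 5-shot, limit-500 GSM8K numbers from the runs above into one place (values copied from the source; the labels are the pretrained directory names):

```python
# GSM8K exact_match (flexible-extract, 5-shot, limit 500), copied from the tables above
results = {
    "Cydonia-24B-v2 (BF16 baseline)": 0.904,
    "70-512-df10": 0.894,
    "70-1536-df10": 0.884,
    "70-1024-df10-u2k": 0.886,
}

baseline = results["Cydonia-24B-v2 (BF16 baseline)"]
for name, acc in results.items():
    delta = round(acc - baseline, 3)
    print(f"{name:32s} {acc:.3f}  ({delta:+.3f} vs baseline)")
```

With reported standard errors around ±0.013–0.015, these deltas (at most −0.020) are within roughly 1.5 standard errors of the baseline.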
Safetensors · Model size: 23.6B params · Tensor types: BF16, I8

Model tree: noneUsername/Cydonia-24B-v2-W8A8, one of 21 listed quantizations of the base model.