---
base_model:
- huihui-ai/QwQ-32B-abliterated
---

Evaluation results from lm-evaluation-harness (vLLM backend): GSM8K (5-shot exact match) and MMLU accuracy for the original QwQ-32B, QwQ-32B-abliterated, and an AWQ-quantized build of the abliterated model.

`vllm (pretrained=/root/autodl-tmp/QwQ-32B,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto`

|Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.432|± |0.0314|
| | |strict-match | 5|exact_match|↑ |0.744|± |0.0277|

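These numbers should be reproducible with lm-evaluation-harness on a vLLM backend. Below is a minimal sketch, assuming a recent harness install (`pip install lm-eval[vllm]`) and local weights at the path shown in the configuration line above; the exact API surface may differ between harness versions.

```python
# Sketch: reproduce the GSM8K (5-shot, limit=250) run above with lm-evaluation-harness.
# Assumes the weights are at /root/autodl-tmp/QwQ-32B as in the log line above.
import lm_eval
from lm_eval.utils import make_table

results = lm_eval.simple_evaluate(
    model="vllm",
    model_args=(
        "pretrained=/root/autodl-tmp/QwQ-32B,"
        "add_bos_token=true,max_model_len=4096,dtype=bfloat16"
    ),
    tasks=["gsm8k"],
    num_fewshot=5,
    limit=250,          # evaluate only the first 250 examples, as in the run above
    batch_size="auto",
)

# Prints the same markdown-style table as the listings in this README.
print(make_table(results))
```
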
`vllm (pretrained=/root/autodl-tmp/QwQ-32B,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto`

|Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.444|± |0.0222|
| | |strict-match | 5|exact_match|↑ |0.716|± |0.0202|

MMLU (QwQ-32B):

| Groups |Version|Filter|n-shot|Metric| |Value | |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu | 2|none | |acc |↑ |0.8140|± |0.0125|
| - humanities | 2|none | |acc |↑ |0.8359|± |0.0251|
| - other | 2|none | |acc |↑ |0.8103|± |0.0269|
| - social sciences| 2|none | |acc |↑ |0.8889|± |0.0222|
| - stem | 2|none | |acc |↑ |0.7544|± |0.0238|

`vllm (pretrained=/root/autodl-tmp/QwQ-32B-abliterated,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto`

|Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.528|± |0.0316|
| | |strict-match | 5|exact_match|↑ |0.740|± |0.0278|

`vllm (pretrained=/root/autodl-tmp/QwQ-32B-abliterated,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto`

|Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.492|± |0.0224|
| | |strict-match | 5|exact_match|↑ |0.742|± |0.0196|

`vllm (pretrained=/root/autodl-tmp/QwQ-32B-abliterated,add_bos_token=true,max_model_len=4096,dtype=bfloat16,max_num_seqs=3), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1`

| Groups |Version|Filter|n-shot|Metric| |Value | |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu | 2|none | |acc |↑ |0.8152|± |0.0126|
| - humanities | 2|none | |acc |↑ |0.8359|± |0.0253|
| - other | 2|none | |acc |↑ |0.8000|± |0.0276|
| - social sciences| 2|none | |acc |↑ |0.8722|± |0.0240|
| - stem | 2|none | |acc |↑ |0.7754|± |0.0232|

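The abliterated model's MMLU run above uses tighter settings than the GSM8K runs (`max_num_seqs=3` forwarded to vLLM, `batch_size: 1`, `limit: 15.0`). A hedged sketch of the equivalent call, under the same harness-install assumptions as the earlier example:

```python
# Sketch: the abliterated model's MMLU run, with the reduced-batch settings
# taken directly from its configuration line above.
import lm_eval
from lm_eval.utils import make_table

results = lm_eval.simple_evaluate(
    model="vllm",
    model_args=(
        "pretrained=/root/autodl-tmp/QwQ-32B-abliterated,"
        "add_bos_token=true,max_model_len=4096,dtype=bfloat16,max_num_seqs=3"
    ),
    tasks=["mmlu"],
    limit=15,       # per-task cap, matching "limit: 15.0" in the log line
    batch_size=1,
)
print(make_table(results))
```
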
`vllm (pretrained=/root/autodl-tmp/QwQ-32B-abliterated-awq,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto`

|Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.476|± |0.0316|
| | |strict-match | 5|exact_match|↑ |0.752|± |0.0274|

`vllm (pretrained=/root/autodl-tmp/QwQ-32B-abliterated-awq,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto`

|Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.524|± |0.0224|
| | |strict-match | 5|exact_match|↑ |0.716|± |0.0202|

MMLU (QwQ-32B-abliterated-awq):

| Groups |Version|Filter|n-shot|Metric| |Value | |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu | 2|none | |acc |↑ |0.8023|± |0.0130|
| - humanities | 2|none | |acc |↑ |0.8000|± |0.0266|
| - other | 2|none | |acc |↑ |0.7949|± |0.0284|
| - social sciences| 2|none | |acc |↑ |0.8500|± |0.0258|
| - stem | 2|none | |acc |↑ |0.7789|± |0.0235|
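
For completeness, a minimal generation example with the abliterated checkpoint under the same vLLM settings used for evaluation (bfloat16, 4096-token context). The repo id `huihui-ai/QwQ-32B-abliterated` comes from the metadata block at the top; the prompt and sampling values are illustrative assumptions, not recommended settings.

```python
# Sketch: plain text generation with vLLM, mirroring the eval-time settings.
# For chat-style use, the model's chat template would normally be applied first;
# this raw completion call only demonstrates the loading parameters.
from vllm import LLM, SamplingParams

llm = LLM(
    model="huihui-ai/QwQ-32B-abliterated",  # repo id from the metadata above
    dtype="bfloat16",
    max_model_len=4096,
)

sampling = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=512)

outputs = llm.generate(
    ["A farmer has 17 sheep and buys 5 more. How many sheep are there now?"],
    sampling,
)
print(outputs[0].outputs[0].text)
```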