---
base_model:
  - a-m-team/AM-Thinking-v1
---

vllm (pretrained=/root/autodl-tmp/AM-Thinking-v1,add_bos_token=true,max_model_len=5096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

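| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.792 | ± 0.0257 |
| | | strict-match | 5 | exact_match | 0.780 | ± 0.0263 |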
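
The Stderr column above can be sanity-checked from the Value column and the `limit`. The sketch below assumes the harness reports the sample standard deviation of per-example 0/1 scores divided by the square root of the sample count; `binomial_stderr` is a hypothetical helper name, not part of any library.

```python
import math


def binomial_stderr(accuracy: float, n: int) -> float:
    """Standard error of the mean of n 0/1 scores, using the (n - 1) sample variance."""
    return math.sqrt(accuracy * (1.0 - accuracy) / (n - 1))


# (accuracy, limit) pairs copied from the gsm8k rows above
for acc, n in [(0.792, 250), (0.780, 250)]:
    print(f"acc={acc:.3f} n={n} stderr={binomial_stderr(acc, n):.4f}")
```

Under that assumption the computed values agree with the reported ones to four decimal places.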

vllm (pretrained=/root/autodl-tmp/AM-Thinking-v1,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

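| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.798 | ± 0.0180 |
| | | strict-match | 5 | exact_match | 0.786 | ± 0.0184 |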
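
The run headers above have the shape of the summary line printed by lm-evaluation-harness. As a rough sketch of how such a run might be launched from Python, assuming the `lm_eval` and `vllm` packages are installed: `build_model_args` and `run_gsm8k` are hypothetical helper names, and the exact command used for this card is not stated in the source.

```python
def build_model_args(pretrained: str, max_model_len: int) -> str:
    """Assemble the comma-separated model_args string shown in the run headers."""
    return (
        f"pretrained={pretrained},add_bos_token=true,"
        f"max_model_len={max_model_len},dtype=bfloat16"
    )


def run_gsm8k(pretrained: str, max_model_len: int, limit: int) -> dict:
    """Hypothetical helper: run 5-shot gsm8k through the vLLM backend."""
    import lm_eval  # imported lazily; needs lm-evaluation-harness and vllm

    results = lm_eval.simple_evaluate(
        model="vllm",
        model_args=build_model_args(pretrained, max_model_len),
        tasks=["gsm8k"],
        num_fewshot=5,
        limit=limit,
        batch_size="auto",
    )
    # Per-task metrics live under the "results" key of the returned dict.
    return results["results"]
```

For example, `run_gsm8k("/root/autodl-tmp/AM-Thinking-v1", 3096, 500)` mirrors the settings in the header above; the path is the local one from that header and would need to point at your own copy of the weights.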

vllm (pretrained=/root/autodl-tmp/AM-Thinking-v1,add_bos_token=true,max_model_len=3048,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

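| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|--------|--------:|--------|-------:|--------|------:|-------:|
| mmlu | 2 | none | | acc | 0.8023 | ± 0.0131 |
| - humanities | 2 | none | | acc | 0.8154 | ± 0.0276 |
| - other | 2 | none | | acc | 0.8000 | ± 0.0276 |
| - social sciences | 2 | none | | acc | 0.8556 | ± 0.0255 |
| - stem | 2 | none | | acc | 0.7614 | ± 0.0237 |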

vllm (pretrained=/root/autodl-tmp/AM-Thinking-v1-awq,add_bos_token=true,max_model_len=5096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

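| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.820 | ± 0.0243 |
| | | strict-match | 5 | exact_match | 0.816 | ± 0.0246 |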

vllm (pretrained=/root/autodl-tmp/AM-Thinking-v1-awq,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

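| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.816 | ± 0.0173 |
| | | strict-match | 5 | exact_match | 0.814 | ± 0.0174 |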

vllm (pretrained=/root/autodl-tmp/AM-Thinking-v1-awq,add_bos_token=true,max_model_len=3048,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

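| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|--------|--------:|--------|-------:|--------|------:|-------:|
| mmlu | 2 | none | | acc | 0.7930 | ± 0.0132 |
| - humanities | 2 | none | | acc | 0.8051 | ± 0.0278 |
| - other | 2 | none | | acc | 0.7846 | ± 0.0277 |
| - social sciences | 2 | none | | acc | 0.8444 | ± 0.0261 |
| - stem | 2 | none | | acc | 0.7579 | ± 0.0242 |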
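
To put the gaps between the base and AWQ runs in context, each difference can be expressed in units of the combined standard error. This is a rough sketch only: it treats the two runs as independent, which is an assumption (both runs probably scored the same examples), and `diff_in_stderrs` is a hypothetical helper name.

```python
import math

# (base value, base stderr, AWQ value, AWQ stderr) copied from the tables above
rows = {
    "gsm8k strict-match, limit 500": (0.786, 0.0184, 0.814, 0.0174),
    "gsm8k flexible-extract, limit 500": (0.798, 0.0180, 0.816, 0.0173),
    "mmlu": (0.8023, 0.0131, 0.7930, 0.0132),
}


def diff_in_stderrs(base, base_se, awq, awq_se):
    """Return (awq - base, that difference divided by the combined standard error)."""
    delta = awq - base
    combined = math.hypot(base_se, awq_se)  # sqrt(base_se**2 + awq_se**2)
    return delta, delta / combined


for name, row in rows.items():
    delta, z = diff_in_stderrs(*row)
    print(f"{name}: delta={delta:+.4f} ({z:+.2f} combined stderr)")
```

Under the independence assumption, none of these differences exceeds two combined standard errors, so at these sample sizes the tables do not clearly separate the two checkpoints.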