# Model Card for dynamic-qwen-mod-gamma0.5

This model is a dynamically-computed variant of `outputs/PRETRAIN-TEST-qwen2.5-0.5B-mod-pretrain_mix-2025-08-28_14-02-45-gamma=0.5/final_model`, fine-tuned with the MOD architecture.

- **Dynamic Architecture:** MOD
- **Capacity Gamma (γ):** 0.5

The MOD architecture lets the model conditionally skip parts of its computation, aiming for improved efficiency. The `capacity_gamma` parameter controls the fraction of tokens that are routed through the dynamic components; the remaining tokens bypass them.
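As a rough illustration of how a capacity parameter like this typically works, the sketch below routes the top `gamma * seq_len` tokens (by a learned router score) through a block while the rest skip it. The function and variable names here are hypothetical and do not correspond to this model's actual code:

```python
import torch

def mod_route(hidden, router_weight, gamma=0.5):
    """Illustrative sketch of MOD-style capacity routing.

    A router scores every token; only the top ``gamma * seq_len``
    tokens per sequence are marked for the block's computation,
    while the rest would bypass it on the residual stream.
    """
    batch, seq_len, _ = hidden.shape
    scores = hidden @ router_weight          # (batch, seq_len) router logits
    k = max(1, int(gamma * seq_len))         # capacity: tokens kept per sequence
    top_idx = torch.topk(scores, k, dim=1).indices
    keep = torch.zeros(batch, seq_len, dtype=torch.bool)
    keep.scatter_(1, top_idx, True)          # True -> token is processed
    return keep
```

With γ = 0.5, exactly half of each sequence's tokens would pass through the dynamic block under this scheme.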

## How to Use

This model requires `trust_remote_code=True` to load the custom architecture.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Loading in bfloat16 is recommended for efficiency
model = AutoModelForCausalLM.from_pretrained(
    "fredericowieser/dynamic-qwen-mod-gamma0.5",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("fredericowieser/dynamic-qwen-mod-gamma0.5")

# Example usage
prompt = "The capital of the United Kingdom is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Evaluation

Results on standard benchmarks:

| Task | Metric | Value | Stderr |
|---|---|---|---|
| arc_challenge | acc | 0.2065 | 0.0118 |
| arc_challenge | acc_norm | 0.2432 | 0.0125 |
| hellaswag | acc | 0.2945 | 0.0045 |
| hellaswag | acc_norm | 0.3262 | 0.0047 |
| mmlu | acc | 0.2332 | 0.0036 |
| mmlu_abstract_algebra | acc | 0.2000 | 0.0402 |
| mmlu_anatomy | acc | 0.2667 | 0.0382 |
| mmlu_astronomy | acc | 0.1776 | 0.0311 |
| mmlu_business_ethics | acc | 0.2800 | 0.0451 |
| mmlu_clinical_knowledge | acc | 0.2075 | 0.0250 |
| mmlu_college_biology | acc | 0.2917 | 0.0380 |
| mmlu_college_chemistry | acc | 0.2200 | 0.0416 |
| mmlu_college_computer_science | acc | 0.2200 | 0.0416 |
| mmlu_college_mathematics | acc | 0.2400 | 0.0429 |
| mmlu_college_medicine | acc | 0.1965 | 0.0303 |
| mmlu_college_physics | acc | 0.2255 | 0.0416 |
| mmlu_computer_security | acc | 0.2400 | 0.0429 |
| mmlu_conceptual_physics | acc | 0.2723 | 0.0291 |
| mmlu_econometrics | acc | 0.2018 | 0.0378 |
| mmlu_electrical_engineering | acc | 0.2483 | 0.0360 |
| mmlu_elementary_mathematics | acc | 0.2090 | 0.0209 |
| mmlu_formal_logic | acc | 0.2143 | 0.0367 |
| mmlu_global_facts | acc | 0.2400 | 0.0429 |
| mmlu_high_school_biology | acc | 0.2387 | 0.0243 |
| mmlu_high_school_chemistry | acc | 0.1921 | 0.0277 |
| mmlu_high_school_computer_science | acc | 0.2600 | 0.0441 |
| mmlu_high_school_european_history | acc | 0.2182 | 0.0323 |
| mmlu_high_school_geography | acc | 0.2020 | 0.0286 |
| mmlu_high_school_government_and_politics | acc | 0.1710 | 0.0272 |
| mmlu_high_school_macroeconomics | acc | 0.2128 | 0.0208 |
| mmlu_high_school_mathematics | acc | 0.2444 | 0.0262 |
| mmlu_high_school_microeconomics | acc | 0.2017 | 0.0261 |
| mmlu_high_school_physics | acc | 0.1722 | 0.0308 |
| mmlu_high_school_psychology | acc | 0.1633 | 0.0158 |
| mmlu_high_school_statistics | acc | 0.1806 | 0.0262 |
| mmlu_high_school_us_history | acc | 0.2549 | 0.0306 |
| mmlu_high_school_world_history | acc | 0.2321 | 0.0275 |
| mmlu_human_aging | acc | 0.2691 | 0.0298 |
| mmlu_human_sexuality | acc | 0.2595 | 0.0384 |
| mmlu_humanities | acc | 0.2425 | 0.0063 |
| mmlu_international_law | acc | 0.2562 | 0.0398 |
| mmlu_jurisprudence | acc | 0.2593 | 0.0424 |
| mmlu_logical_fallacies | acc | 0.2147 | 0.0323 |
| mmlu_machine_learning | acc | 0.2589 | 0.0416 |
| mmlu_management | acc | 0.1650 | 0.0368 |
| mmlu_marketing | acc | 0.2906 | 0.0297 |
| mmlu_medical_genetics | acc | 0.3000 | 0.0461 |
| mmlu_miscellaneous | acc | 0.2439 | 0.0154 |
| mmlu_moral_disputes | acc | 0.2601 | 0.0236 |
| mmlu_moral_scenarios | acc | 0.2425 | 0.0143 |
| mmlu_nutrition | acc | 0.2549 | 0.0250 |
| mmlu_other | acc | 0.2391 | 0.0076 |
| mmlu_philosophy | acc | 0.2476 | 0.0245 |
| mmlu_prehistory | acc | 0.2191 | 0.0230 |
| mmlu_professional_accounting | acc | 0.2730 | 0.0266 |
| mmlu_professional_law | acc | 0.2458 | 0.0110 |
| mmlu_professional_medicine | acc | 0.1360 | 0.0208 |
| mmlu_professional_psychology | acc | 0.2598 | 0.0177 |
| mmlu_public_relations | acc | 0.2000 | 0.0383 |
| mmlu_security_studies | acc | 0.2735 | 0.0285 |
| mmlu_social_sciences | acc | 0.2187 | 0.0074 |
| mmlu_sociology | acc | 0.2537 | 0.0308 |
| mmlu_stem | acc | 0.2277 | 0.0075 |
| mmlu_us_foreign_policy | acc | 0.2400 | 0.0429 |
| mmlu_virology | acc | 0.2651 | 0.0344 |
| mmlu_world_religions | acc | 0.2632 | 0.0338 |
| truthfulqa_mc2 | acc | 0.4355 | 0.0155 |
| winogrande | acc | 0.5264 | 0.0140 |
Model size: 494M parameters (F32, safetensors).