# Model Card for dynamic-qwen-mod-gamma0.5
This model is a dynamically-computed version of `outputs/PRETRAIN-TEST-qwen2.5-0.5B-mod-pretrain_mix-2025-08-28_14-02-45-gamma=0.5/final_model`, fine-tuned using the MOD (Mixture-of-Depths) architecture.
- **Dynamic Architecture:** MOD
- **Capacity Gamma (γ):** 0.5
The MOD architecture enables the model to conditionally skip parts of its computation, aiming for improved efficiency. The `capacity_gamma` parameter controls the fraction of tokens processed by the dynamic components.
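The routing idea described above can be sketched as a top-k selection over tokens. This is an illustrative example only, not the implementation shipped with this repository; the function `mod_route` and the `router_weight` parameter are hypothetical names:

```python
import torch

def mod_route(hidden, router_weight, capacity_gamma=0.5):
    """Illustrative Mixture-of-Depths-style token routing.

    hidden: (batch, seq_len, d_model); router_weight: (d_model,).
    With capacity_gamma=0.5, only the top half of tokens per sequence
    (ranked by router score) pass through the block; the rest skip it
    via the residual stream.
    """
    batch, seq_len, _ = hidden.shape
    k = max(1, int(seq_len * capacity_gamma))       # block capacity
    scores = hidden @ router_weight                 # (batch, seq_len)
    topk = scores.topk(k, dim=-1).indices           # tokens to process
    mask = torch.zeros(batch, seq_len, dtype=torch.bool,
                       device=hidden.device)
    mask.scatter_(1, topk, True)
    return mask  # True = token goes through the block's computation
```

With `capacity_gamma=0.5` and a sequence length of 8, exactly 4 tokens per sequence are selected, which is where the compute savings come from.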
## How to Use
This model requires `trust_remote_code=True` to load the custom architecture.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# It is recommended to load in bfloat16 for efficiency
model = AutoModelForCausalLM.from_pretrained(
    "fredericowieser/dynamic-qwen-mod-gamma0.5",
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("fredericowieser/dynamic-qwen-mod-gamma0.5")

# Example usage
prompt = "The capital of the United Kingdom is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Evaluation
Results on standard benchmarks:
| Task | Metric | Value |
|---|---|---|
arc_challenge | acc,none | 0.2065 |
arc_challenge | acc_norm,none | 0.2432 |
arc_challenge | acc_norm_stderr,none | 0.0125 |
arc_challenge | acc_stderr,none | 0.0118 |
hellaswag | acc,none | 0.2945 |
hellaswag | acc_norm,none | 0.3262 |
hellaswag | acc_norm_stderr,none | 0.0047 |
hellaswag | acc_stderr,none | 0.0045 |
mmlu | acc,none | 0.2332 |
mmlu | acc_stderr,none | 0.0036 |
mmlu_abstract_algebra | acc,none | 0.2000 |
mmlu_abstract_algebra | acc_stderr,none | 0.0402 |
mmlu_anatomy | acc,none | 0.2667 |
mmlu_anatomy | acc_stderr,none | 0.0382 |
mmlu_astronomy | acc,none | 0.1776 |
mmlu_astronomy | acc_stderr,none | 0.0311 |
mmlu_business_ethics | acc,none | 0.2800 |
mmlu_business_ethics | acc_stderr,none | 0.0451 |
mmlu_clinical_knowledge | acc,none | 0.2075 |
mmlu_clinical_knowledge | acc_stderr,none | 0.0250 |
mmlu_college_biology | acc,none | 0.2917 |
mmlu_college_biology | acc_stderr,none | 0.0380 |
mmlu_college_chemistry | acc,none | 0.2200 |
mmlu_college_chemistry | acc_stderr,none | 0.0416 |
mmlu_college_computer_science | acc,none | 0.2200 |
mmlu_college_computer_science | acc_stderr,none | 0.0416 |
mmlu_college_mathematics | acc,none | 0.2400 |
mmlu_college_mathematics | acc_stderr,none | 0.0429 |
mmlu_college_medicine | acc,none | 0.1965 |
mmlu_college_medicine | acc_stderr,none | 0.0303 |
mmlu_college_physics | acc,none | 0.2255 |
mmlu_college_physics | acc_stderr,none | 0.0416 |
mmlu_computer_security | acc,none | 0.2400 |
mmlu_computer_security | acc_stderr,none | 0.0429 |
mmlu_conceptual_physics | acc,none | 0.2723 |
mmlu_conceptual_physics | acc_stderr,none | 0.0291 |
mmlu_econometrics | acc,none | 0.2018 |
mmlu_econometrics | acc_stderr,none | 0.0378 |
mmlu_electrical_engineering | acc,none | 0.2483 |
mmlu_electrical_engineering | acc_stderr,none | 0.0360 |
mmlu_elementary_mathematics | acc,none | 0.2090 |
mmlu_elementary_mathematics | acc_stderr,none | 0.0209 |
mmlu_formal_logic | acc,none | 0.2143 |
mmlu_formal_logic | acc_stderr,none | 0.0367 |
mmlu_global_facts | acc,none | 0.2400 |
mmlu_global_facts | acc_stderr,none | 0.0429 |
mmlu_high_school_biology | acc,none | 0.2387 |
mmlu_high_school_biology | acc_stderr,none | 0.0243 |
mmlu_high_school_chemistry | acc,none | 0.1921 |
mmlu_high_school_chemistry | acc_stderr,none | 0.0277 |
mmlu_high_school_computer_science | acc,none | 0.2600 |
mmlu_high_school_computer_science | acc_stderr,none | 0.0441 |
mmlu_high_school_european_history | acc,none | 0.2182 |
mmlu_high_school_european_history | acc_stderr,none | 0.0323 |
mmlu_high_school_geography | acc,none | 0.2020 |
mmlu_high_school_geography | acc_stderr,none | 0.0286 |
mmlu_high_school_government_and_politics | acc,none | 0.1710 |
mmlu_high_school_government_and_politics | acc_stderr,none | 0.0272 |
mmlu_high_school_macroeconomics | acc,none | 0.2128 |
mmlu_high_school_macroeconomics | acc_stderr,none | 0.0208 |
mmlu_high_school_mathematics | acc,none | 0.2444 |
mmlu_high_school_mathematics | acc_stderr,none | 0.0262 |
mmlu_high_school_microeconomics | acc,none | 0.2017 |
mmlu_high_school_microeconomics | acc_stderr,none | 0.0261 |
mmlu_high_school_physics | acc,none | 0.1722 |
mmlu_high_school_physics | acc_stderr,none | 0.0308 |
mmlu_high_school_psychology | acc,none | 0.1633 |
mmlu_high_school_psychology | acc_stderr,none | 0.0158 |
mmlu_high_school_statistics | acc,none | 0.1806 |
mmlu_high_school_statistics | acc_stderr,none | 0.0262 |
mmlu_high_school_us_history | acc,none | 0.2549 |
mmlu_high_school_us_history | acc_stderr,none | 0.0306 |
mmlu_high_school_world_history | acc,none | 0.2321 |
mmlu_high_school_world_history | acc_stderr,none | 0.0275 |
mmlu_human_aging | acc,none | 0.2691 |
mmlu_human_aging | acc_stderr,none | 0.0298 |
mmlu_human_sexuality | acc,none | 0.2595 |
mmlu_human_sexuality | acc_stderr,none | 0.0384 |
mmlu_humanities | acc,none | 0.2425 |
mmlu_humanities | acc_stderr,none | 0.0063 |
mmlu_international_law | acc,none | 0.2562 |
mmlu_international_law | acc_stderr,none | 0.0398 |
mmlu_jurisprudence | acc,none | 0.2593 |
mmlu_jurisprudence | acc_stderr,none | 0.0424 |
mmlu_logical_fallacies | acc,none | 0.2147 |
mmlu_logical_fallacies | acc_stderr,none | 0.0323 |
mmlu_machine_learning | acc,none | 0.2589 |
mmlu_machine_learning | acc_stderr,none | 0.0416 |
mmlu_management | acc,none | 0.1650 |
mmlu_management | acc_stderr,none | 0.0368 |
mmlu_marketing | acc,none | 0.2906 |
mmlu_marketing | acc_stderr,none | 0.0297 |
mmlu_medical_genetics | acc,none | 0.3000 |
mmlu_medical_genetics | acc_stderr,none | 0.0461 |
mmlu_miscellaneous | acc,none | 0.2439 |
mmlu_miscellaneous | acc_stderr,none | 0.0154 |
mmlu_moral_disputes | acc,none | 0.2601 |
mmlu_moral_disputes | acc_stderr,none | 0.0236 |
mmlu_moral_scenarios | acc,none | 0.2425 |
mmlu_moral_scenarios | acc_stderr,none | 0.0143 |
mmlu_nutrition | acc,none | 0.2549 |
mmlu_nutrition | acc_stderr,none | 0.0250 |
mmlu_other | acc,none | 0.2391 |
mmlu_other | acc_stderr,none | 0.0076 |
mmlu_philosophy | acc,none | 0.2476 |
mmlu_philosophy | acc_stderr,none | 0.0245 |
mmlu_prehistory | acc,none | 0.2191 |
mmlu_prehistory | acc_stderr,none | 0.0230 |
mmlu_professional_accounting | acc,none | 0.2730 |
mmlu_professional_accounting | acc_stderr,none | 0.0266 |
mmlu_professional_law | acc,none | 0.2458 |
mmlu_professional_law | acc_stderr,none | 0.0110 |
mmlu_professional_medicine | acc,none | 0.1360 |
mmlu_professional_medicine | acc_stderr,none | 0.0208 |
mmlu_professional_psychology | acc,none | 0.2598 |
mmlu_professional_psychology | acc_stderr,none | 0.0177 |
mmlu_public_relations | acc,none | 0.2000 |
mmlu_public_relations | acc_stderr,none | 0.0383 |
mmlu_security_studies | acc,none | 0.2735 |
mmlu_security_studies | acc_stderr,none | 0.0285 |
mmlu_social_sciences | acc,none | 0.2187 |
mmlu_social_sciences | acc_stderr,none | 0.0074 |
mmlu_sociology | acc,none | 0.2537 |
mmlu_sociology | acc_stderr,none | 0.0308 |
mmlu_stem | acc,none | 0.2277 |
mmlu_stem | acc_stderr,none | 0.0075 |
mmlu_us_foreign_policy | acc,none | 0.2400 |
mmlu_us_foreign_policy | acc_stderr,none | 0.0429 |
mmlu_virology | acc,none | 0.2651 |
mmlu_virology | acc_stderr,none | 0.0344 |
mmlu_world_religions | acc,none | 0.2632 |
mmlu_world_religions | acc_stderr,none | 0.0338 |
truthfulqa_mc2 | acc,none | 0.4355 |
truthfulqa_mc2 | acc_stderr,none | 0.0155 |
winogrande | acc,none | 0.5264 |
winogrande | acc_stderr,none | 0.0140 |
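
As a quick sanity check, the headline accuracies in the table above can be compared against random-chance baselines (roughly 0.25 for the 4-choice ARC/HellaSwag/MMLU tasks, 0.5 for the binary Winogrande task). A minimal sketch, with the values copied from the table:

```python
# Headline accuracies copied from the evaluation table above, paired
# with approximate random-chance baselines for each task.
results = {
    "arc_challenge": 0.2065,
    "hellaswag": 0.2945,
    "mmlu": 0.2332,
    "winogrande": 0.5264,
}
chance = {
    "arc_challenge": 0.25,
    "hellaswag": 0.25,
    "mmlu": 0.25,
    "winogrande": 0.5,
}

# Print each task's accuracy and its margin over the chance baseline.
for task, acc in results.items():
    delta = acc - chance[task]
    print(f"{task}: acc={acc:.4f} ({delta:+.4f} vs. chance)")
```

The comparison shows the model sits near or below chance on the multiple-choice benchmarks, which is expected for a small experimental pretraining run.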