---
pipeline_tag: text-generation
library_name: transformers
license: apache-2.0
tags:
- mixtral
- moe
- reasoning
---
# Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks
This repository contains model checkpoints from the paper [Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks](https://arxiv.org/abs/2508.18672).
For more details, including code and evaluation procedures, please refer to the official GitHub repository: https://github.com/rioyokotalab/optimal-sparsity
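
As a quick-start sketch, the checkpoints can be loaded with the `transformers` library declared in the metadata above. The repository ID below is a placeholder, not the actual model ID of this repository; substitute the ID shown on this model page.

```python
# Minimal sketch: load a released checkpoint with Hugging Face transformers.
# "rioyokotalab/optimal-sparsity-example" is a placeholder model ID -- replace it
# with the ID of this repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rioyokotalab/optimal-sparsity-example"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumes bf16 weights; adjust if needed
    device_map="auto",           # requires `accelerate`
)

prompt = "Question: What is 17 * 24? Answer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```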
## How to cite
If you find our work helpful, please cite the paper:
```bibtex
@article{nakamura2025optimalsparsitymixtureofexpertslanguage,
      title={Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks},
      author={Taishi Nakamura and Satoki Ishikawa and Masaki Kawamura and Takumi Okamoto and Daisuke Nohara and Jun Suzuki and Rio Yokota},
      year={2025},
      eprint={2508.18672},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2508.18672},
}
```