---
license: apache-2.0
base_model:
- Rakuten/RakutenAI-7B
---

# RakutenAI-2.0-8x7B

## Model Description

RakutenAI-2.0-8x7B is an MoE-based foundation model derived from [RakutenAI-7B](https://huggingface.co/Rakuten/RakutenAI-7B), which was first introduced in March 2024. As part of a broader initiative to advance Japanese LLM technology, RakutenAI-2.0-8x7B adopts a Mixture of Experts (MoE) architecture with two active experts, resulting in **13B active parameters**. This design selects experts dynamically for each input token, which improves computational efficiency while maintaining high performance.

RakutenAI-2.0-8x7B achieves state-of-the-art results on Japanese language understanding benchmarks and competitive performance on English evaluation tasks when compared with similar models, including Swallow-MX-8x7B-NVE-0.1, Llama-3-Swallow-70B-v0.1, Sarashina2-70B, and PLaMo 100B.

*If you are looking for an instruction-tuned model, check [RakutenAI-2.0-8x7B-instruct](https://huggingface.co/Rakuten/RakutenAI-2.0-8x7B-instruct)*.

## Model Evaluation Results

| Foundation Model Name | Japanese Score | English Score | Average |
|-----------------------|----------------|---------------|---------|
| Rakuten/RakutenAI-7B | 62.93 | 34.86 | 48.90 |
| **Rakuten/RakutenAI-2.0-8x7B** | **72.29** | 41.32 | 56.80 |
| Tokyotech/Swallow-MX-8x7B-NVE-0.1 | 66.17 | 44.33 | 55.25 |
| Tokyotech/Llama-3-Swallow-70B-v0.1 | 68.15 | **51.52** | **59.84** |
| SBIntuitions/Sarashina2-70B | 71.09 | 39.22 | 55.16 |
| PreferredNetworks/PLaMo 100B | 71.45 | 36.48 | 53.96 |

Table 1: RakutenAI-2.0-8x7B foundation model average performance scores on LM-Harness in comparison with other Japanese open models.

Detailed scores are as follows:

| Metric | jcommonsense_qa | jnli | marc_ja | jsquad | jaqket_v2 | xlsum_ja | xwinograd | mgsm | arc_challenge | hellaswag | mmlu | truthfulqa_mc2 | gsm8k | winogrande | musr | math_hard | gpqa | bbh | ifeval | mmlu_pro |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| **Model Name** | accuracy-3shot | accuracy-3shot | accuracy-3shot | exact_match-2shot | exact_match-1shot | rouge2-1shot | accuracy-0shot | accuracy-5shot | accuracy_norm-25shot | accuracy_norm-10shot | accuracy-5shot | accuracy-0shot | exact_match-5shot | accuracy-5shot | accuracy_norm-0shot | exact_match-4shot | accuracy_norm-0shot | accuracy_norm-3shot | avg_inst_prompt_strict_acc-0shot | accuracy-5shot |
| RakutenAI-7B | 85.88 | 56.61 | 96.52 | 69.56 | 81.44 | 15.69 | 74.14 | 23.60 | 60.75 | 82.26 | 59.83 | 38.33 | 32.6 | 77.43 | 4.93 | 2.16 | 5.02 | 20.34 | 14.04 | 20.57 |
| RakutenAI-2.0-8x7B | 93.12 | 87.43 | 97.72 | 74.49 | 86.00 | 15.70 | 78.62 | 45.20 | 66.38 | 85.84 | 65.50 | 48.19 | 51.40 | 80.51 | 13.88 | 3.30 | 5.71 | 27.02 | 22.90 | 25.22 |
| Swallow-MX-8x7B-NVE-0.1 | 89.28 | 43.06 | 97.15 | 76.29 | 87.37 | 17.09 | 82.69 | 40.40 | 65.87 | 85.13 | 69.48 | 50.38 | 58.45 | 82.87 | 8.78 | 7.50 | 13.33 | 29.41 | 28.38 | 32.32 |
| Llama-3-Swallow-70B-v0.1 | 92.58 | 66.15 | 93.46 | 70.94 | 71.74 | 12.58 | 83.32 | 54.40 | 67.58 | 87.53 | 77.47 | 55.29 | 81.50 | 85.16 | 22.05 | 13.92 | 16.60 | 49.53 | 20.91 | 40.70 |
| Sarashina2-70B | 95.35 | 60.44 | 94.50 | 76.90 | 88.49 | 18.24 | 80.81 | 54.00 | 62.63 | 83.23 | 63.10 | 48.68 | 24.49 | 79.95 | 13.52 | 5.29 | 5.54 | 29.73 | 30.32 | 24.13 |
| PLaMo 100B | 92.05 | 68.82 | 97.49 | 78.01 | 89.43 | 20.38 | 81.02 | 44.40 | 49.91 | 80.98 | 55.17 | 44.91 | 56.10 | 71.35 | 6.67 | 0.00 | 4.00 | 23.99 | 23.39 | 21.31 |

Table 2: RakutenAI-2.0-8x7B foundation model performance on LM-Harness metrics in comparison with other Japanese open models.

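The card does not state how the Table 1 averages are aggregated. The sketch below checks one plausible reading against the RakutenAI-2.0-8x7B row: it assumes the first eight columns of Table 2 (`jcommonsense_qa` through `mgsm`) form the Japanese score, the remaining twelve (`arc_challenge` through `mmlu_pro`) form the English score, and each score is an unweighted mean. The helper name `unweighted_mean` and the column split are assumptions made for illustration.

```python
# Per-task scores copied from the RakutenAI-2.0-8x7B row of Table 2.
# Assumption: the first 8 columns are the Japanese tasks, the last 12 the English tasks.
japanese_scores = [93.12, 87.43, 97.72, 74.49, 86.00, 15.70, 78.62, 45.20]
english_scores = [66.38, 85.84, 65.50, 48.19, 51.40, 80.51,
                  13.88, 3.30, 5.71, 27.02, 22.90, 25.22]

# Values reported in Table 1 for the same model.
reported = {"japanese": 72.29, "english": 41.32, "average": 56.80}


def unweighted_mean(scores):
    """Plain arithmetic mean; every task counts equally."""
    return sum(scores) / len(scores)


japanese_avg = unweighted_mean(japanese_scores)
english_avg = unweighted_mean(english_scores)
# Assumption: the overall figure is the mean of the two language-level averages.
overall_avg = (japanese_avg + english_avg) / 2

computed = {"japanese": japanese_avg, "english": english_avg, "average": overall_avg}
for name, value in computed.items():
    print(f"{name}: computed {value:.3f}, reported {reported[name]:.2f}")
```

Under these assumptions the computed values agree with Table 1 to within rounding, which suggests each language-level score weights all of its tasks equally, so low-scoring tasks such as `xlsum_ja` or `math_hard` pull the averages down noticeably.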
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "Rakuten/RakutenAI-2.0-8x7B"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# torch_dtype="auto" loads the weights in the dtype stored in the checkpoint;
# device_map="auto" spreads the model across the available devices.
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto", device_map="auto")
model.eval()

# Prompts for plain text continuation (this is a foundation model, not a chat model).
requests = [
    "南硫黄島原生自然環境保全地域は、自然",
    "The capybara is a giant cavy rodent",
]

for req in requests:
    # Tokenize the prompt and move the tensors to the model's device.
    input_text = tokenizer(req, return_tensors="pt").to(device=model.device)
    # Sample up to 512 new tokens; EOS is reused as the padding token.
    tokens = model.generate(
        **input_text,
        max_new_tokens=512,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Decode the full sequence (prompt plus continuation) without special tokens.
    out = tokenizer.decode(tokens[0], skip_special_tokens=True)
    print("INPUT:\n" + req)
    print("OUTPUT:\n" + out)
```

**Note on Evaluation Scores:**

- Evaluations were run with LM Evaluation Harness between October and December 2024. We use the default task definitions from the following commit: https://github.com/EleutherAI/lm-evaluation-harness/commit/26f607f5432e1d09c55b25488c43523e7ecde657
- The tasks considered for Japanese evaluations are listed here: https://github.com/EleutherAI/lm-evaluation-harness/blob/26f607f5432e1d09c55b25488c43523e7ecde657/lm_eval/tasks/japanese_leaderboard/README.md
- The tasks considered for English evaluations are listed here: https://huggingface.co/docs/leaderboards/en/open_llm_leaderboard/archive and https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/leaderboard/README.md

## Model Details

* **Developed by**: [Rakuten Group, Inc.](https://ai.rakuten.com/)
* **Language(s)**: Japanese, English
* **License**: This model is licensed under [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).
* **Model Architecture**: Mixture of Experts (2 active experts)

### Limitations and Bias

The suite of RakutenAI-2.0 models is capable of generating human-like text on a wide range of topics. However, like all LLMs, they have limitations and can produce biased, inaccurate, or unsafe outputs.
Please exercise caution and judgement while interacting with them.

## Citation

For citing our work on the suite of RakutenAI-2.0 models, please use:

```
@misc{rakutengroup2025rakutenai2.0,
  author = {Rakuten Group, Inc.},
  title = {RakutenAI-2.0},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Rakuten},
}
```