license: apache-2.0
RakutenAI-2.0-8x7B
Model Description
RakutenAI-2.0-8x7B is a Mixture of Experts (MoE) foundation model derived from RakutenAI-7B, which was first introduced in March 2024. Developed as part of a broader initiative to advance Japanese LLM technology, it uses two active experts, which results in 13B active parameters. Because experts are selected dynamically based on the input tokens, only part of the model is used for any given token, which improves computational efficiency while maintaining high performance. RakutenAI-2.0-8x7B achieves state-of-the-art results on Japanese language understanding benchmarks and is competitive on English evaluation tasks with similar models, including Swallow-MX-8x7B-NVE-0.1, Llama-3-Swallow-70B-v0.1, Sarashina2-70B, and PLaMo 100B.
If you are looking for an instruction-tuned model, check RakutenAI-2.0-8x7B-instruct.
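To make the "two active experts" idea concrete, here is a minimal sketch of generic top-2 expert routing in plain Python. It is not Rakuten's implementation: the helper names `top_k_route` and `moe_layer`, the toy experts, and the router logits are all hypothetical, and renormalizing the softmax over only the selected experts is an assumption borrowed from common MoE designs.

```python
import math

def top_k_route(router_logits, k=2):
    """Return [(expert_index, weight), ...] for the k highest-scoring experts."""
    # Select the k experts with the largest router logits.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax over the selected logits only, so the k weights sum to 1 (assumption).
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_layer(x, experts, router_logits, k=2):
    """Weighted sum of the outputs of the k selected experts; the rest are skipped."""
    out = [0.0] * len(x)
    for index, weight in top_k_route(router_logits, k):
        y = experts[index](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out

# Toy setup (hypothetical): 8 experts, where expert i scales its input by i + 1.
experts = [lambda v, s=i + 1: [s * vi for vi in v] for i in range(8)]
# Hypothetical router logits for one token; experts 1 and 4 score highest.
logits = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 1.0]

routes = top_k_route(logits)
output = moe_layer([1.0, 2.0], experts, logits)
print("selected experts and weights:", routes)
print("layer output:", output)
```

In this toy example only two of the eight expert functions are evaluated for the token, which is the source of the gap between total and active parameters.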
Model Evaluation Results
Foundation Model Name | Japanese Score | English Score | Average |
---|---|---|---|
Rakuten/RakutenAI-7B | 62.93 | 34.86 | 48.90 |
Rakuten/RakutenAI-2.0-8x7B | 72.29 | 41.32 | 56.80 |
Tokyotech/Swallow-MX-8x7B-NVE-0.1 | 66.17 | 44.33 | 55.25 |
Tokyotech/Llama-3-Swallow-70B-v0.1 | 68.15 | 51.52 | 59.84 |
SBIntuitions/Sarashina2-70B | 71.09 | 39.22 | 55.16 |
PreferredNetworks/PLaMo 100B | 71.45 | 36.48 | 53.96 |
Detailed scores are as follows:
Metric | jcommonsense_qa | jnli | marc_ja | jsquad | jaqket_v2 | xlsum_ja | xwinograd | mgsm | arc_challenge | hellaswag | mmlu | truthfulqa_mc2 | gsm8k | winogrande | musr | math_hard | gpqa | bbh | ifeval | mmlu_pro |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Model Name | accuracy-3shot | accuracy-3shot | accuracy-3shot | exact_match-2shot | exact_match-1shot | rouge2-1shot | accuracy-0shot | accuracy-5shot | accuracy_norm-25shot | accuracy_norm-10shot | accuracy-5shot | accuracy-0shot | exact_match-5shot | accuracy-5shot | accuracy_norm-0shot | exact_match-4shot | accuracy_norm-0shot | accuracy_norm-3shot | avg_inst_prompt_strict_acc-0shot | accuracy-5shot |
RakutenAI-7B | 85.88 | 56.61 | 96.52 | 69.56 | 81.44 | 15.69 | 74.14 | 23.60 | 60.75 | 82.26 | 59.83 | 38.33 | 32.60 | 77.43 | 4.93 | 2.16 | 5.02 | 20.34 | 14.04 | 20.57 |
RakutenAI-2.0-8x7B | 93.12 | 87.43 | 97.72 | 74.49 | 86.00 | 15.70 | 78.62 | 45.20 | 66.38 | 85.84 | 65.50 | 48.19 | 51.40 | 80.51 | 13.88 | 3.30 | 5.71 | 27.02 | 22.90 | 25.22 |
Swallow-MX-8x7B-NVE-0.1 | 89.28 | 43.06 | 97.15 | 76.29 | 87.37 | 17.09 | 82.69 | 40.40 | 65.87 | 85.13 | 69.48 | 50.38 | 58.45 | 82.87 | 8.78 | 7.50 | 13.33 | 29.41 | 28.38 | 32.32 |
Llama-3-Swallow-70B-v0.1 | 92.58 | 66.15 | 93.46 | 70.94 | 71.74 | 12.58 | 83.32 | 54.40 | 67.58 | 87.53 | 77.47 | 55.29 | 81.50 | 85.16 | 22.05 | 13.92 | 16.60 | 49.53 | 20.91 | 40.70 |
Sarashina2-70B | 95.35 | 60.44 | 94.50 | 76.90 | 88.49 | 18.24 | 80.81 | 54.00 | 62.63 | 83.23 | 63.10 | 48.68 | 24.49 | 79.95 | 13.52 | 5.29 | 5.54 | 29.73 | 30.32 | 24.13 |
PLaMo 100B | 92.05 | 68.82 | 97.49 | 78.01 | 89.43 | 20.38 | 81.02 | 44.40 | 49.91 | 80.98 | 55.17 | 44.91 | 56.10 | 71.35 | 6.67 | 0.00 | 4.00 | 23.99 | 23.39 | 21.31 |
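The model card does not state how the headline scores are aggregated. The sketch below checks one plausible reading, assuming the Japanese Score is the unweighted mean of the first eight tasks in the detailed table, the English Score is the unweighted mean of the remaining twelve, and the Average is the mean of those two. The task grouping and the aggregation rule are assumptions; the per-task numbers are copied from the RakutenAI-2.0-8x7B row.

```python
from statistics import fmean

# Per-task scores for RakutenAI-2.0-8x7B, copied from the detailed table.
scores = {
    "jcommonsense_qa": 93.12, "jnli": 87.43, "marc_ja": 97.72, "jsquad": 74.49,
    "jaqket_v2": 86.00, "xlsum_ja": 15.70, "xwinograd": 78.62, "mgsm": 45.20,
    "arc_challenge": 66.38, "hellaswag": 85.84, "mmlu": 65.50,
    "truthfulqa_mc2": 48.19, "gsm8k": 51.40, "winogrande": 80.51,
    "musr": 13.88, "math_hard": 3.30, "gpqa": 5.71, "bbh": 27.02,
    "ifeval": 22.90, "mmlu_pro": 25.22,
}

# Assumed grouping: these eight tasks form the Japanese set, the rest are English.
JAPANESE_TASKS = {
    "jcommonsense_qa", "jnli", "marc_ja", "jsquad",
    "jaqket_v2", "xlsum_ja", "xwinograd", "mgsm",
}

japanese_score = fmean(v for t, v in scores.items() if t in JAPANESE_TASKS)
english_score = fmean(v for t, v in scores.items() if t not in JAPANESE_TASKS)
average_score = (japanese_score + english_score) / 2

print(f"Japanese: {japanese_score:.3f}")
print(f"English:  {english_score:.3f}")
print(f"Average:  {average_score:.3f}")
```

Under these assumptions the computed values agree with the summary row for RakutenAI-2.0-8x7B (72.29, 41.32, 56.80) to within rounding of the published per-task scores.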
Usage
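from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "Rakuten/RakutenAI-2.0-8x7B"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# torch_dtype="auto" keeps the checkpoint's dtype; device_map="auto" spreads weights over available devices.
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto", device_map="auto")
model.eval()

# One Japanese and one English prompt for the model to continue.
requests = [
    "南硫黄島原生自然環境保全地域は、自然",
    "The capybara is a giant cavy rodent",
]

for req in requests:
    # Tokenize the prompt and move the tensors to the model's device.
    input_text = tokenizer(req, return_tensors="pt").to(device=model.device)
    tokens = model.generate(
        **input_text,
        max_new_tokens=512,
        do_sample=True,  # sampling is enabled, so outputs vary between runs
        pad_token_id=tokenizer.eos_token_id,
    )
    # The decoded text contains the prompt followed by the generated continuation.
    out = tokenizer.decode(tokens[0], skip_special_tokens=True)
    print("INPUT:\n" + req)
    print("OUTPUT:\n" + out)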
Note on Evaluation Scores:
- Evaluations were run with LM Evaluation Harness between October and December 2024. We use the default task definitions from the following commit: https://github.com/EleutherAI/lm-evaluation-harness/commit/26f607f5432e1d09c55b25488c43523e7ecde657
- The tasks considered for Japanese evaluations are listed here: https://github.com/EleutherAI/lm-evaluation-harness/blob/26f607f5432e1d09c55b25488c43523e7ecde657/lm_eval/tasks/japanese_leaderboard/README.md
- The tasks considered for English evaluations are listed here: https://huggingface.co/docs/leaderboards/en/open_llm_leaderboard/archive and https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/leaderboard/README.md
Model Details
- Developed by: Rakuten Group, Inc.
- Language(s): Japanese, English
- License: This model is licensed under the Apache License, Version 2.0.
- Model Architecture: Mixture of Experts (2 active experts)
Limitations and Bias
The suite of RakutenAI-2.0 models is capable of generating human-like text on a wide range of topics. However, like all LLMs, they have limitations and can produce biased, inaccurate, or unsafe outputs. Please exercise caution and judgement while interacting with them.
Citation
To cite our work on the suite of RakutenAI-2.0 models, please use:
@misc{rakutengroup2025rakutenai2.0,
author = {Rakuten Group, Inc.},
title = {RakutenAI-2.0},
year = {2025},
publisher = {Hugging Face},
url = {https://huggingface.co/Rakuten},
}
Model tree for Rakuten/RakutenAI-2.0-8x7B
Base model
mistralai/Mistral-7B-v0.1