---
license: apache-2.0
base_model:
- Rakuten/RakutenAI-7B
---
# RakutenAI-2.0-8x7B
## Model Description
RakutenAI-2.0-8x7B is a Mixture of Experts (MoE) foundation model derived from [RakutenAI-7B](https://huggingface.co/Rakuten/RakutenAI-7B), which was first introduced in March 2024. Developed as part of a broader initiative to advance Japanese LLM technology, RakutenAI-2.0-8x7B uses two active experts, resulting in **13B active parameters**. Experts are selected dynamically based on the input tokens, which improves computational efficiency while maintaining high performance. Among the similar models compared below (Swallow-MX-8x7B-NVE-0.1, Llama-3-Swallow-70B-v0.1, Sarashina2-70B, and PLaMo 100B), RakutenAI-2.0-8x7B achieves the highest average score on Japanese language understanding benchmarks and competitive performance on English evaluation tasks.
*If you are looking for an instruction-tuned model, check [RakutenAI-2.0-8x7B-instruct](https://huggingface.co/Rakuten/RakutenAI-2.0-8x7B-instruct)*.
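The dynamic expert selection described above can be illustrated with a toy top-2 router. This is a minimal sketch under stated assumptions, not the model's actual implementation: the expert count of 8 is inferred from the "8x7B" name, the softmax-over-selected-logits weighting is a common MoE convention rather than something this card specifies, and `top_k_route`, `moe_forward`, and the toy experts are hypothetical names introduced only for illustration.

```python
import math

NUM_EXPERTS = 8  # assumption: inferred from the "8x7B" naming
TOP_K = 2        # "two active experts" per the model description


def top_k_route(router_logits, k=TOP_K):
    """Return [(expert_index, weight), ...] for the k highest-scoring experts.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    peak = max(router_logits[i] for i in chosen)  # subtract max for stability
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def moe_forward(token_vec, router_logits, experts):
    """Combine the outputs of only the selected experts for one token."""
    out = [0.0] * len(token_vec)
    for idx, weight in top_k_route(router_logits):
        expert_out = experts[idx](token_vec)  # unselected experts never run
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out


# Toy experts (hypothetical): expert i scales its input by (i + 1).
experts = [lambda v, s=i + 1: [s * x for x in v] for i in range(NUM_EXPERTS)]

# Hypothetical router scores for one token; experts 3 and 5 score highest.
logits = [0.1, -1.0, 0.3, 2.0, 0.0, 2.0, -0.5, 0.2]
routes = top_k_route(logits)
output = moe_forward([1.0, 2.0], logits, experts)
print(routes)
print(output)
```

Only two of the eight toy experts are evaluated for the token, which is the source of the efficiency gain: the per-token compute scales with the active experts rather than with the full parameter count.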
## Model Evaluation Results
| Foundation Model Name | Japanese Score | English Score | Average |
|-----------------------------------------------|---------------|--------------|---------|
| Rakuten/RakutenAI-7B | 62.93 | 34.86 | 48.90 |
| **Rakuten/RakutenAI-2.0-8x7B** | **72.29** | 41.32 | 56.80 |
| Tokyotech/Swallow-MX-8x7B-NVE-0.1 | 66.17 | 44.33 | 55.25 |
| Tokyotech/Llama-3-Swallow-70B-v0.1 | 68.15 | **51.52** | **59.84** |
| SBIntuitions/Sarashina2-70B | 71.09 | 39.22 | 55.16 |
| PreferredNetworks/PLaMo 100B | 71.45 | 36.48 | 53.96 |
Table 1: Average LM Evaluation Harness scores of the RakutenAI-2.0-8x7B foundation model in comparison with other Japanese open models.
Detailed scores are as follows:
| Metric | jcommonsense_qa | jnli | marc_ja | jsquad | jaqket_v2 | xlsum_ja | xwinograd | mgsm | arc_challenge | hellaswag | mmlu | truthfulqa_mc2 | gsm8k | winogrande | musr | math_hard | gpqa | bbh | ifeval | mmlu_pro |
|----------------------|-----------------|-------|---------|--------|-----------|----------|-----------|-------|---------------|-----------|-------|----------------|-------|------------|-------|-----------|-------|-------|--------|----------|
| **Model Name** | accuracy-3shot | accuracy-3shot | accuracy-3shot | exact_match-2shot | exact_match-1shot | rouge2-1shot | accuracy-0shot | accuracy-5shot | accuracy_norm-25shot | accuracy_norm-10shot | accuracy-5shot | accuracy-0shot | exact_match-5shot | accuracy-5shot | accuracy_norm-0shot | exact_match-4shot | accuracy_norm-0shot | accuracy_norm-3shot | avg_inst_prompt_strict_acc-0shot | accuracy-5shot |
| RakutenAI-7B | 85.88 | 56.61 | 96.52 | 69.56 | 81.44 | 15.69 | 74.14 | 23.60 | 60.75 | 82.26 | 59.83 | 38.33 | 32.60 | 77.43 | 4.93 | 2.16 | 5.02 | 20.34 | 14.04 | 20.57 |
| RakutenAI-2.0-8x7B | 93.12 | 87.43 | 97.72 | 74.49 | 86.00 | 15.70 | 78.62 | 45.20 | 66.38 | 85.84 | 65.50 | 48.19 | 51.40 | 80.51 | 13.88 | 3.30 | 5.71 | 27.02 | 22.90 | 25.22 |
| Swallow-MX-8x7B-NVE-0.1 | 89.28 | 43.06 | 97.15 | 76.29 | 87.37 | 17.09 | 82.69 | 40.40 | 65.87 | 85.13 | 69.48 | 50.38 | 58.45 | 82.87 | 8.78 | 7.50 | 13.33 | 29.41 | 28.38 | 32.32 |
| Llama-3-Swallow-70B-v0.1 | 92.58 | 66.15 | 93.46 | 70.94 | 71.74 | 12.58 | 83.32 | 54.40 | 67.58 | 87.53 | 77.47 | 55.29 | 81.50 | 85.16 | 22.05 | 13.92 | 16.60 | 49.53 | 20.91 | 40.70 |
| Sarashina2-70B | 95.35 | 60.44 | 94.50 | 76.90 | 88.49 | 18.24 | 80.81 | 54.00 | 62.63 | 83.23 | 63.10 | 48.68 | 24.49 | 79.95 | 13.52 | 5.29 | 5.54 | 29.73 | 30.32 | 24.13 |
| PLaMo 100B | 92.05 | 68.82 | 97.49 | 78.01 | 89.43 | 20.38 | 81.02 | 44.40 | 49.91 | 80.98 | 55.17 | 44.91 | 56.10 | 71.35 | 6.67 | 0.00 | 4.00 | 23.99 | 23.39 | 21.31 |
Table 2: Per-task LM Evaluation Harness scores of the RakutenAI-2.0-8x7B foundation model in comparison with other Japanese open models.
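For the two Rakuten rows, the summary scores in the first table appear to be unweighted means of the per-task scores in the second table: the first 8 columns (jcommonsense_qa through mgsm) for Japanese, the remaining 12 for English, and the mean of those two for the overall average. This is an observation from the published numbers, not a documented method, and it is checked here only for the Rakuten models. The sketch below recomputes the values; `summarize` and `JAPANESE_TASKS` are hypothetical names used for illustration.

```python
# Per-task scores copied from the detailed table, in column order.
JAPANESE_TASKS = 8  # assumption: the first 8 columns are the Japanese tasks

scores = {
    "RakutenAI-7B": [
        85.88, 56.61, 96.52, 69.56, 81.44, 15.69, 74.14, 23.60,
        60.75, 82.26, 59.83, 38.33, 32.6, 77.43,
        4.93, 2.16, 5.02, 20.34, 14.04, 20.57,
    ],
    "RakutenAI-2.0-8x7B": [
        93.12, 87.43, 97.72, 74.49, 86.00, 15.70, 78.62, 45.20,
        66.38, 85.84, 65.50, 48.19, 51.40, 80.51,
        13.88, 3.30, 5.71, 27.02, 22.90, 25.22,
    ],
}


def summarize(task_scores):
    """Return (japanese_mean, english_mean, overall_mean) for one model."""
    ja = task_scores[:JAPANESE_TASKS]
    en = task_scores[JAPANESE_TASKS:]
    ja_mean = sum(ja) / len(ja)
    en_mean = sum(en) / len(en)
    # Overall score is the mean of the two language scores, not of all 20 tasks.
    return ja_mean, en_mean, (ja_mean + en_mean) / 2


for name, vals in scores.items():
    ja, en, avg = summarize(vals)
    print(f"{name}: Japanese={ja:.2f} English={en:.2f} Average={avg:.2f}")
```

The recomputed values agree with the published summary scores to within rounding (differences below 0.01), which suggests the published averages were taken over unrounded per-task results.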
## Usage
```python
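from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "Rakuten/RakutenAI-2.0-8x7B"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# torch_dtype="auto" loads the checkpoint's dtype; device_map="auto" places
# the weights across the available devices.
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto", device_map="auto")
model.eval()

# Text prefixes for the foundation model to continue (one Japanese, one English).
requests = [
    "南硫黄島原生自然環境保全地域は、自然",
    "The capybara is a giant cavy rodent",
]

for req in requests:
    # Tokenize the prompt and move the tensors to the model's device.
    input_text = tokenizer(req, return_tensors="pt").to(device=model.device)
    tokens = model.generate(
        **input_text,
        max_new_tokens=512,
        do_sample=True,  # sampling is enabled, so outputs vary between runs
        pad_token_id=tokenizer.eos_token_id,
    )
    # The decoded output contains the prompt followed by the continuation.
    out = tokenizer.decode(tokens[0], skip_special_tokens=True)
    print("INPUT:\n" + req)
    print("OUTPUT:\n" + out)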
```
**Note on Evaluation Scores:**
- Evaluations were run with LM Evaluation Harness between October and December 2024, using the default task definitions from the following commit: https://github.com/EleutherAI/lm-evaluation-harness/commit/26f607f5432e1d09c55b25488c43523e7ecde657
- The tasks considered for Japanese evaluations are listed here: https://github.com/EleutherAI/lm-evaluation-harness/blob/26f607f5432e1d09c55b25488c43523e7ecde657/lm_eval/tasks/japanese_leaderboard/README.md
- The tasks considered for English evaluations are listed here: https://huggingface.co/docs/leaderboards/en/open_llm_leaderboard/archive and https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/leaderboard/README.md
## Model Details
* **Developed by**: [Rakuten Group, Inc.](https://ai.rakuten.com/)
* **Language(s)**: Japanese, English
* **License**: This model is licensed under [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).
* **Model Architecture**: Mixture of Experts (2 active experts)
### Limitations and Bias
The suite of RakutenAI-2.0 models is capable of generating human-like text on a wide range of topics. However, like all LLMs, they have limitations and can produce biased, inaccurate, or unsafe outputs. Please exercise caution and judgement while interacting with them.
## Citation
To cite our work on the suite of RakutenAI-2.0 models, please use:
```bibtex
@misc{rakutengroup2025rakutenai2.0,
  author = {Rakuten Group, Inc.},
  title = {RakutenAI-2.0},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Rakuten},
}
```