---
license: other
license_name: seallm
license_link: https://huggingface.co/SeaLLMs/SeaLLM-13B-Chat/blob/main/LICENSE
language:
- en
- zh
- hi
- es
- fr
- ar
- bn
- ru
- pt
- id
- ur
- de
- ja
- sw
- ta
- tr
- ko
- vi
- jv
- it
- ha
- th
- fa
- tl
- my
tags:
- multilingual
- babel
---


# *Babel*: Open Multilingual Large Language Models Serving Over 90% of Global Speakers

<p align="center">
<a href="https://babel-llm.github.io/babel-llm/" target="_blank" rel="noopener">Website</a>
&nbsp;&nbsp;
<a href="https://huggingface.co/Tower-Babel/Babel-9B/" target="_blank" rel="noopener">Model</a>
&nbsp;&nbsp;
<a href="https://github.com/babel-llm/babel-llm" target="_blank" rel="noopener">Github</a>
&nbsp;&nbsp;
<a href="https://arxiv.org/pdf/2503.00865" target="_blank" rel="noopener">Technical Report</a>
</p>

## Introduction

We introduce **Babel**, a multilingual LLM that covers the top 25 languages by number of speakers, including English, Chinese, Hindi, Spanish, Arabic, French, Bengali, Portuguese, Russian, Urdu, Indonesian, German, Japanese, Swahili, Filipino, Tamil, Vietnamese, Turkish, Italian, Javanese, Korean, Hausa, Persian, Thai, and Burmese. Together, these 25 languages are spoken by over 90% of the global population and include many languages neglected by other open multilingual LLMs. Unlike traditional continued-pretraining approaches, Babel expands its parameter count through a layer extension technique that raises its performance ceiling.
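To make the idea concrete, here is a minimal sketch of one common way such a layer extension can be realized with Hugging Face `transformers`: selected decoder layers of an existing checkpoint are duplicated to deepen the network before continued pretraining. The donor checkpoint and layer indices below are hypothetical illustrations, not the exact recipe from the technical report.

```python
# Illustrative sketch only: deepen a decoder-only model by duplicating some of its layers.
# The donor checkpoint and the layer indices are hypothetical choices for illustration.
import copy

import torch.nn as nn
from transformers import AutoModelForCausalLM

donor = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B")  # hypothetical donor model

layers = donor.model.layers          # nn.ModuleList of decoder layers (Llama/Qwen-style models)
duplicate_after = {20, 21, 22, 23}   # hypothetical: indices of layers to copy

extended = []
for i, layer in enumerate(layers):
    extended.append(layer)
    if i in duplicate_after:
        extended.append(copy.deepcopy(layer))  # inserted copy, trained further during continued pretraining

donor.model.layers = nn.ModuleList(extended)
donor.config.num_hidden_layers = len(extended)
# Per-layer bookkeeping (e.g. `self_attn.layer_idx`, used for KV caching) would also
# need updating before running generation with the extended model.
print(f"extended from {len(layers)} to {len(extended)} layers")
```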

We introduce two variants:  
- **Babel-9B**, designed for efficient inference and fine-tuning  
- **Babel-83B**, which sets a new standard for open multilingual LLMs  

Extensive evaluations on multilingual tasks show that Babel outperforms open LLMs of comparable size. In addition, when tuned on existing supervised fine-tuning datasets, Babel achieves remarkable chat performance, with **Babel-9B-Chat** leading among 10B-sized LLMs and **Babel-83B-Chat** setting a new standard for open LLMs, performing comparably to GPT-4o on certain tasks.

This page introduces the **Babel-9B-Base** model.
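Below is a minimal usage sketch for loading the base model with Hugging Face `transformers`; it is not taken from the report. The repo id follows the Model link above, and the dtype/device settings are assumptions to be adapted to your hardware.

```python
# Minimal usage sketch; adjust dtype/device settings to your hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Tower-Babel/Babel-9B"  # repo id taken from the Model link above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires the `accelerate` package
)

# A base model is a plain next-token predictor, so prompt it with text to continue.
prompt = "Translate into French: Good morning, how are you?\nTranslation:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```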


## Evaluation

We employ multilingual tasks across several categories (a minimal scoring sketch follows the list):

1. **World Knowledge:**  
   MMMLU ([OpenAI 2024](https://huggingface.co/datasets/openai/MMMLU)), a human-translated version of MMLU ([Hendrycks et al. 2021](https://arxiv.org/abs/2009.03300)) covering 14 languages; for languages not covered, we generate translations with the [Google Translate API](https://translate.google.com/). Additionally, we include M3Exam ([Zhang et al. 2023](https://arxiv.org/abs/2306.05179)), which consists of authentic human exam questions collected from various countries, covering multiple subjects and educational levels.

2. **Reasoning:**  
   MGSM ([Shi et al. 2022](https://arxiv.org/abs/2210.03057)) and XCOPA ([Ponti et al. 2020](https://aclanthology.org/2020.emnlp-main.185/)).

3. **Understanding:**  
   XNLI ([Conneau et al. 2018](https://arxiv.org/abs/1809.05053)).

4. **Translation:**  
   Flores-200 ([NLLB Team 2022](https://arxiv.org/abs/2207.04672)).
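For the reasoning benchmarks, accuracy is typically computed as exact match on the final numeric answer extracted from the model's completion. The snippet below is a hedged sketch of such a scorer under that assumption; it is not the report's evaluation harness, and the sample completions are made up for illustration.

```python
# Hedged sketch of MGSM-style exact-match scoring: take the last number in each
# completion and compare it with the reference answer. Toy data, not benchmark items.
import re

def last_number(text: str):
    """Return the last integer/decimal found in `text`, ignoring thousands separators."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return numbers[-1] if numbers else None

def exact_match(completion: str, reference: str) -> bool:
    predicted = last_number(completion)
    return predicted is not None and predicted == last_number(reference)

# Made-up completions and gold answers for illustration only.
completions = [
    "She has 18 eggs left, so she earns 18 * 2 = 36 dollars.",
    "The answer is 7.",
]
references = ["36", "11"]

accuracy = sum(exact_match(c, r) for c, r in zip(completions, references)) / len(references)
print(f"exact-match accuracy: {accuracy:.2f}")  # 0.50
```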


### Performance of 10B-Size Base Models vs. Babel-9B

| Dataset     | Gemma2-9B | Mistral-12B | Llama3.1-8B | Qwen2.5-7B | GLM4-9B | **Babel-9B** |
|------------|-----------|-------------|-------------|------------|---------|-------------|
| MMMLU      | **59.8**  | 52.8        | 49.4        | 56.7       | 55.6    | 59.4        |
| M3Exam     | **61.6**  | 54.2        | 52.5        | 58.8       | 56.6    | 61.3        |
| XCOPA      | 84.6      | 81.3        | 75.9        | 81.1       | 87.3    | **89.2**    |
| MGSM       | 34.3      | 26.0        | 18.0        | 41.1       | 39.0    | **43.4**    |
| XNLI       | 61.7      | 55.0        | 48.9        | 70.3       | 69.9    | **71.9**    |
| Flores-200 | 53.2      | 50.8        | 50.9        | 45.5       | 46.6    | **55.1**    |
| *Average*  | 59.5      | 53.4        | 49.3        | 58.9       | 59.2    | **63.4**    |


## Acknowledgement
We would like to thank Guanzheng Chen for assisting with the implementation of the training codebase. Our special thanks go to our professional and native linguists—Tantong Champaiboon, Nguyen Ngoc Yen Nhi, and Tara Devina Putri—who contributed to building, evaluating, and fact-checking our sampled pretraining dataset. We also appreciate Fan Wang, Jiasheng Tang, Xin Li, and Hao Zhang for their efforts in coordinating computing resources.

## Citation

If you find our project useful, we hope you would kindly star our repo and cite our work as follows: 
```
@misc{zhao2025babelopenmultilinguallarge,
      title={Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers}, 
      author={Yiran Zhao and Chaoqun Liu and Yue Deng and Jiahao Ying and Mahani Aljunied and Zhaodonghui Li and Lidong Bing and Hou Pong Chan and Yu Rong and Deli Zhao and Wenxuan Zhang},
      year={2025},
      eprint={2503.00865},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2503.00865}, 
}
```
Corresponding Author: [email protected]