---
library_name: transformers
license: apache-2.0
language:
- en
- fr
- de
- es
- zh
- it
- ru
- pl
- pt
- ja
- vi
- nl
- ar
- tr
- hi
pipeline_tag: fill-mask
tags:
- code
---
# EuroBERT-210m
## Table of Contents
1. [Overview](#overview)
2. [Usage](#usage)
3. [Evaluation](#evaluation)
4. [License](#license)
5. [Citation](#citation)
## Overview
EuroBERT is a family of powerful multilingual encoder foundation models designed for a variety of tasks, such as classification, retrieval, and evaluation metrics. The models cover 15 languages, as well as mathematics and code, and support sequences of up to 8,192 tokens.
EuroBERT incorporates recent architectural advances from decoder models, such as rotary position embeddings (RoPE) and Flash Attention, and is trained on a 5T-token multilingual dataset covering 8 European languages and 7 other widely spoken global languages, along with mathematics and code. Across domains and tasks, the EuroBERT models exhibit the strongest multilingual performance among similarly sized systems.
It is available in the following sizes:
- [EuroBERT-210m](https://huggingface.co/EuroBERT/EuroBERT-210m) - 12 layers, 210 million parameters
- [EuroBERT-610m](https://huggingface.co/EuroBERT/EuroBERT-610m) - 26 layers, 610 million parameters
- [EuroBERT-2.1B](https://huggingface.co/EuroBERT/EuroBERT-2.1B) - 32 layers, 2.1 billion parameters
For more information about EuroBERT, please refer to the [release blog post](https://huggingface.co/blog/EuroBERT/release) for a high-level overview and our [arXiv pre-print](https://arxiv.org/abs/2503.05500) for in-depth information.
## Usage
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "EuroBERT/EuroBERT-210m"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id, trust_remote_code=True)

text = "The capital of France is <|mask|>."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)

# To get the prediction for the masked position:
masked_index = inputs["input_ids"][0].tolist().index(tokenizer.mask_token_id)
predicted_token_id = outputs.logits[0, masked_index].argmax(dim=-1)
predicted_token = tokenizer.decode(predicted_token_id)
print("Predicted token:", predicted_token)
# Predicted token: Paris
```
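The same task can also be run through the `fill-mask` pipeline. A minimal sketch (the mask token `<|mask|>` is the one used in the example above):
```python
from transformers import pipeline

# Fill-mask pipeline; trust_remote_code is required, as in the example above.
fill_mask = pipeline("fill-mask", model="EuroBERT/EuroBERT-210m", trust_remote_code=True)

# Print the top predictions for the masked position.
for prediction in fill_mask("The capital of France is <|mask|>."):
    print(prediction["token_str"], round(prediction["score"], 3))
```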
**💻 You can use these models directly with the transformers library starting from v4.48.0:**
```sh
pip install -U "transformers>=4.48.0"
```
**🏎️ If your GPU supports it, we recommend using EuroBERT with Flash Attention 2 to achieve the highest efficiency. To do so, install Flash Attention 2 as follows, then use the model as normal:**
```bash
pip install flash-attn
```
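Depending on your transformers version, the best available attention backend may already be selected automatically. If you want to request Flash Attention 2 explicitly, you can pass the standard `attn_implementation` argument when loading the model; a minimal sketch (these are generic transformers arguments, not EuroBERT-specific ones):
```python
import torch
from transformers import AutoModelForMaskedLM

# Flash Attention 2 requires a CUDA device and half precision (fp16 or bf16).
model = AutoModelForMaskedLM.from_pretrained(
    "EuroBERT/EuroBERT-210m",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
).to("cuda")
```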
## Evaluation
We evaluate EuroBERT on a suite of tasks covering various real-world use cases for multilingual encoders, including retrieval, classification, sequence regression, quality estimation, summary evaluation, and code- and mathematics-related tasks.
**Key highlights:**
The EuroBERT family exhibits strong multilingual performance across domains and tasks.
- EuroBERT-2.1B, our largest model, achieves the highest performance among all evaluated systems, outperforming the largest alternative, XLM-RoBERTa-XL.
- EuroBERT-610m is competitive with XLM-RoBERTa-XL, a model 5 times its size, on most multilingual tasks and surpasses it in code and mathematics tasks.
- The smaller EuroBERT-210m generally outperforms all similarly sized systems.
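For the retrieval-style use cases mentioned above, the encoder can be turned into a sentence-embedding model. The sketch below is one common recipe rather than our evaluation setup: it assumes mean pooling of the last hidden state over non-padded tokens.
```python
import torch
from transformers import AutoTokenizer, AutoModel

model_id = "EuroBERT/EuroBERT-210m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

sentences = ["The capital of France is Paris.", "Paris est la capitale de la France."]
inputs = tokenizer(sentences, padding=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # (batch, seq_len, hidden_size)

# Mean pooling over non-padded tokens (an assumption, not the official recipe).
mask = inputs["attention_mask"].unsqueeze(-1)
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)

# Cosine similarity between the two sentences.
similarity = torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1], dim=0)
print(similarity.item())
```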
## License
We release the EuroBERT model architectures, model weights, and training codebase under the Apache 2.0 license.
## Citation
If you use EuroBERT in your work, please cite:
```
SOON
```