---
license: mit
datasets:
- nyu-mll/multi_nli
- stanfordnlp/snli
language:
- en
metrics:
- accuracy
base_model:
- answerdotai/ModernBERT-base
- tasksource/ModernBERT-base-nli
pipeline_tag: text-classification
library_name: sentence-transformers
tags:
- cross-encoder
- modernbert
- mnli
- snli
---
|
|
# ModernBERT Cross-Encoder: Natural Language Inference (NLI) |
|
|
|
|
|
This cross-encoder performs sequence classification over contradiction/neutral/entailment labels, and is
drop-in compatible with comparable `sentence-transformers` cross-encoders.
|
|
|
|
|
I trained this model by initializing the ModernBERT-base weights from the brilliant `tasksource/ModernBERT-base-nli`
zero-shot classification model. I then trained it with a batch size of 64 on the `sentence-transformers` AllNLI
dataset.
|
|
|
|
|
--- |
|
|
|
|
|
## Features |
|
|
- **High performing:** Achieves 90.34% accuracy on the MNLI mismatched test set and 90.25% on the SNLI test set.
- **Efficient architecture:** Based on the ModernBERT-base design (149M parameters), offering fast inference speeds.
- **Extended context length:** Processes sequences up to 8192 tokens, great for evaluating LLM outputs (see the sketch below).
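
To make use of the long context window, you can set the maximum sequence length when loading the model. A minimal sketch, assuming the `max_length` constructor argument of `sentence_transformers.CrossEncoder`:

```python
from sentence_transformers import CrossEncoder

# Allow premise/hypothesis pairs up to the architectural maximum of 8192 tokens
model = CrossEncoder("dleemiller/ModernCE-base-nli", max_length=8192)
```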
|
|
|
|
|
--- |
|
|
|
|
|
# NLI Evaluation Results |
|
|
|
|
|
F1-micro scores (equivalent to accuracy for single-label classification) for each dataset. Bold marks the best score in each column; underline marks the second best.
|
|
|
|
|
| Model | finecat | mnli | mnli_mismatched | snli | anli_r1 | anli_r2 | anli_r3 | wanli | lingnli |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `dleemiller/finecat-nli-l` | **0.8152** | **0.9088** | <u>0.9217</u> | <u>0.9259</u> | **0.7400** | **0.5230** | **0.5150** | **0.7424** | **0.8689** |
| `tasksource/ModernBERT-large-nli` | 0.7959 | 0.8983 | **0.9229** | 0.9188 | <u>0.7260</u> | <u>0.5110</u> | <u>0.4925</u> | <u>0.6978</u> | 0.8504 |
| `dleemiller/ModernCE-large-nli` | 0.7811 | **0.9088** | 0.9205 | **0.9273** | 0.6630 | 0.4860 | 0.4408 | 0.6576 | <u>0.8566</u> |
| `tasksource/ModernBERT-base-nli` | 0.7595 | 0.8685 | 0.8979 | 0.8915 | 0.6300 | 0.4820 | 0.4192 | 0.6632 | 0.8118 |
| `dleemiller/ModernCE-base-nli` | 0.7533 | 0.8923 | 0.9035 | 0.9187 | 0.5240 | 0.3950 | 0.3333 | 0.6464 | 0.8282 |
| `dleemiller/EttinX-nli-s` | 0.7251 | 0.8765 | 0.8798 | 0.9128 | 0.3360 | 0.2790 | 0.3083 | 0.6234 | 0.8012 |
| `dleemiller/EttinX-nli-xs` | 0.7013 | 0.8376 | 0.8380 | 0.8979 | 0.2780 | 0.2840 | 0.2800 | 0.5838 | 0.7521 |
| `dleemiller/EttinX-nli-xxs` | 0.6842 | 0.7988 | 0.8047 | 0.8851 | 0.2590 | 0.3060 | 0.2992 | 0.5426 | 0.7018 |
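
For reference, micro-averaged F1 coincides with accuracy in single-label multiclass classification: every misclassified example contributes exactly one false positive (to the predicted class) and one false negative (to the gold class), so summing over classes gives

$$
\mathrm{P}_{\text{micro}} = \mathrm{R}_{\text{micro}} = \frac{\sum_{c} \mathrm{TP}_c}{N} = \text{accuracy}
\quad\Longrightarrow\quad
\mathrm{F1}_{\text{micro}} = \text{accuracy}.
$$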
|
|
|
|
|
|
|
|
--- |
|
|
|
|
|
## Usage |
|
|
|
|
|
To use ModernCE for NLI tasks, you can load the model with the Hugging Face `sentence-transformers` library: |
|
|
|
|
|
```python
from sentence_transformers import CrossEncoder

# Load the ModernCE cross-encoder
model = CrossEncoder("dleemiller/ModernCE-base-nli")

# Each input is a (premise, hypothesis) pair; the output is one logit per class
scores = model.predict([
    ('A man is eating pizza', 'A man eats something'),
    ('A black race car starts up in front of a crowd of people.', 'A man is driving down a lonely road.')
])

# Convert scores to labels via argmax over the class axis
label_mapping = ['contradiction', 'entailment', 'neutral']
labels = [label_mapping[score_max] for score_max in scores.argmax(axis=1)]
# ['entailment', 'contradiction']
```
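
If you want class probabilities rather than hard labels, apply a softmax over the class axis of the raw logits returned by `predict`. A minimal sketch (the `softmax` helper is illustrative, not part of the library):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the class axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

probs = softmax(scores)  # shape (n_pairs, 3); columns follow label_mapping's order
```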
|
|
|
|
|
--- |
|
|
|
|
|
## Training Details |
|
|
|
|
|
### Pretraining |
|
|
We initialize the model from the `tasksource/ModernBERT-base-nli` weights.
|
|
|
|
|
Details:
- Batch size: 64
- Learning rate: 3e-4
- Attention dropout: 0.1
|
|
|
|
|
### Fine-Tuning |
|
|
Fine-tuning was performed on the SBERT AllNLI.tsv.gz dataset. |
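
For reference, a cross-encoder fine-tune on AllNLI can be sketched as below. This is a hypothetical reconstruction rather than the actual training script; the parsing follows the SBERT NLI example format, and anything beyond the hyperparameters listed above is an assumption:

```python
import csv
import gzip

from torch.utils.data import DataLoader
from sentence_transformers import CrossEncoder, InputExample

# Label ids match the model's class order: contradiction, entailment, neutral
label2id = {"contradiction": 0, "entailment": 1, "neutral": 2}

# Parse (premise, hypothesis, label) triples from AllNLI.tsv.gz
train_samples = []
with gzip.open("AllNLI.tsv.gz", "rt", encoding="utf8") as f:
    for row in csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
        if row["split"] == "train":
            train_samples.append(InputExample(
                texts=[row["sentence1"], row["sentence2"]],
                label=label2id[row["label"]],
            ))

train_dataloader = DataLoader(train_samples, shuffle=True, batch_size=64)

# Initialize from the tasksource NLI checkpoint, as described above
model = CrossEncoder("tasksource/ModernBERT-base-nli", num_labels=3)
model.fit(
    train_dataloader=train_dataloader,
    epochs=1,                       # assumed; not stated in the card
    optimizer_params={"lr": 3e-4},  # learning rate from the details above
    warmup_steps=1000,              # assumed
)
```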
|
|
|
|
|
### Validation Results |
|
|
The model achieved the following test set performance after fine-tuning (a reproduction sketch for SNLI follows the list):
- **MNLI (mismatched):** 0.9034
- **SNLI:** 0.9025
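
A minimal sketch for reproducing the SNLI number, assuming the `stanfordnlp/snli` test split and the label order from the usage snippet (examples with label `-1` are unlabeled and must be dropped):

```python
import numpy as np
from datasets import load_dataset
from sentence_transformers import CrossEncoder

model = CrossEncoder("dleemiller/ModernCE-base-nli")

snli = load_dataset("stanfordnlp/snli", split="test")
snli = snli.filter(lambda ex: ex["label"] != -1)  # drop unlabeled pairs

pairs = list(zip(snli["premise"], snli["hypothesis"]))
preds = model.predict(pairs).argmax(axis=1)

# Model classes are (contradiction, entailment, neutral); SNLI encodes
# entailment=0, neutral=1, contradiction=2, so remap before comparing.
model_to_snli = np.array([2, 0, 1])
accuracy = (model_to_snli[preds] == np.array(snli["label"])).mean()
print(f"SNLI test accuracy: {accuracy:.4f}")
```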
|
|
|
|
|
--- |
|
|
|
|
|
## Model Card |
|
|
|
|
|
- **Architecture:** ModernBERT-base |
|
|
- **Fine-Tuning Data:** `sentence-transformers` - AllNLI.tsv.gz |
|
|
|
|
|
--- |
|
|
|
|
|
## Thank You |
|
|
|
|
|
Thanks to the Answer.AI team for providing the ModernBERT models, and to the Sentence Transformers team for their leadership in transformer encoder models.
We also thank the tasksource team for their work on zero-shot encoder models.
|
|
|
|
|
--- |
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use this model in your research, please cite: |
|
|
|
|
|
```bibtex
@misc{moderncenli2025,
  author    = {Miller, D. Lee},
  title     = {ModernCE NLI: An NLI cross encoder model},
  year      = {2025},
  publisher = {Hugging Face Hub},
  url       = {https://huggingface.co/dleemiller/ModernCE-base-nli},
}
```
|
|
|
|
|
--- |
|
|
|
|
|
## License |
|
|
|
|
|
This model is licensed under the [MIT License](LICENSE). |