---
license: mit
datasets:
- HuggingFaceFW/fineweb-edu
---

# SparseModernBERT α=1.5 Model Card

## Model Overview

SparseModernBERT-alpha1.5 is a masked language model based on [ModernBERT](https://github.com/AnswerDotAI/ModernBERT) that replaces the standard softmax attention with an adaptive sparse attention mechanism (AdaSplash) implemented in Triton.

The sparsity parameter α = 1.5 yields moderately sparse attention patterns, improving efficiency while maintaining performance.
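
To build intuition for what α = 1.5 does, the sketch below contrasts softmax with α-entmax on toy attention scores using the reference `entmax` package. This is only an illustration of the sparsity pattern the α parameter induces; it is not the AdaSplash Triton kernel that this model actually uses.

```python
import torch
from entmax import entmax15  # reference α=1.5 entmax implementation (pip install entmax)

# Toy attention scores for one query over eight keys.
scores = torch.tensor([[2.0, 1.5, 0.3, -0.5, -1.0, -1.2, -2.0, -3.0]])

dense = torch.softmax(scores, dim=-1)  # softmax: every key receives nonzero weight
sparse = entmax15(scores, dim=-1)      # α = 1.5: low-scoring keys are assigned exactly zero

print(dense)
print(sparse)
```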

**Key features:**

* **Sparsity (α)**: 1.5
* **Tokenization**: same as ModernBERT
* **Pretraining**: masked language modeling on a large web corpus (FineWeb-Edu)

## Usage

Use the codebase from https://github.com/deep-spin/SparseModernBERT, which provides the `CustomModernBertModel` class used below:

```python
from transformers import AutoTokenizer
from sparse_modern_bert import CustomModernBertModel  # provided by the SparseModernBERT codebase

model_id = "sardinelab/SparseModernBERT-alpha1.5"

# The tokenizer is the same as ModernBERT's; the model uses the custom sparse
# attention implementation, so trust_remote_code must be enabled.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = CustomModernBertModel.from_pretrained(model_id, trust_remote_code=True)
```
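
Once loaded, the model can be used like a standard `transformers` encoder. The snippet below is a minimal sanity check assuming `CustomModernBertModel` returns the usual `last_hidden_state`; adapt it to your downstream task as needed.

```python
import torch

# Encode a sentence and inspect the resulting token embeddings.
inputs = tokenizer("Sparse attention keeps only the most relevant tokens.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
```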

## Citation

If you use this model in your work, please cite:

```bibtex
@article{goncalves2025adasplash,
  title={AdaSplash: Adaptive Sparse Flash Attention},
  author={Gon\c{c}alves, Nuno and Treviso, Marcos and Martins, Andr\'e F. T.},
  journal={arXiv preprint arXiv:2502.12082},
  year={2025}
}
```
|