---
license: mit
datasets:
- HuggingFaceFW/fineweb-edu
---

# SparseModernBERT α=1.5 Model Card

## Model Overview

SparseModernBERT-alpha1.5 is a masked language model based on [ModernBERT](https://github.com/AnswerDotAI/ModernBERT) that replaces the standard softmax attention with an adaptive sparse attention mechanism (AdaSplash) implemented in Triton. The sparsity parameter α = 1.5 yields moderately sparse attention patterns, improving efficiency while maintaining performance.

**Key features:**

* **Sparsity (α)**: 1.5
* **Tokenization**: same as ModernBERT
* **Pretraining**: masked language modeling on a large web corpus

## Usage

Use the codebase from: https://github.com/deep-spin/SparseModernBERT

```python
from transformers import AutoTokenizer
from sparse_modern_bert import CustomModernBertModel

model_id = "sardinelab/SparseModernBERT-alpha1.5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = CustomModernBertModel.from_pretrained(model_id, trust_remote_code=True)
```

## Citation

If you use this model in your work, please cite:

```bibtex
@article{goncalves2025adasplash,
  title={AdaSplash: Adaptive Sparse Flash Attention},
  author={Gon\c{c}alves, Nuno and Treviso, Marcos and Martins, Andr\'e F. T.},
  journal={arXiv preprint arXiv:2502.12082},
  year={2025}
}
```
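
## Example: Extracting Sentence Embeddings

Building on the loading snippet in the Usage section, the sketch below runs a forward pass and mean-pools the token embeddings into a single sentence vector. It assumes `CustomModernBertModel` follows the standard `transformers` encoder interface (a PyTorch module whose output exposes `last_hidden_state`); the example sentence and the pooling step are purely illustrative, so adapt them to your task.

```python
import torch
from transformers import AutoTokenizer
from sparse_modern_bert import CustomModernBertModel

model_id = "sardinelab/SparseModernBERT-alpha1.5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = CustomModernBertModel.from_pretrained(model_id, trust_remote_code=True)
model.eval()

# Tokenize an example sentence and run a forward pass.
# Assumes the standard transformers encoder interface with a
# `last_hidden_state` output; adjust if the custom class differs.
inputs = tokenizer("Sparse attention keeps only the relevant tokens.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool token embeddings into one sentence vector, ignoring padding.
mask = inputs["attention_mask"].unsqueeze(-1).float()
sentence_embedding = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_embedding.shape)  # (1, hidden_size)
```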