---
language: en
datasets:
- squad_v2
model-index:
- name: kiddothe2b/ModernBERT-base-squad2
  results:
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: squad_v2
      type: squad_v2
      config: squad_v2
      split: validation
    metrics:
    - type: exact_match
      value: 81.2936
      name: Exact Match
    - type: f1
      value: 84.4849
      name: F1
base_model:
- answerdotai/ModernBERT-base
pipeline_tag: question-answering
library_name: transformers
---

# ModernBERT-base for Extractive QA 

This is a single-model solution for SQuAD-like extractive QA based on ModernBERT (Warner et al., 2024). ModernBERT is an up-to-date drop-in replacement for BERT-like language models. It is an encoder-only, pre-norm Transformer with GeGLU activations, pre-trained with Masked Language Modeling (MLM) on sequences of up to 1,024 tokens over a corpus of 2 trillion tokens of English text and code. ModernBERT adopts many recent best practices, e.g., an increased masking rate, pre-normalization, and no bias terms, and it also appears to deliver the best performance on NLU tasks among base-sized encoder-only models such as BERT, RoBERTa, and DeBERTa. The available implementation of ModernBERT also uses Flash Attention, which makes it substantially faster than the older implementations of those models; for example, ModernBERT-base seems to run 3-4x faster than DeBERTa-V3-base.
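
## How to Use

A minimal usage sketch with the 🤗 Transformers `pipeline` API; the question and context below are only illustrative, and the model/tokenizer identifiers are those of this repository.

```python
# Minimal sketch (illustrative): extractive QA with the Transformers pipeline API.
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="kiddothe2b/ModernBERT-base-squad2",
    tokenizer="kiddothe2b/ModernBERT-base-squad2",
)

# Example question/context; replace with your own inputs.
result = qa(
    question="What does ModernBERT replace?",
    context=(
        "ModernBERT is an up-to-date drop-in replacement for BERT-like "
        "language models, pre-trained with masked language modeling on "
        "2 trillion tokens of English text and code."
    ),
)
print(result)  # dict with keys: 'score', 'start', 'end', 'answer'
```

Since the model was fine-tuned on SQuAD 2.0, it also handles unanswerable questions; a low score typically indicates that no answer span was found in the context.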