---
license: apache-2.0
language:
- en
pipeline_tag: fill-mask
---
# Model Card for **Astro-HEP-BERT**
**Astro-HEP-BERT** is a bidirectional transformer designed primarily to generate contextualized word embeddings for analyzing epistemic change in astrophysics and high-energy physics, developed within the NEPI project at TU Berlin. Built on Google's "bert-base-uncased," the model was further trained for three epochs on approximately 21.5 million paragraphs extracted from roughly 600,000 scholarly articles sourced from arXiv, all pertaining to astrophysics and/or high-energy physics (HEP). The sole training objective was masked language modeling.
For further details on the model and the training corpus, please refer to the Astro-HEP-BERT paper [link coming soon].
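Because the card declares the `fill-mask` pipeline tag, the model can be loaded with the standard `transformers` pipeline. A minimal sketch, assuming the Hugging Face repo id `arnosimons/astro-hep-bert` (substitute the actual path of this repository):

```python
from transformers import pipeline

# Assumed repo id -- replace with the actual Hugging Face path of this model
fill_mask = pipeline("fill-mask", model="arnosimons/astro-hep-bert")

# BERT-style [MASK] token; predictions reflect the astro/HEP training corpus
for pred in fill_mask("The accretion disk surrounds the central [MASK] hole."):
    print(f"{pred['token_str']:>12}  {pred['score']:.3f}")
```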
## Model Details
- **Developer:** Arno Simons
- **Funded by:** European Research Council (ERC) under Grant agreement ID: 101044932
- **Language (NLP):** English
- **License:** apache-2.0
- **Parent model:** Google's "bert-base-uncased"
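Since the model's primary purpose is producing contextualized word embeddings, below is a minimal sketch of how one might extract such an embedding with `transformers`. The repo id and the pooling strategy (averaging subword vectors from the last hidden layer) are illustrative assumptions, not necessarily the choices made in the NEPI project:

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "arnosimons/astro-hep-bert"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()

sentence = "Dark matter halos shape the large-scale structure of the universe."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    hidden = model(**inputs).last_hidden_state[0]  # (seq_len, 768) for bert-base

# Average the subword vectors of a target word into one contextual embedding
target_ids = tokenizer("halos", add_special_tokens=False)["input_ids"]
seq = inputs["input_ids"][0].tolist()
for i in range(len(seq) - len(target_ids) + 1):
    if seq[i:i + len(target_ids)] == target_ids:  # locate the target's subwords
        embedding = hidden[i:i + len(target_ids)].mean(dim=0)
        print(embedding.shape)  # torch.Size([768])
        break
```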