---
license: apache-2.0
language:
- en
pipeline_tag: fill-mask
---
# Model Card for **Astro-HEP-BERT**
**Astro-HEP-BERT** is a bidirectional transformer designed primarily to generate contextualized word embeddings for analyzing epistemic change in astrophysics and high-energy physics, developed within the NEPI project at TU Berlin. Built on Google's "bert-base-uncased," the model was further trained for three epochs on approximately 21.5 million paragraphs extracted from roughly 600,000 scholarly articles sourced from arXiv, all pertaining to astrophysics and/or high-energy physics (HEP). The sole training objective was masked language modeling.
For further details on the model and the training corpus, please refer to the Astro-HEP-BERT paper [link coming soon].
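Because the card declares the `fill-mask` pipeline tag, the model can be loaded with the standard `transformers` pipeline. A minimal sketch, assuming the Hugging Face repo id `arnosimons/astro-hep-bert` (substitute the actual path of this repository):

```python
from transformers import pipeline

# Assumed repo id -- replace with the actual Hugging Face path of this model
fill_mask = pipeline("fill-mask", model="arnosimons/astro-hep-bert")

# BERT-style [MASK] token; predictions reflect the astro/HEP training corpus
for pred in fill_mask("The accretion disk surrounds the central [MASK] hole."):
    print(f"{pred['token_str']:>12}  {pred['score']:.3f}")
```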
## Model Details
- **Developer:** Arno Simons
- **Funded by:** European Research Council (ERC) under Grant agreement ID: 101044932
- **Language (NLP):** English
- **License:** apache-2.0
- **Parent model:** Google's "bert-base-uncased"
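Since the model's primary purpose is producing contextualized word embeddings, below is a minimal sketch of how one might extract such an embedding with `transformers`. The repo id and the pooling strategy (averaging subword vectors from the last hidden layer) are illustrative assumptions, not necessarily the choices made in the NEPI project:

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "arnosimons/astro-hep-bert"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()

sentence = "Dark matter halos shape the large-scale structure of the universe."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    hidden = model(**inputs).last_hidden_state[0]  # (seq_len, 768) for bert-base

# Average the subword vectors of a target word into one contextual embedding
target_ids = tokenizer("halos", add_special_tokens=False)["input_ids"]
seq = inputs["input_ids"][0].tolist()
for i in range(len(seq) - len(target_ids) + 1):
    if seq[i:i + len(target_ids)] == target_ids:  # locate the target's subwords
        embedding = hidden[i:i + len(target_ids)].mean(dim=0)
        print(embedding.shape)  # torch.Size([768])
        break
```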