---
language: en
license: mit
library_name: transformers
tags:
- climate-change
- domain-adaptation
- masked-language-modeling
- scientific-nlp
- transformer
- BERT
- ClimateBERT
metrics:
- f1
model-index:
- name: SciClimateBERT
  results:
  - task:
      type: text-classification
      name: Climate NLP Tasks (ClimaBench)
    dataset:
      name: ClimaBench
      type: benchmark
    metrics:
    - type: f1
      name: Macro F1 (avg)
      value: 57.829
---
# SciClimateBERT 🌎🔬
**SciClimateBERT** is a domain-adapted version of [**ClimateBERT**](https://huggingface.co/climatebert/distilroberta-base-climate-f), further pretrained on peer-reviewed scientific papers about climate change. While ClimateBERT is tuned for general climate-related text, SciClimateBERT narrows the focus to peer-reviewed academic content, with the aim of improving performance in scientific NLP applications.
## 🔍 Overview
- **Base Model**: ClimateBERT (`climatebert/distilroberta-base-climate-f`, a DistilRoBERTa-based architecture)
- **Pretraining Method**: Continued pretraining (domain adaptation) with Masked Language Modeling (MLM)
- **Corpus**: Scientific climate change literature from top-tier journals
- **Tokenizer**: ClimateBERT tokenizer (unchanged)
- **Language**: English
- **Domain**: Scientific climate change research
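To make the Masked Language Modeling objective listed above concrete, here is a minimal sketch of how one training example might be corrupted. The card does not state the masking settings, so the 15% selection rate and the 80/10/10 split are assumptions taken from the standard BERT recipe. The function name `mask_tokens` and all token ids are illustrative, not part of this model's training code.

```python
import random

IGNORE_INDEX = -100  # label value conventionally skipped by the loss


def mask_tokens(token_ids, mask_id, vocab_size, special_ids=(),
                mlm_probability=0.15, seed=None):
    """Return (inputs, labels) for one MLM training example.

    Assumed recipe (not stated in the card): each non-special token is
    selected with probability `mlm_probability`; a selected token becomes
    the mask token 80% of the time, a random token 10% of the time, and
    stays unchanged 10% of the time.
    """
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(inputs)
    for i, tok in enumerate(token_ids):
        if tok in special_ids or rng.random() >= mlm_probability:
            continue  # position not selected, no loss computed here
        labels[i] = tok  # the model must recover the original token
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_id
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)
        # otherwise the original token is kept in the input
    return inputs, labels


# Illustrative token ids; 0 and 2 stand in for sequence start/end tokens.
example = [0, 713, 16, 10, 1296, 3645, 2]
inputs, labels = mask_tokens(example, mask_id=50264, vocab_size=50265,
                             special_ids={0, 2}, seed=0)
print(inputs)
print(labels)
```

Only positions whose label differs from `-100` contribute to the loss, which is why continued pretraining can reuse the unchanged ClimateBERT tokenizer and vocabulary.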
## 📊 Performance
Evaluated on **ClimaBench**, a benchmark suite for climate-focused NLP tasks:
| Metric | Value |
|----------------|--------------|
| Macro F1 (avg) | 57.83        |
| Tasks won      | 0/7          |
| Avg. Std Dev   | 0.01747      |
Although the model did not achieve the top score on any of the seven ClimaBench tasks, its pretraining on scientific literature is intended to make it a good fit for downstream applications in climate science and research automation.
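For readers unfamiliar with the metric, the following sketch shows how a macro F1 score can be computed for a single task: F1 is calculated per class and then averaged without weighting by class frequency. The labels are hypothetical and are not ClimaBench data. The card does not describe how the per-task scores were aggregated, so treating the reported value as a mean over tasks on a 0 to 100 scale is an assumption.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels for a three-class task (not ClimaBench data).
y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "c", "c"]
score = macro_f1(y_true, y_pred)
print(f"{100 * score:.2f}")  # prints 65.56
```

Because every class counts equally, macro F1 penalizes poor performance on rare classes more than accuracy or micro-averaged F1 would.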
Climate performance model card:
| SciClimateBERT | |
|---------------------------------|-----------------------------|
| 1. Model publicly available? | Yes |
| 2. Time to train final model | 300 h |
| 3. Time for all experiments | 1,226 h (≈ 51 days) |
| 4. Power of GPU and CPU | 0.250 kW + 0.013 kW |
| 5. Location for computations | Croatia |
| 6. Energy mix at location | 224.71 gCO<sub>2</sub>eq/kWh |
| 7. CO<sub>2</sub>eq for final model | 18 kg CO<sub>2</sub> |
| 8. CO<sub>2</sub>eq for all experiments | 74 kg CO<sub>2</sub> |
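The emissions figures in the table above can be approximated from the other rows. The sketch below assumes that emissions equal training time multiplied by combined GPU and CPU power and by the grid carbon intensity, with constant power draw throughout. The card does not spell out this formula, so the function `co2eq_kg` is an illustration rather than the authors' exact accounting.

```python
def co2eq_kg(hours, gpu_kw, cpu_kw, grid_g_per_kwh):
    """Estimate emissions as time x power x grid carbon intensity."""
    energy_kwh = hours * (gpu_kw + cpu_kw)
    return energy_kwh * grid_g_per_kwh / 1000.0  # grams -> kilograms


# Values taken from rows 2, 4, and 6 of the table above (final model).
final_model = co2eq_kg(hours=300, gpu_kw=0.250, cpu_kw=0.013,
                       grid_g_per_kwh=224.71)
print(f"{final_model:.1f} kg CO2eq")  # prints 17.7 kg CO2eq
```

The result rounds to the 18 kg reported for the final model. Applying the same function to the 1,226 h total gives a value close to, but not identical with, the reported 74 kg, so the authors' accounting for all experiments may differ slightly from this simple product.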
## 🧪 Intended Uses
**Use for:**
- Scientific climate change text classification and extraction
- Knowledge base and graph construction in climate policy and research domains
**Not suitable for:**
- Non-scientific general-purpose text
- Multilingual applications
Example:
``` python
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline
import torch

# Load the pretrained model and tokenizer
model_name = "P0L3/clirebert_clirevocab_uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Use the first GPU if available, otherwise fall back to the CPU
device = 0 if torch.cuda.is_available() else -1

# Create a fill-mask pipeline
fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer, device=device)

# Example input from scientific climate literature
text = "The increase in greenhouse gas emissions has significantly affected the <mask> balance of the Earth."

# Run prediction
predictions = fill_mask(text)

# Show top predictions
print(text)
print(10*">")
for p in predictions:
    print(f"{p['sequence']} — {p['score']:.4f}")
```
Output:
``` shell
The increase in greenhouse gas emissions has significantly affected the <mask> balance of the Earth.
>>>>>>>>>>
The increase in greenhouse gas ... affected the energy balance of the Earth. — 0.7897
The increase in greenhouse gas ... affected the radiation balance of the Earth. — 0.0522
The increase in greenhouse gas ... affected the mass balance of the Earth. — 0.0401
The increase in greenhouse gas ... affected the water balance of the Earth. — 0.0359
The increase in greenhouse gas ... affected the carbon balance of the Earth. — 0.0190
```
## ⚠️ Limitations
- May reflect biases present in the peer-reviewed literature it was pretrained on
## 🧾 Citation
If you use this model, please cite:
```bibtex
@article{poleksic_etal_2025,
  title={Climate Research Domain BERTs: Pretraining, Adaptation, and Evaluation},
  author={Poleksić, Andrija and Martinčić-Ipšić, Sanda},
  journal={PREPRINT (Version 1)},
  year={2025},
  doi={10.21203/rs.3.rs-6644722/v1}
}
```