---
license: mit
pipeline_tag: token-classification
---
# Description:
climateattention-10k classifies whether a given sequence is related to climate topics.
It is a fine-tuned classifier based on climatebert/distilroberta-base-climate-f (Webersinke et al., 2021),
trained on the following subset of the ClimaText dataset (Varini et al., 2020):

* AL-10Ks.tsv: 3,000 samples (58 positives, 2,942 negatives)

The training set is highly imbalanced. You may want to use the upsampled version instead: `kruthof/climateattention-10k-upscaled`
# How to use:
```python
from transformers import AutoTokenizer, RobertaForSequenceClassification, pipeline

# Tokenizer from the ClimateBERT base model, classifier from this fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained("climatebert/distilroberta-base-climate-f")
model = RobertaForSequenceClassification.from_pretrained("kruthof/climateattention-10k", num_labels=2)

ClimateAttention = pipeline("text-classification", model=model, tokenizer=tokenizer)

ClimateAttention('Emissions have increased during the last several months')
# >> [{'label': 'Yes', 'score': 0.9993829727172852}]
```
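The pipeline also accepts a list of texts for batch inference. The snippet below is a minimal sketch using standard `transformers` pipeline options (the example sentences are placeholders, not part of the original card):

```python
# Classify several sentences at once; truncation keeps inputs within the
# model's maximum sequence length (standard pipeline keyword argument).
sentences = [
    'Emissions have increased during the last several months',
    'The quarterly report covers marketing expenses',
]
results = ClimateAttention(sentences, truncation=True)
print(results)  # one {'label': ..., 'score': ...} dict per input sentence
```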
# Performance:
Performance was evaluated on the ClimaText 10-K test set of 300 samples (67 positives, 233 negatives) (Varini et al., 2020).
|Accuracy| Precision | Recall | F1 |
|----|-----|-----|-----|
| 0.9633 | 1 | 0.8358 | 0.9106 |
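As a rough sketch of how such figures can be reproduced with scikit-learn, the snippet below scores a labelled test set; the file name `test.tsv` and the column names `text` and `label` are placeholders rather than the actual ClimaText file layout:

```python
# Sketch: evaluate the pipeline on a labelled test set (file and column
# names are placeholders; adapt them to the actual ClimaText TSV layout).
import pandas as pd
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

test = pd.read_csv('test.tsv', sep='\t')           # placeholder path
preds = ClimateAttention(test['text'].tolist(), truncation=True)
y_pred = [1 if p['label'] == 'Yes' else 0 for p in preds]
y_true = test['label'].tolist()                     # assumed: 1 = climate-related, 0 = not

print('Accuracy :', accuracy_score(y_true, y_pred))
print('Precision:', precision_score(y_true, y_pred))
print('Recall   :', recall_score(y_true, y_pred))
print('F1       :', f1_score(y_true, y_pred))
```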
# References:
Varini, F. S., Boyd-Graber, J., Ciaramita, M., & Leippold, M. (2020). ClimaText: A dataset for climate change topic detection. arXiv preprint arXiv:2012.00483.

Webersinke, N., Kraus, M., Bingler, J. A., & Leippold, M. (2021). ClimateBert: A pretrained language model for climate-related text. arXiv preprint arXiv:2110.12010.

---

https://kruthof.github.io