---
license: mit
pipeline_tag: text-classification
---

# Description:

climateattention-ctw classifies whether a given text sequence is related to climate topics.
It is a classifier fine-tuned from climatebert/distilroberta-base-climate-f (Webersinke et al., 2021) on the following ClimaText dataset (Varini et al., 2020):

* Wiki-doc corpus, with 115847 samples (57922 positives, 57925 negatives)

For company disclosures or news articles, you may want to use the 10k model instead: kruthof/climateattention-10k-upscaled

# How to use:

```python
from transformers import AutoTokenizer, RobertaForSequenceClassification, pipeline

# The tokenizer is loaded from the base ClimateBERT model the classifier was fine-tuned from.
tokenizer = AutoTokenizer.from_pretrained("climatebert/distilroberta-base-climate-f")
climateattention = RobertaForSequenceClassification.from_pretrained("kruthof/climateattention-ctw", num_labels=2)

# Wrap model and tokenizer in a text-classification pipeline.
ClimateAttention = pipeline("text-classification", model=climateattention, tokenizer=tokenizer)

ClimateAttention("Emissions have increased during the last several months")
# >> [{'label': 'Yes', 'score': 0.9993829727172852}]
```

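When classifying many texts at once, the pipeline output can be post-processed to keep only the sequences flagged as climate-related. The sketch below assumes each prediction is a dict with `label` and `score` keys, as in the output above, and that `'Yes'` marks the positive class. The helper `filter_climate_related`, its `min_score` cutoff, and the negative label name `'No'` are assumptions introduced here for illustration; they are not part of the model or of transformers.

```python
def filter_climate_related(texts, predictions, min_score=0.5):
    """Return the texts predicted 'Yes' with a score of at least min_score.

    Assumes predictions[i] is a dict such as {'label': 'Yes', 'score': 0.99}
    that belongs to texts[i]. min_score is an illustrative cutoff, not a tuned value.
    """
    if len(texts) != len(predictions):
        raise ValueError("texts and predictions must have the same length")
    return [
        text
        for text, pred in zip(texts, predictions)
        if pred["label"] == "Yes" and pred["score"] >= min_score
    ]


# Hand-written predictions in the pipeline's output format.
# With the real model, predictions would come from ClimateAttention(texts).
texts = [
    "Emissions have increased during the last several months",
    "The committee approved the new parking rules",
]
predictions = [
    {"label": "Yes", "score": 0.9993829727172852},
    {"label": "No", "score": 0.98},  # assumed label name and made-up score
]
climate_texts = filter_climate_related(texts, predictions)
print(climate_texts)
```

Raising `min_score` trades recall for precision; a suitable value would need to be chosen on held-out data for the target domain.
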
# Performance:

Performance was tested on the balanced ClimaText Wiki-doc test set, which contains 3826 samples (1913 positives, 1913 negatives) (Varini et al., 2020).

| Accuracy | Precision | Recall | F1 |
|----|-----|-----|-----|
| 0.8834 | 0.8717 | 0.8991 | 0.8852 |

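F1 is the harmonic mean of precision and recall, so the reported F1 can be cross-checked from the other two columns. A minimal sketch follows; the function name `f1_score` is chosen here for illustration and is not taken from the model's evaluation code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the table above.
reported_f1 = f1_score(0.8717, 0.8991)
print(round(reported_f1, 4))  # 0.8852
```
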
# References:

Varini, F. S., Boyd-Graber, J., Ciaramita, M., & Leippold, M. (2020). ClimaText: A dataset for climate change topic detection. arXiv preprint arXiv:2012.00483.

Webersinke, N., Kraus, M., Bingler, J. A., & Leippold, M. (2021). ClimateBert: A pretrained language model for climate-related text. arXiv preprint arXiv:2110.12010.

------------------------------

https://kruthof.github.io