---
license: mit
pipeline_tag: text-classification
---

# Description:

climateattention-10k classifies whether a given sequence is related to climate topics.
It is a fine-tuned classifier based on climatebert/distilroberta-base-climate-f (Webersinke et al., 2021)
and was trained on the following subset of the ClimaText dataset (Varini et al., 2020):

* AL-10Ks.tsv: 3,000 samples (58 positives, 2,942 negatives)

The training set is highly imbalanced; you may want to consider the upsampled version `kruthof/climateattention-10k-upscaled` instead. A minimal check of the class balance is sketched below.
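For reference, a minimal sketch of how the class balance above can be inspected with pandas; the `label` column name is an assumption about the ClimaText TSV layout, so check the actual file from the release.

```python
import pandas as pd

# AL-10Ks.tsv from the ClimaText release; the "label" column name is an assumption.
df = pd.read_csv("AL-10Ks.tsv", sep="\t")
print(len(df))                     # expected: 3000 rows
print(df["label"].value_counts())  # expected: 58 positives vs. 2942 negatives
```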

# How to use:
```python
from transformers import AutoTokenizer, RobertaForSequenceClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("climatebert/distilroberta-base-climate-f")
model = RobertaForSequenceClassification.from_pretrained("kruthof/climateattention-10k", num_labels=2)

ClimateAttention = pipeline("text-classification", model=model, tokenizer=tokenizer)

ClimateAttention("Emissions have increased during the last several months")
# >> [{'label': 'Yes', 'score': 0.9993829727172852}]
```
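The pipeline also accepts a list of sentences. A minimal sketch for batch inference, reusing the `ClimateAttention` pipeline from the snippet above (the second sentence and the outputs shown are illustrative only):

```python
# Continues from the snippet above (reuses the `ClimateAttention` pipeline).
sentences = [
    "Emissions have increased during the last several months",
    "The quarterly dividend was raised by two cents per share",
]
print(ClimateAttention(sentences, truncation=True))
# e.g. [{'label': 'Yes', 'score': ...}, {'label': 'No', 'score': ...}]
```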
# Performance:

Performance was evaluated on the ClimaText 10-K test set of 300 samples (67 positives, 233 negatives) (Varini et al., 2020).

| Accuracy | Precision | Recall | F1 |
|----------|-----------|--------|----|
| 0.9633   | 1.0000    | 0.8358 | 0.9106 |
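The numbers above can be re-checked along the following lines; this is only a sketch, and the test file name and the `sentence`/`label` column names are assumptions about the ClimaText release, so adjust them as needed.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from transformers import AutoTokenizer, RobertaForSequenceClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("climatebert/distilroberta-base-climate-f")
model = RobertaForSequenceClassification.from_pretrained("kruthof/climateattention-10k", num_labels=2)
clf = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Assumed file name and column layout for the ClimaText 10-K test split.
test = pd.read_csv("10-Ks-test.tsv", sep="\t")
preds = clf(test["sentence"].tolist(), batch_size=32, truncation=True)

# Map the pipeline's 'Yes'/'No' labels to 1/0; assumes the TSV labels are 1/0.
y_pred = [1 if p["label"] == "Yes" else 0 for p in preds]
y_true = test["label"].astype(int).tolist()

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
```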



# References:

Varini, F. S., Boyd-Graber, J., Ciaramita, M., & Leippold, M. (2020). 
ClimaText: A dataset for climate change topic detection. arXiv preprint arXiv:2012.00483.

Webersinke, N., Kraus, M., Bingler, J. A., & Leippold, M. (2021). 
ClimateBert: A pretrained language model for climate-related text. arXiv preprint arXiv:2110.12010.

------------------------------
https://kruthof.github.io