---
license: mit
pipeline_tag: token-classification
---

# Description:

climateattention-10k classifies whether a given sequence is related to climate topics. It is a fine-tuned classifier based on climatebert/distilroberta-base-climate-f (Webersinke et al., 2021), trained on the following ClimaText dataset (Varini et al., 2020):

* AL-10Ks.tsv : 3000 (58 positives, 2942 negatives)

The training set is highly unbalanced. You might want to check the upsampled version: 'kruthof/climateattention-10k-upscaled'
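Given the imbalance above (58 positives vs. 2942 negatives), a common alternative to upsampling is to weight the loss per class. A minimal sketch, using the counts from AL-10Ks.tsv above; the inverse-frequency weighting scheme itself is an illustration, not a description of how this model was trained:

```python
# Inverse-frequency class weights for an unbalanced binary training set.
# Counts taken from the AL-10Ks.tsv split described above.
counts = {"negative": 2942, "positive": 58}
total = sum(counts.values())

# weight_c = total / (num_classes * count_c): rarer classes get larger weights
weights = {label: total / (len(counts) * n) for label, n in counts.items()}

print(weights)  # the positive class is weighted far more heavily
```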
15 |
+
|
16 |
+
# How to use:
|
17 |
+
```python
|
18 |
+
from transformers import AutoTokenizer, pipeline,RobertaForSequenceClassification
|
19 |
+
|
20 |
+
tokenizer = AutoTokenizer.from_pretrained("climatebert/distilroberta-base-climate-f")
|
21 |
+
climateattention = RobertaForSequenceClassification.from_pretrained('kruthof/climateattention-10k',num_labels=2)
|
22 |
+
|
23 |
+
ClimateAttention = pipeline("text-classification", model=climateattention, tokenizer=tokenizer)
|
24 |
+
|
25 |
+
ClimateAttention('Emissions have increased during the last several months')
|
26 |
+
|
27 |
+
>> [{'label': 'Yes', 'score': 0.9993829727172852}]
|
28 |
+
|
29 |
+
```
|
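The pipeline returns a list of dicts in the format shown above. A small sketch of post-processing such results to keep only climate-related sentences; the helper name, the threshold, and the `'No'` negative label are illustrative assumptions, not part of the model card:

```python
# Keep sentences whose classifier output marks them as climate-related.
# `results` mirrors the pipeline's output format shown above; the 'No'
# label for negatives is an assumption.
def climate_related(sentences, results, threshold=0.5):
    return [s for s, r in zip(sentences, results)
            if r["label"] == "Yes" and r["score"] >= threshold]

sentences = ["Emissions have increased during the last several months",
             "The quarterly dividend was raised by two cents"]
results = [{"label": "Yes", "score": 0.9994},
           {"label": "No", "score": 0.9871}]

print(climate_related(sentences, results))
# ['Emissions have increased during the last several months']
```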

# Performance:

Performance tested on the balanced ClimaText 10-K test set, featuring 300 samples (67 positives, 233 negatives) (Varini et al., 2020)

| Accuracy | Precision | Recall | F1 |
|----|-----|-----|-----|
| 0.9633 | 1 | 0.8358 | 0.9106 |

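The reported F1 and accuracy follow from the precision and recall columns and the test-set composition above; as a sanity check:

```python
# Sanity-check the reported metrics from the test-set composition above
# (300 samples: 67 positives, 233 negatives).
precision, recall = 1.0, 0.8358

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)

# With precision = 1 there are no false positives, so every error is a
# missed positive: true positives = recall * 67.
true_positives = round(recall * 67)          # 56
accuracy = (true_positives + 233) / 300

print(round(f1, 4), round(accuracy, 4))  # 0.9106 0.9633
```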
# Cite:
```bibtex
@misc{kruthof2023,
  title={ClimateAttention: A Fine-Tuned Climate Attention Classifier},
  author={Kruthof, Garvin},
  url={https://huggingface.co/kruthof/climateattention-ctw},
  year={2023}
}
```

# References:

Varini, F. S., Boyd-Graber, J., Ciaramita, M., & Leippold, M. (2020). ClimaText: A dataset for climate change topic detection. arXiv preprint arXiv:2012.00483.

Webersinke, N., Kraus, M., Bingler, J. A., & Leippold, M. (2021). ClimateBert: A pretrained language model for climate-related text. arXiv preprint arXiv:2110.12010.

------------------------------
https://kruthof.github.io