kruthof committed on
Commit 6b9dd8b · 1 Parent(s): ba9ef39

incl unbalance comment

  1. README.md +54 -0
README.md CHANGED
@@ -1,3 +1,57 @@
  ---
  license: mit
+ pipeline_tag: token-classification
  ---
+
+ # Description:
+
+ climateattention-10k classifies whether a given sequence is related to climate topics.
+ It is a classifier fine-tuned from climatebert/distilroberta-base-climate-f (Webersinke et al., 2021)
+ on the following ClimaText dataset (Varini et al., 2020):
+
+ * AL-10Ks.tsv: 3000 samples (58 positives, 2942 negatives)
+
+ The training set is highly imbalanced. You might want to check the upscaled version: 'kruthof/climateattention-10k-upscaled'
+
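To make the imbalance concrete, the short sketch below works through the arithmetic for the training counts quoted above. The "always negative" baseline is only an illustration of why accuracy alone can mislead on such data; it is not something the model card reports.

```python
# Training-set counts as quoted in the description (AL-10Ks.tsv).
positives = 58
negatives = 2942

total = positives + negatives
positive_share = positives / total          # fraction of climate-related samples
neg_per_pos = negatives / positives         # negatives for every positive

# Illustrative baseline (an assumption for this sketch, not from the README):
# a classifier that always answers "negative" is right on every negative sample.
always_negative_accuracy = negatives / total

print(f"total samples:            {total}")
print(f"positive share:           {positive_share:.2%}")
print(f"negatives per positive:   {neg_per_pos:.1f}")
print(f"always-negative accuracy: {always_negative_accuracy:.4f}")
```

A trivial classifier already scores highly on accuracy here, which is one reason to look at precision, recall and F1 as well.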
+ # How to use:
+ ```python
+ from transformers import AutoTokenizer, RobertaForSequenceClassification, pipeline
+
+ # The tokenizer comes from the base ClimateBert model; the classifier weights come from this repository.
+ tokenizer = AutoTokenizer.from_pretrained("climatebert/distilroberta-base-climate-f")
+ climateattention = RobertaForSequenceClassification.from_pretrained("kruthof/climateattention-10k", num_labels=2)
+
+ # Wrap model and tokenizer in a text-classification pipeline.
+ classifier = pipeline("text-classification", model=climateattention, tokenizer=tokenizer)
+
+ print(classifier("Emissions have increased during the last several months"))
+
+ # Example output:
+ # [{'label': 'Yes', 'score': 0.9993829727172852}]
+ ```
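When several sequences are classified at once, the pipeline returns one `{'label', 'score'}` dictionary per input. The helper below is a minimal sketch of how such results could be filtered. The function name `climate_related`, the `min_score` threshold and the negative label `'No'` are assumptions made for illustration; only the `'Yes'` label and the output shape come from the example above. The predictions in the sketch are hand-written stand-ins, not real model output.

```python
def climate_related(texts, predictions, min_score=0.5, positive_label="Yes"):
    """Return (text, score) pairs that were labelled as climate-related.

    `predictions` is expected in the pipeline's output shape:
    one dict with 'label' and 'score' per input text.
    """
    return [
        (text, pred["score"])
        for text, pred in zip(texts, predictions)
        if pred["label"] == positive_label and pred["score"] >= min_score
    ]


texts = [
    "Emissions have increased during the last several months",
    "The board approved the quarterly dividend",
]

# Hand-written placeholder predictions (hypothetical, NOT real model output).
# With the real model these would come from: predictions = classifier(texts)
predictions = [
    {"label": "Yes", "score": 0.99},
    {"label": "No", "score": 0.98},
]

selected = climate_related(texts, predictions, min_score=0.9)
print(selected)
```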
+ # Performance:
+
+ Performance was evaluated on the balanced ClimaText 10K test set of 300 samples (67 positives, 233 negatives) (Varini et al., 2020):
+
+ | Accuracy | Precision | Recall | F1 |
+ |----------|-----------|--------|--------|
+ | 0.9633 | 1 | 0.8358 | 0.9106 |
+
+
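As a consistency check, the reported figures can be reproduced from a confusion matrix. The counts below are not stated in the README; they are inferred as the values consistent with 67 positives, 233 negatives and the reported metrics (56 true positives, 11 false negatives, no false positives), so treat them as an assumption.

```python
# Confusion-matrix counts inferred from the reported metrics (assumption, not from the README).
tp, fp, fn, tn = 56, 0, 11, 233

total = tp + fp + fn + tn
accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"samples={total} positives={tp + fn} negatives={fp + tn}")
print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

Under these counts the model misses 11 of the 67 climate-related samples and never flags a negative sample as positive.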
+ # Cite:
+ ```bibtex
+ @misc{kruthof2023,
+   title={ClimateAttention: A Fine-Tuned Climate Attention Classifier},
+   author={Kruthof, Garvin},
+   url={https://huggingface.co/kruthof/climateattention-ctw},
+   year={2023}
+ }
+ ```
+
+ # References:
+
+ Varini, F. S., Boyd-Graber, J., Ciaramita, M., & Leippold, M. (2020).
+ ClimaText: A dataset for climate change topic detection. arXiv preprint arXiv:2012.00483.
+
+ Webersinke, N., Kraus, M., Bingler, J. A., & Leippold, M. (2021).
+ ClimateBert: A pretrained language model for climate-related text. arXiv preprint arXiv:2110.12010.
+
+ ------------------------------
+ https://kruthof.github.io