---
license: mit
pipeline_tag: token-classification
---

# Description:

climateattention-10k classifies whether a given sequence is related to climate topics. It is a classifier fine-tuned from climatebert/distilroberta-base-climate-f (Webersinke et al., 2021) on the following ClimaText dataset (Varini et al., 2020):

* AL-10Ks.tsv: 3000 samples (58 positives, 2942 negatives)

The training set is highly unbalanced. You might want to check the upscaled version: 'kruthof/climateattention-10k-upscaled'

# How to use:

```python
from transformers import AutoTokenizer, RobertaForSequenceClassification, pipeline

# The tokenizer comes from the base ClimateBERT model, the weights from the fine-tuned classifier.
tokenizer = AutoTokenizer.from_pretrained("climatebert/distilroberta-base-climate-f")
climateattention = RobertaForSequenceClassification.from_pretrained("kruthof/climateattention-10k", num_labels=2)

ClimateAttention = pipeline("text-classification", model=climateattention, tokenizer=tokenizer)

ClimateAttention("Emissions have increased during the last several months")
# >> [{'label': 'Yes', 'score': 0.9993829727172852}]
```

# Performance:

Performance was tested on the balanced ClimaText 10K test set of 300 samples (67 positives, 233 negatives) (Varini et al., 2020).

| Accuracy | Precision | Recall | F1 |
|----|-----|-----|-----|
| 0.9633 | 1 | 0.8358 | 0.9106 |

# References:

Varini, F. S., Boyd-Graber, J., Ciaramita, M., & Leippold, M. (2020). ClimaText: A dataset for climate change topic detection. arXiv preprint arXiv:2012.00483.

Webersinke, N., Kraus, M., Bingler, J. A., & Leippold, M. (2021). ClimateBert: A pretrained language model for climate-related text. arXiv preprint arXiv:2110.12010.

------------------------------

https://kruthof.github.io
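
As a reading aid for the Performance table, the short sketch below recomputes the four reported metrics from a confusion matrix. The counts (56 true positives, 11 false negatives, 0 false positives, 233 true negatives) are not stated in this card; they are an assumption inferred from the reported precision of 1, the recall of 0.8358, and the 67/233 class split of the test set.

```python
# Confusion-matrix counts are an assumption inferred from the reported
# metrics and the 67 positive / 233 negative split; the card does not list them.
tp, fn = 56, 11   # the 67 positive samples
fp, tn = 0, 233   # the 233 negative samples

total = tp + fn + fp + tn
accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
# accuracy=0.9633 precision=1.0000 recall=0.8358 f1=0.9106
```

Under these assumed counts, every error is a missed positive, which suggests the model is conservative about assigning the climate label on this test set.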