---
license: cc-by-nc-sa-4.0
language:
- en
metrics:
- f1
- accuracy
widget:
- text: Girls like attention and they get desperate
tags:
- sexism
datasets:
- tum-nlp/sexism-socialmedia-balanced
---
# BERTweet for sexism detection
This is a fine-tuned BERTweet large ([BERTweet: A pre-trained language model for English Tweets](https://aclanthology.org/2020.emnlp-demos.2/)) model for detecting sexism.
The training dataset is a **new balanced** version of the Explainable Detection of Online Sexism ([**EDOS**](https://github.com/rewire-online/edos)) dataset, [sexism-socialmedia-balanced](https://huggingface.co/datasets/tum-nlp/sexism-socialmedia-balanced), consisting of 16,000 entries in
English gathered from social media platforms: Twitter and Gab. The model achieved a **Macro-F1** score of **0.85** and an **Accuracy** of **0.88** on the test set for the EDOS task.
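For readers less familiar with the two reported metrics, the sketch below shows how Macro-F1 and Accuracy are commonly computed for a binary task. It is illustrative only: the helper names `macro_f1` and `accuracy` and the toy labels are assumptions made for this example, not the evaluation code or data behind the scores above.

```python
from typing import List


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of predictions that match the reference labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def macro_f1(y_true: List[int], y_pred: List[int]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy labels (hypothetical): 1 = sexist, 0 = not sexist
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]

print(f"Accuracy: {accuracy(y_true, y_pred):.3f}")
print(f"Macro-F1: {macro_f1(y_true, y_pred):.3f}")
```

Because Macro-F1 weights every class equally, it penalizes weak performance on the rarer class more than Accuracy does, which is why both numbers are usually reported together for tasks like this one.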
## How to use
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained('tum-nlp/bertweet-sexism')
model = AutoModelForSequenceClassification.from_pretrained('tum-nlp/bertweet-sexism')
# Create the pipeline for classification
sexism_classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
# Predict
sexism_classifier("Girls like attention and they get desperate")
```
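A `text-classification` pipeline returns one dictionary per input with a `label` and a `score` key. The following is a minimal sketch of how such output could be post-processed with a confidence threshold. The helper `flag_predictions`, the threshold value, and the label strings `"sexist"` / `"not sexist"` are assumptions for illustration; check `model.config.id2label` for the label names this model actually emits.

```python
from typing import Dict, List, Union

Prediction = Dict[str, Union[str, float]]


def flag_predictions(
    texts: List[str],
    predictions: List[Prediction],
    positive_label: str,
    threshold: float = 0.5,
) -> List[Dict[str, object]]:
    """Pair each text with its prediction and flag confident positive cases."""
    results = []
    for text, pred in zip(texts, predictions):
        flagged = pred["label"] == positive_label and pred["score"] >= threshold
        results.append(
            {
                "text": text,
                "label": pred["label"],
                "score": pred["score"],
                "flagged": flagged,
            }
        )
    return results


# Mock pipeline output (hypothetical labels and scores, not real model results)
texts = ["first example post", "second example post"]
mock_predictions = [
    {"label": "sexist", "score": 0.97},
    {"label": "not sexist", "score": 0.88},
]

results = flag_predictions(texts, mock_predictions, positive_label="sexist", threshold=0.9)
for row in results:
    print(row)
```

With the real model, `mock_predictions` would be replaced by `sexism_classifier(texts)`, since the pipeline also accepts a list of strings.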
## Citation
```
@inproceedings{rydelek-etal-2023-adamr,
title = "{A}dam{R} at {S}em{E}val-2023 Task 10: Solving the Class Imbalance Problem in Sexism Detection with Ensemble Learning",
author = "Rydelek, Adam and
Dementieva, Daryna and
Groh, Georg",
editor = {Ojha, Atul Kr. and
Do{\u{g}}ru{\"o}z, A. Seza and
Da San Martino, Giovanni and
Tayyar Madabushi, Harish and
Kumar, Ritesh and
Sartori, Elisa},
booktitle = "Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)",
month = jul,
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.semeval-1.190",
doi = "10.18653/v1/2023.semeval-1.190",
pages = "1371--1381",
abstract = "The Explainable Detection of Online Sexism task presents the problem of explainable sexism detection through fine-grained categorisation of sexist cases with three subtasks. Our team experimented with different ways to combat class imbalance throughout the tasks using data augmentation and loss alteration techniques. We tackled the challenge by utilising ensembles of Transformer models trained on different datasets, which are tested to find the balance between performance and interpretability. This solution ranked us in the top 40{\%} of teams for each of the tracks.",
}
```
## Licensing Information
This model is licensed under the [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License][cc-by-nc-sa].
[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]
[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/
[cc-by-nc-sa-image]: https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png