---
license: cc-by-nc-sa-4.0
language:
- en
metrics:
- f1
- accuracy
widget:
- text: Girls like attention and they get desperate
tags:
- sexism
datasets:
- tum-nlp/sexism-socialmedia-balanced
---

# BERTweet for sexism detection

This is a fine-tuned BERTweet large ([BERTweet: A pre-trained language model for English Tweets](https://aclanthology.org/2020.emnlp-demos.2/)) model for detecting sexism.
The training dataset is a **new balanced** version of the Explainable Detection of Online Sexism ([**EDOS**](https://github.com/rewire-online/edos)) dataset, [sexism-socialmedia-balanced](https://huggingface.co/datasets/tum-nlp/sexism-socialmedia-balanced), which consists of 16,000 English entries gathered from two social media platforms, Twitter and Gab. The model achieved a **Macro-F1** score of **0.85** and an **Accuracy** of **0.88** on the test set for the EDOS task.
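
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how Macro-F1 and accuracy can be computed for a binary task. The toy labels are invented purely for illustration and are not outputs of this model; the helper names (`accuracy`, `f1_for_class`, `macro_f1`) are hypothetical, and this is not necessarily the evaluation script used to produce the scores above.

```python
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of examples whose predicted label equals the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], cls: Hashable) -> float:
    """F1 score when `cls` is treated as the positive class."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives: precision/recall are zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Toy gold labels and predictions (1 = sexist, 0 = not sexist); illustrative only.
toy_true = [1, 1, 1, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 0, 0, 0, 0, 0, 1]

print("accuracy:", accuracy(toy_true, toy_pred))
print("macro-F1:", macro_f1(toy_true, toy_pred))
```

Because Macro-F1 averages the per-class F1 scores without weighting by class frequency, it penalizes a classifier that does well on only one of the two classes.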

## How to use

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained('tum-nlp/bertweet-sexism')
model = AutoModelForSequenceClassification.from_pretrained('tum-nlp/bertweet-sexism')

# Create the pipeline for classification
sexism_classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Predict
sexism_classifier("Girls like attention and they get desperate")
```
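
To classify several texts at once, a small wrapper around the pipeline can pair each input with its top prediction. The sketch below assumes the standard `text-classification` pipeline behaviour of returning one `{"label": ..., "score": ...}` dictionary per input when given a list of strings. The helper name `top_predictions` is hypothetical, and the label strings it returns depend on the model's configuration, which this card does not state.

```python
from typing import Any, Callable, Dict, List


def top_predictions(
    classifier: Callable[[List[str]], List[Dict[str, Any]]],
    texts: List[str],
) -> List[Dict[str, Any]]:
    """Run a text-classification callable on a batch and pair each text with its top label.

    `classifier` is assumed to behave like a transformers text-classification
    pipeline: given a list of strings, it returns one dict per input with
    "label" and "score" keys.
    """
    outputs = classifier(texts)
    return [
        {"text": text, "label": out["label"], "score": out["score"]}
        for text, out in zip(texts, outputs)
    ]
```

With the pipeline created above, this could be called as `top_predictions(sexism_classifier, ["first text", "second text"])`, which yields one record per input text.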

## Licensing Information

This model is released under the [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License][cc-by-nc-sa].

[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]

[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/
[cc-by-nc-sa-image]: https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png