# Welcome to XLM-RoBERTArgument!
## Model description
XLM-RoBERTArgument is a model trained by the Hyperloop UPV team (https://hyperloopupv.com/) to detect arguments, developed as part of the European Hyperloop Week competition (https://hyperloopweek.com/). The model was trained on ~50k heterogeneous, manually annotated sentences on controversial topics (Stab et al. 2018, plus their machine translation to Spanish) to classify text into one of two labels: NON-ARGUMENT (0) and ARGUMENT (1).
## Dataset
The dataset (Stab et al. 2018) consists of sentences labeled as ARGUMENT (~11k per language) if they support or oppose a topic with a relevant reason, or as NON-ARGUMENT (~14k per language) if they do not give a reason. The authors focus on controversial topics, i.e., topics with "an obvious polarity to the possible outcomes", and compile a final set of eight: abortion, school uniforms, death penalty, marijuana legalization, nuclear energy, cloning, gun control, and minimum wage. The following table shows the number of arguments and non-arguments per topic (per language).
TOPIC | ARGUMENT | NON-ARGUMENT |
---|---|---|
abortion | 1,482 | 2,405 |
school uniforms | 1,531 | 1,493 |
death penalty | 1,553 | 2,072 |
marijuana legalization | 1,434 | 1,883 |
nuclear energy | 1,201 | 1,256 |
cloning | 1,116 | 1,342 |
gun control | 1,451 | 2,109 |
minimum wage | 1,264 | 1,712 |
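
For orientation, here is a minimal sketch of how a Stab et al. 2018-style corpus could be loaded and split 80/20 (matching the evaluation split below). The file name and column names (`sentence`, `annotation`) are assumptions, not the actual corpus layout; the mapping collapses supporting and opposing arguments into the single ARGUMENT class.

```python
# Hypothetical sketch: loading a Stab et al. 2018-style corpus and splitting
# it 80/20. File and column names are assumptions; adapt to the real release.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("sentential_argmining.tsv", sep="\t")  # assumed file name
# Collapse supporting/opposing arguments into one ARGUMENT class (1),
# everything else into NON-ARGUMENT (0).
df["label"] = df["annotation"].map(
    {"NoArgument": 0, "Argument_for": 1, "Argument_against": 1}
)
train_df, eval_df = train_test_split(
    df, test_size=0.2, stratify=df["label"], random_state=42
)
```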
## Model training
XLM-RoBERTArgument was fine-tuned from the pre-trained XLM-RoBERTa (large) model from Hugging Face using a custom trainer with the following hyperparameters:
```
num_train_epochs=3
learning_rate=1e-06
per_device_train_batch_size=8
per_device_eval_batch_size=8
```
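
The actual trainer is custom and not shown here. As a rough orientation, an equivalent setup with the stock Hugging Face `Trainer` might look like the sketch below; the `sentence` column and the `train_df`/`eval_df` splits carry over from the hypothetical dataset sketch above.

```python
# Rough sketch with the stock Hugging Face Trainer; the actual training used
# a custom trainer and classification head, so details will differ.
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-large", num_labels=2)

def tokenize(batch):
    return tokenizer(batch["sentence"], padding="max_length",
                     max_length=200, truncation=True)

# train_df / eval_df come from the hypothetical dataset sketch above
train_ds = Dataset.from_pandas(train_df).map(tokenize, batched=True)
eval_ds = Dataset.from_pandas(eval_df).map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="xlmrobertargument",
    num_train_epochs=3,
    learning_rate=1e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
)
Trainer(model=model, args=args,
        train_dataset=train_ds, eval_dataset=eval_ds).train()
```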
## Evaluation
The model was evaluated on a held-out evaluation set (20% of the data):
Model | Accuracy | F1 | Recall (ARGUMENT) | Recall (NON-ARGUMENT) | Precision (ARGUMENT) | Precision (NON-ARGUMENT) |
---|---|---|---|---|---|---|
XLM-RoBERTArgument | 0.82 | 0.81 | 0.76 | 0.86 | 0.81 | 0.82 |
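
These metrics can be reproduced with scikit-learn as sketched below; `y_true`/`y_pred` stand for the gold labels and model predictions on the 20% evaluation split (placeholder values here), and the F1 averaging method is an assumption.

```python
# Sketch: computing the metrics reported above with scikit-learn.
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

y_true = [1, 0, 1, 0]  # placeholder gold labels (0 = NON-ARGUMENT, 1 = ARGUMENT)
y_pred = [1, 0, 0, 0]  # placeholder model predictions

print("Accuracy     ", accuracy_score(y_true, y_pred))
print("F1           ", f1_score(y_true, y_pred, average="macro"))  # averaging assumed
print("Recall (arg) ", recall_score(y_true, y_pred, pos_label=1))
print("Recall (non) ", recall_score(y_true, y_pred, pos_label=0))
print("Prec. (arg)  ", precision_score(y_true, y_pred, pos_label=1))
print("Prec. (non)  ", precision_score(y_true, y_pred, pos_label=0))
```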
## Usage with PyTorch model
```python
import torch
from transformers import AutoTokenizer

from XLMRobertargument import XLMRoBERTaClassifier  # custom classifier class, required to unpickle the model

# Load the fine-tuned model and the matching tokenizer
model = torch.load("xlmrobertargument.pt")
model.eval()
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")

tok = tokenizer("Hola cómo estás", padding="max_length", max_length=200,
                truncation=True, return_tensors="pt")

# return_tensors="pt" already yields LongTensors, so pass them directly
with torch.no_grad():
    out = model(tok["input_ids"], tok["attention_mask"])
print(out)
# tensor(0.0171)
```
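
The model returns one scalar per sentence. Assuming this scalar is the probability of the ARGUMENT class (an assumption; if the head returns a raw logit, apply `torch.sigmoid` first), a 0.5 threshold recovers the predicted label:

```python
# Assumption: `out` is the ARGUMENT-class probability in [0, 1];
# apply torch.sigmoid(out) first if the head returns a raw logit.
label = int(out.item() > 0.5)
print("ARGUMENT" if label else "NON-ARGUMENT")  # 0.0171 -> NON-ARGUMENT
```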
## Intended Uses & Potential Limitations
This model should be seen as a starting point for diving into the exciting field of argument mining. Be aware that an argument is a complex structure with multiple dependencies, so the model may perform worse on topics and text types not included in the training set.