# Welcome to XLM-RoBERTArgument!
## Model description
XLM-RoBERTArgument is a model trained by the Hyperloop UPV team (https://hyperloopupv.com/) to detect arguments, developed as part of the European Hyperloop Week competition (https://hyperloopweek.com/). The model was trained on ~50k heterogeneous, manually annotated sentences on controversial topics (Stab et al. 2018, plus their machine translation to Spanish) to classify text into one of two labels: NON-ARGUMENT (0) and ARGUMENT (1).
## Dataset
The dataset (Stab et al. 2018) consists of sentences labeled as ARGUMENT (~11k per language) if they support or oppose a topic with a relevant reason, or as NON-ARGUMENT (~14k per language) if they do not give a reason. The authors focus on controversial topics, i.e., topics with "an obvious polarity to the possible outcomes", and compile a final set of eight: abortion, school uniforms, death penalty, marijuana legalization, nuclear energy, cloning, gun control, and minimum wage. The following table shows the number of arguments and non-arguments per topic (per language).
TOPIC | ARGUMENT | NON-ARGUMENT |
---|---|---|
abortion | 1,482 | 2,405 |
school uniforms | 1,531 | 1,493 |
death penalty | 1,553 | 2,072 |
marijuana legalization | 1,434 | 1,883 |
nuclear energy | 1,201 | 1,256 |
cloning | 1,116 | 1,342 |
gun control | 1,451 | 2,109 |
minimum wage | 1,264 | 1,712 |
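
For orientation, here is a minimal sketch of how a Stab et al. 2018-style corpus could be loaded and split 80/20 (matching the evaluation split below). The file name and column names (`sentence`, `annotation`) are assumptions, not the actual corpus layout; the mapping collapses supporting and opposing arguments into the single ARGUMENT class.

```python
# Hypothetical sketch: loading a Stab et al. 2018-style corpus and splitting
# it 80/20. File and column names are assumptions; adapt to the real release.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("sentential_argmining.tsv", sep="\t")  # assumed file name
# Collapse supporting/opposing arguments into one ARGUMENT class (1),
# everything else into NON-ARGUMENT (0).
df["label"] = df["annotation"].map(
    {"NoArgument": 0, "Argument_for": 1, "Argument_against": 1}
)
train_df, eval_df = train_test_split(
    df, test_size=0.2, stratify=df["label"], random_state=42
)
```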
## Model training
XLM-RoBERTArgument was fine-tuned from the pre-trained XLM-RoBERTa (large) model from Hugging Face using a custom trainer with the following hyperparameters:
```
num_train_epochs=3
learning_rate=1e-06
per_device_train_batch_size=8
per_device_eval_batch_size=8
```
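
The actual trainer is custom and not shown here. As a rough orientation, an equivalent setup with the stock Hugging Face `Trainer` might look like the sketch below; the `sentence` column and the `train_df`/`eval_df` splits carry over from the hypothetical dataset sketch above.

```python
# Rough sketch with the stock Hugging Face Trainer; the actual training used
# a custom trainer and classification head, so details will differ.
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-large", num_labels=2)

def tokenize(batch):
    return tokenizer(batch["sentence"], padding="max_length",
                     max_length=200, truncation=True)

# train_df / eval_df come from the hypothetical dataset sketch above
train_ds = Dataset.from_pandas(train_df).map(tokenize, batched=True)
eval_ds = Dataset.from_pandas(eval_df).map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="xlmrobertargument",
    num_train_epochs=3,
    learning_rate=1e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
)
Trainer(model=model, args=args,
        train_dataset=train_ds, eval_dataset=eval_ds).train()
```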
## Evaluation
The model was evaluated on a held-out evaluation set (20% of the data):
Model | Accuracy | F1 | Recall (ARGUMENT) | Recall (NON-ARGUMENT) | Precision (ARGUMENT) | Precision (NON-ARGUMENT) |
---|---|---|---|---|---|---|
XLM-RoBERTArgument | 0.82 | 0.81 | 0.76 | 0.86 | 0.81 | 0.82 |
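
These metrics can be reproduced with scikit-learn as sketched below; `y_true`/`y_pred` stand for the gold labels and model predictions on the 20% evaluation split (placeholder values here), and the F1 averaging method is an assumption.

```python
# Sketch: computing the metrics reported above with scikit-learn.
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

y_true = [1, 0, 1, 0]  # placeholder gold labels (0 = NON-ARGUMENT, 1 = ARGUMENT)
y_pred = [1, 0, 0, 0]  # placeholder model predictions

print("Accuracy     ", accuracy_score(y_true, y_pred))
print("F1           ", f1_score(y_true, y_pred, average="macro"))  # averaging assumed
print("Recall (arg) ", recall_score(y_true, y_pred, pos_label=1))
print("Recall (non) ", recall_score(y_true, y_pred, pos_label=0))
print("Prec. (arg)  ", precision_score(y_true, y_pred, pos_label=1))
print("Prec. (non)  ", precision_score(y_true, y_pred, pos_label=0))
```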
## Usage with PyTorch model
```python
import torch
from transformers import AutoTokenizer

from XLMRobertargument import XLMRoBERTaClassifier  # custom classifier class, required to unpickle the model

# Load the fine-tuned model and the matching tokenizer
model = torch.load("xlmrobertargument.pt")
model.eval()
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")

tok = tokenizer("Hola cómo estás", padding="max_length", max_length=200,
                truncation=True, return_tensors="pt")

# return_tensors="pt" already yields LongTensors, so pass them directly
with torch.no_grad():
    out = model(tok["input_ids"], tok["attention_mask"])
print(out)
# tensor(0.0171)
```
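
The model returns one scalar per sentence. Assuming this scalar is the probability of the ARGUMENT class (an assumption; if the head returns a raw logit, apply `torch.sigmoid` first), a 0.5 threshold recovers the predicted label:

```python
# Assumption: `out` is the ARGUMENT-class probability in [0, 1];
# apply torch.sigmoid(out) first if the head returns a raw logit.
label = int(out.item() > 0.5)
print("ARGUMENT" if label else "NON-ARGUMENT")  # 0.0171 -> NON-ARGUMENT
```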
## Intended Uses & Potential Limitations
This model should be seen as a starting point for diving into the exciting field of argument mining. Be aware that an argument is a complex structure with multiple dependencies, so the model may perform worse on topics and text types not included in the training set.