|
--- |
|
license: mit |
|
datasets: |
|
- xnli |
|
language: |
|
- fr |
|
metrics: |
|
- accuracy |
|
pipeline_tag: zero-shot-classification |
|
--- |
|
|
|
# XLM-ROBERTA-BASE-XNLI_FR |
|
|
|
## Model description |
|
This model takes the XLM-RoBERTa-base model, which has been further pre-trained on a large corpus of tweets in multiple languages.

It was developed following a strategy similar to the one introduced as part of the [Tweet Eval](https://github.com/cardiffnlp/tweeteval) framework.

The model was then fine-tuned on the French part of the XNLI training dataset.
|
|
|
## Intended Usage |
|
|
|
This model was developed for zero-shot text classification in the realm of hate speech detection. It is focused on French, as it was fine-tuned on data in that language. Since the base model was pre-trained on 100 different languages, it has shown some effectiveness in other languages as well. Please refer to the list of languages in the [XLM-RoBERTa paper](https://arxiv.org/abs/1911.02116).
|
|
|
### Usage with Zero-Shot Classification pipeline |
|
```python |
|
from transformers import pipeline |
|
classifier = pipeline("zero-shot-classification", |
|
model="morit/french_xlm_xnli") |
|
``` |
|
|
|
After loading the model you can classify sequences in the languages mentioned above. Specify the sequence you want to classify together with a hypothesis template in order to score your proposed candidate labels.
|
|
|
```python |
|
sequence_to_classify = "Je pense que Marcon va gagner les elections?" |
|
|
|
|
|
# we can specify candidate labels and hypothesis: |
|
candidate_labels = ["politique", "sport"] |
|
hypothesis_template = "Cet example est {}" |
|
|
|
# classify using the information provided |
|
classifier(sequence_to_classify, candidate_labels, hypothesis_template=hypothesis_template) |
|
|
|
|
|
# Output |
|
#{'sequence': 'Je pense que Marcon va gagner les elections?', |
|
#'labels': ['politique', 'sport'], |
|
#'scores': [0.8195879459381104, 0.18041200935840607]} |
|
|
|
``` |
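Under the hood, the zero-shot pipeline inserts each candidate label into the hypothesis template, scores every (sequence, hypothesis) pair with the NLI model, and normalizes the entailment logits across labels. A minimal sketch of that final scoring step, using made-up entailment logits (the real values come from the model's forward pass):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical entailment logits, one per (sequence, hypothesis) pair --
# in the real pipeline these come from the NLI model.
entailment_logits = {"politique": 2.1, "sport": 0.6}

labels = list(entailment_logits)
scores = softmax([entailment_logits[label] for label in labels])
ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
print(ranked)
```

This is why the scores in the pipeline output sum to 1 when a single label is expected per sequence.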
|
|
|
|
|
## Training |
|
This model was pre-trained on a set of 100 languages and further trained on 198M multilingual tweets as described in the original [paper](https://arxiv.org/abs/2104.12250). It was then fine-tuned on the French training set of the XNLI dataset, which is a machine-translated version of the MNLI dataset. It was trained for 3 epochs with the following hyperparameters:

- learning rate: 5e-5

- batch size: 32

- max sequence length: 128
|
|
|
on a single GPU (NVIDIA GeForce RTX 3090), resulting in a training time of 1 hour 47 minutes.
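For a rough sense of scale, the number of optimization steps implied by these settings can be estimated as follows (assuming the XNLI French training split has 392,702 examples, the size of the MNLI training set it was machine-translated from):

```python
# Back-of-the-envelope step count for the training run described above.
num_examples = 392_702   # assumed size of the XNLI (fr) training split
batch_size = 32
epochs = 3

steps_per_epoch = -(-num_examples // batch_size)  # ceiling division
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 12272 36816
```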
|
|
|
|
|
## Evaluation |
|
The model was evaluated on the evaluation set of the XNLI corpus after each epoch, and on the test set of the XNLI corpus at the end of training.

On the test set the model reached an accuracy of:
|
``` |
|
predict_accuracy = 77.72 % |
|
``` |
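Accuracy here is simply the fraction of test examples whose predicted label matches the gold label. A minimal illustration (the toy predictions below are invented):

```python
def accuracy(predictions, labels):
    """Fraction of positions where the prediction equals the gold label."""
    assert len(predictions) == len(labels)
    return sum(p == g for p, g in zip(predictions, labels)) / len(labels)

# Toy example: 3 of 4 invented predictions match the gold labels
print(accuracy([0, 1, 2, 1], [0, 1, 1, 1]))  # 0.75
```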
|
|