🧠 Toxicity_model_Llama_3.1_8B – Spanish Toxicity Classifier (Instruction-Tuned)

📌 Model Description

This model is an instruction-tuned version of Llama 3.1 8B Instruct, trained specifically to classify the toxicity level of Spanish-language user comments on news articles. It distinguishes between three categories:

  • Non-toxic
  • Slightly toxic
  • Toxic

The model follows instruction-based prompts and returns a single classification label in response.


📂 Training Data

The model was fine-tuned on the SocialTOX dataset, a collection of Spanish-language comments annotated for varying levels of toxicity. These comments come from news platforms and represent real-world scenarios of online discourse.


💬 Instruction Format

The model expects input in a structured conversational format like the one below:

[
  {
    "role": "system",
    "content": "You are an expert in detecting toxicity in comments, and your goal is to classify comments based on their level of toxicity. The comments were made on news articles. The toxicity categories are:
    Slightly toxic: Comments that express sarcasm, irony, or rhetorical questions, but do not directly attack or degrade others.
    Toxic: Comments that contain derogatory or pejorative language, inappropriate jokes, fearmongering, denial of facts, threats, personal attacks, insults, degradation, or racist or sexist language. Only classify a comment as \"toxic\" if it contains clear attack language, direct insults, or demeaning references.
    Non-toxic: Neutral or critical comments that do not include Toxic or Slightly toxic elements. Note that negative or critical comments (those with a serious or discontented tone) are Not toxic or Slightly toxic unless they meet the criteria of the categories above.
    Please write the corresponding category immediately after the word 'answer.' In case of doubt between two labels, choose the one with the lowest or no toxicity level."
  },
  {
    "role": "user",
    "content": "Text: "Narco-Bolivarian Communism"
  },
  {
    "role": "assistant",
    "content": "Toxic"
  }
]
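
For reference, here is a minimal inference sketch using the Hugging Face transformers library. The repo id comes from this card, but the generation settings (greedy decoding, a small max_new_tokens budget) and the bfloat16/device_map loading choices are illustrative assumptions, not settings published with the model:

# Minimal inference sketch; decoding settings are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gplsi/Toxicity_model_Llama_3.1_8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

system_prompt = (
    "You are an expert in detecting toxicity in comments, and your goal is to "
    "classify comments based on their level of toxicity. ..."
)  # abbreviated here; use the full instruction text shown above

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": 'Text: "Narco-Bolivarian Communism"'},
]

# The tokenizer's chat template renders the message list into the Llama 3.1 prompt format.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=8, do_sample=False)

label = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True).strip()
print(label)  # expected: "Non-toxic", "Slightly toxic", or "Toxic"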

⚙️ Training Hyperparameters

  • epochs: 3
  • learning_rate: 1e-5
  • beta1: 0.9
  • beta2: 0.95
  • weight_decay: 0.1
  • global_batch_size: 4
  • micro_batch_size: 1
  • lr_warmup_steps: 100
  • max_seq_length: 512
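
For orientation, these values map onto transformers TrainingArguments roughly as sketched below. The card does not state the training framework or hardware, so the single-device assumption (a global batch of 4 realized as micro-batches of 1 with 4 gradient-accumulation steps), the bf16 flag, and the output_dir are assumptions of this sketch:

from transformers import TrainingArguments

# Sketch of the reported hyperparameters in TrainingArguments form.
# Assumptions: one GPU, bf16 training, hypothetical output_dir.
training_args = TrainingArguments(
    output_dir="toxicity_llama31_8b",  # hypothetical path
    num_train_epochs=3,
    learning_rate=1e-5,
    adam_beta1=0.9,
    adam_beta2=0.95,
    weight_decay=0.1,
    warmup_steps=100,
    per_device_train_batch_size=1,     # micro_batch_size
    gradient_accumulation_steps=4,     # global batch size of 4 on one device (assumption)
    bf16=True,                         # assumption, matching the BF16 checkpoint
)
# max_seq_length=512 would be applied during tokenization or in the
# supervised fine-tuning trainer's config, not via TrainingArguments.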

📊 Evaluation

The model was evaluated on a held-out test set of 968 manually annotated comments. Below are the confusion matrix and classification metrics:

🧮 Confusion Matrix

Rows are true labels; columns are model predictions.

True \ Predicted    Non-toxic   Slightly toxic   Toxic
Non-toxic                 257              116       8
Slightly toxic             52              325      62
Toxic                      12               67      69

📈 Classification Report

Class               Precision   Recall   F1-score   Support
Non-toxic              0.8006   0.6745     0.7322       381
Slightly toxic         0.6398   0.7403     0.6864       439
Toxic                  0.4964   0.4662     0.4808       148
Accuracy                                   0.6725       968
Macro average          0.6456   0.6270     0.6331       968
Weighted average       0.6812   0.6725     0.6730       968

Macro F1-score: 0.6331
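
The report is consistent with the confusion matrix above; the short numpy sketch below reproduces the per-class figures from it. The class order (Non-toxic, Slightly toxic, Toxic) and the rows-true/columns-predicted orientation follow the tables above:

import numpy as np

# Confusion matrix from this card: rows = true labels, columns = predictions.
cm = np.array([
    [257, 116,   8],
    [ 52, 325,  62],
    [ 12,  67,  69],
])

tp = np.diag(cm).astype(float)
precision = tp / cm.sum(axis=0)        # correct / everything predicted as the class
recall    = tp / cm.sum(axis=1)        # correct / everything truly of the class
f1 = 2 * precision * recall / (precision + recall)

print(precision.round(4))              # [0.8006 0.6398 0.4964]
print(recall.round(4))                 # [0.6745 0.7403 0.4662]
print(f1.round(4))                     # [0.7322 0.6864 0.4808]
print(round(tp.sum() / cm.sum(), 4))   # accuracy: 0.6725
print(round(f1.mean(), 4))             # macro F1: 0.6331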
