---
library_name: peft
base_model: meta-llama/Llama-2-7b-hf
language:
- en
pipeline_tag: text-generation
tags:
- hate-speech
- explanation-generation
---

# Model Card for gllama-alarm-implicit-hate

**GLlama Alarm** is a suite of knowledge-Guided versions of Llama 2 instruction fine-tuned for non-binary abusive language detection and explanation generation tasks.

## Model Details

This version has been instruction fine-tuned on the Implicit Hate Corpus for multi-class expressiveness detection and explanation generation (i.e., implicit hate speech, explicit hate speech, not hate), as well as on encyclopedic, commonsense and temporal linguistic knowledge.

### Model Description

- **Developed by:** Chiara Di Bonaventura, Lucia Siciliani, Pierpaolo Basile
- **Funded by:** The Alan Turing Institute, Fondazione FAIR
- **Language:** English
- **Finetuned from model:** meta-llama/Llama-2-7b-hf

### Model Sources

- **Paper:** https://kclpure.kcl.ac.uk/ws/portalfiles/portal/316198577/2025_COLING_from_detection_to_explanation.pdf

## Uses

**GLlama Alarm** is intended for research use in English, especially for NLP tasks in the domain of social media, which may contain offensive content. Our suite can be used to **detect different levels of offensiveness and expressiveness of abusive language** (e.g., offensive comments and implicit hate speech, which has proven hard for many LLMs) and to **generate structured textual explanations** of why a text contains abusive language.

In any case, language models, including ours, can potentially be used for language generation in a harmful way. GLlama Alarm should not be used directly in any application without a prior assessment of the safety and fairness concerns specific to that application.

## Training Details

**GLlama Alarm** builds on top of the foundation model Llama 2 (7B), an auto-regressive language model that uses an optimized transformer architecture.
Llama 2 was trained on a mix of publicly available online data between January 2023 and July 2023. We selected the base version of Llama 2, which has 7B parameters.

We instruction fine-tuned Llama 2 on two datasets separately: HateXplain and the Implicit Hate Corpus. This version is the one instruction fine-tuned on the Implicit Hate Corpus. Both datasets contain publicly available data designed for hate speech detection, thus ensuring data privacy and protection.

To instruction fine-tune Llama 2, we created knowledge-guided prompts following our paradigm; the template is shown in Table 9 of the paper. We instruction fine-tuned Llama 2 with 17k knowledge-guided prompts for HateXplain and Implicit Hate for 5 epochs, while setting the other hyperparameters as suggested by [Taori et al., 2023](https://github.com/tatsu-lab/stanford_alpaca).

## Citation

**BibTeX:**

    @inproceedings{dibonaventura2025gllama_alarm,
      title={From Detection to Explanation: Effective Learning Strategies for LLMs in Online Abusive Language Research},
      author={Di Bonaventura, Chiara and Siciliani, Lucia and Basile, Pierpaolo and Merono-Penuela, Albert and McGillivray, Barbara},
      booktitle={Proceedings of the 2025 International Conference on Computational Linguistics (COLING 2025)},
      year={2025}
    }

**APA:**

Di Bonaventura, C., Siciliani, L., Basile, P., Merono-Penuela, A., & McGillivray, B. (2025). From Detection to Explanation: Effective Learning Strategies for LLMs in Online Abusive Language Research. In Proceedings of the 2025 International Conference on Computational Linguistics (COLING 2025).

## Model Card Contact

chiara.di_bonaventura@kcl.ac.uk

### Framework versions

- PEFT 0.10.0
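As a usage sketch, the PEFT adapter can be loaded on top of the base Llama 2 model with the `peft` and `transformers` libraries. Everything below is illustrative: the adapter repository id is a placeholder, and the prompt wording and the `parse_label` heuristic are assumptions, not the knowledge-guided template from Table 9 of the paper.

```python
# Hedged sketch: querying this adapter for the three-way expressiveness labels.
# The prompt and parse_label heuristic are illustrative assumptions.

LABELS = ("implicit hate speech", "explicit hate speech", "not hate")

def parse_label(generation: str) -> str:
    """Map a free-text generation onto one of the three classes (assumed heuristic)."""
    text = generation.lower()
    if "not hate" in text:
        return "not hate"
    if "implicit" in text:
        return "implicit hate speech"
    if "explicit" in text:
        return "explicit hate speech"
    return "unknown"

if __name__ == "__main__":
    # Heavy imports kept here so the helper above stays dependency-free.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    base_id = "meta-llama/Llama-2-7b-hf"
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(base_id)
    # Load the LoRA adapter; replace the placeholder with this model card's repository id.
    model = PeftModel.from_pretrained(base, "<adapter-repo-id>")

    prompt = (
        "Classify the following post as implicit hate speech, "
        "explicit hate speech, or not hate.\nPost: ...\nAnswer:"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=64)
    # Decode only the newly generated tokens, not the prompt.
    answer = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    print(parse_label(answer))
```

The classification and explanation behaviour depends on the prompt format used at fine-tuning time, so in practice the prompt should follow the knowledge-guided template from the paper rather than the generic wording shown here.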