---
license: cc-by-nc-sa-4.0
datasets:
- QCRI/LlamaLens-English
- QCRI/LlamaLens-Arabic
- QCRI/LlamaLens-Hindi
language:
- ar
- en
- hi
base_model:
- meta-llama/Llama-3.1-8B-Instruct
pipeline_tag: text-generation
tags:
- Social-Media
- Hate-Speech
- Summarization
- offensive-language
- News-Genre
---
# LlamaLens: Specialized Multilingual LLM for Analyzing News and Social Media Content

## Overview
LlamaLens is a specialized multilingual LLM designed for analyzing news and social media content. It covers 19 NLP tasks, drawing on 52 datasets in Arabic, English, and Hindi.

<p align="center">
<picture>
<img width="352" alt="capabilities_tasks_datasets" src="./llamalens-avatar.png">
</picture>
</p>

## Dataset  
The model was trained on the [LlamaLens dataset](https://huggingface.co/collections/QCRI/llamalens-672f7e0604a0498c6a2f0fe9).

## To Replicate the Experiments  
The code to replicate the experiments is available on [GitHub](https://github.com/firojalam/LlamaLens).


## Model Inference

To run inference with the LlamaLens model, follow these steps:

1. **Install the Required Libraries**:

   Ensure you have the necessary libraries installed. You can do this using pip:

   ```bash
   pip install transformers torch
   ```
2. **Load the Model and Tokenizer**:
Use the `pipeline` API from the transformers library, which loads the LlamaLens model together with its tokenizer:

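```python
from transformers import pipeline

# Model ID on the Hugging Face Hub
model_name = "QCRI/LlamaLens"
# The text-generation pipeline loads both the model and its tokenizer
pipe = pipeline("text-generation", model=model_name)
```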
3. **Prepare the Input**:
Build the chat messages from a system message and your input text (the pipeline handles tokenization):
```python
input_text = "Your input text here"
system_message = "Your system message text here"
messages = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": input_text},
]
```
4. **Generate the Output**:
Generate a response using the model:
```python
generated_text = pipe(messages, num_return_sequences=1)
print(generated_text)
```
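
The steps above can be wrapped in two small helpers, sketched below. This is a minimal illustration, not part of the official LlamaLens code: the helper names `build_messages` and `extract_reply`, the example system message, and the mocked output are all hypothetical. It assumes that, for chat-style input, the pipeline returns a list whose first element holds the conversation under `"generated_text"`, with the assistant's reply as the last message; check this against the output of your installed transformers version.

```python
def build_messages(system_message, input_text):
    """Build the chat-style input used in step 3."""
    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": input_text},
    ]


def extract_reply(pipeline_output):
    """Return the assistant's text from a text-generation pipeline result.

    Assumption: for chat input, "generated_text" is a list of messages
    ending with the assistant's reply; for plain-string input it is a string.
    """
    generated = pipeline_output[0]["generated_text"]
    if isinstance(generated, str):
        return generated.strip()
    return generated[-1]["content"].strip()


# Hypothetical system message and input, for illustration only.
messages = build_messages(
    "Classify the text as 'offensive' or 'not-offensive'.",
    "Your input text here",
)

# Mocked pipeline output so the sketch runs without downloading the model.
# With the real model: output = pipe(messages, num_return_sequences=1)
mock_output = [
    {"generated_text": messages + [{"role": "assistant", "content": " offensive "}]}
]

print(extract_reply(mock_output))
```

Keeping the reply extraction in one place makes it easier to adapt if the output structure differs in your environment.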

## Results

The tables below compare **LlamaLens** with the existing SOTA (where available) and the Llama-Instruct baseline. The “Δ” (Delta) column is calculated as **(LlamaLens – SOTA)**; a “–” marks datasets with no reported SOTA.
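
As a worked example of the Δ column, the sketch below recomputes it for three rows copied from the tables in this section. The `delta` helper and the row layout are illustrative choices, not part of the LlamaLens code; a missing SOTA is represented as `None`.

```python
# (language, dataset, SOTA, LlamaLens) copied from the results tables
rows = [
    ("Arabic", "ASND", 0.770, 0.938),
    ("Arabic", "xlsum", 0.137, 0.075),
    ("English", "claim-detection", None, 0.915),  # no reported SOTA
]


def delta(sota, llamalens):
    """Return LlamaLens minus SOTA, rounded to 3 decimals, or None without a SOTA."""
    if sota is None:
        return None
    return round(llamalens - sota, 3)


for language, dataset, sota, llamalens in rows:
    print(language, dataset, delta(sota, llamalens))
```

A positive Δ means LlamaLens scores above the reported SOTA on that dataset; a negative Δ means it scores below.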

---

## Arabic

| **Task**               | **Dataset**               | **Metric** | **SOTA** | **Llama-Instruct** | **LlamaLens** | **Δ** (LlamaLens - SOTA) |
|------------------------|---------------------------|-----------:|--------:|--------------------:|--------------:|------------------------------:|
| News Summarization     | xlsum                     | R-2        | 0.137   | 0.034              | 0.075         | -0.062                       |
| News Genre             | ASND                      | Ma-F1      | 0.770   | 0.587              | 0.938         | 0.168                        |
| News Genre             | SANADAkhbarona            | Acc        | 0.940   | 0.784              | 0.922         | -0.018                       |
| News Genre             | SANADAlArabiya            | Acc        | 0.974   | 0.893              | 0.986         | 0.012                        |
| News Genre             | SANADAlkhaleej            | Acc        | 0.986   | 0.865              | 0.967         | -0.019                       |
| News Genre             | UltimateDataset           | Ma-F1      | 0.970   | 0.376              | 0.883         | -0.087                       |
| News Credibility       | NewsCredibility           | Acc        | 0.899   | 0.455              | 0.494         | -0.405                       |
| Emotion                | Emotional-Tone            | W-F1       | 0.658   | 0.358              | 0.748         | 0.090                        |
| Emotion                | NewsHeadline              | Acc        | 1.000   | 0.406              | 0.551         | -0.449                       |
| Sarcasm                | ArSarcasm-v2              | F1_Pos     | 0.584   | 0.477              | 0.307         | -0.277                       |
| Sentiment              | ar_reviews_100k           | F1_Pos     | –       | 0.343              | 0.665         | –                            |
| Sentiment              | ArSAS                     | Acc        | 0.920   | 0.603              | 0.795         | -0.125                       |
| Stance                 | stance                    | Ma-F1      | 0.767   | 0.608              | 0.936         | 0.169                        |
| Stance                 | Mawqif-Arabic-Stance      | Ma-F1      | 0.789   | 0.764              | 0.867         | 0.078                        |
| Attentionworthiness    | CT22Attentionworthy       | W-F1       | 0.412   | 0.158              | 0.544         | 0.132                        |
| Checkworthiness        | CT24_T1                   | F1_Pos     | 0.569   | 0.404              | 0.877         | 0.308                        |
| Claim                  | CT22Claim                 | Acc        | 0.703   | 0.581              | 0.778         | 0.075                        |
| Factuality             | Arafacts                  | Mi-F1      | 0.850   | 0.210              | 0.534         | -0.316                       |
| Factuality             | COVID19Factuality         | W-F1       | 0.831   | 0.492              | 0.781         | -0.050                       |
| Propaganda             | ArPro                     | Mi-F1      | 0.767   | 0.597              | 0.762         | -0.005                       |
| Cyberbullying          | ArCyc_CB                  | Acc        | 0.863   | 0.766              | 0.753         | -0.110                       |
| Harmfulness            | CT22Harmful               | F1_Pos     | 0.557   | 0.507              | 0.508         | -0.049                       |
| Hate Speech            | annotated-hatetweets-4    | W-F1       | 0.630   | 0.257              | 0.549         | -0.081                       |
| Hate Speech            | OSACT4SubtaskB            | Mi-F1      | 0.950   | 0.819              | 0.802         | -0.148                       |
| Offensive              | ArCyc_OFF                 | Ma-F1      | 0.878   | 0.489              | 0.652         | -0.226                       |
| Offensive              | OSACT4SubtaskA            | Ma-F1      | 0.905   | 0.782              | 0.899         | -0.006                       |

---

## English

| **Task**             | **Dataset**               | **Metric** | **SOTA** | **Llama-Instruct** | **LlamaLens** | **Δ** (LlamaLens - SOTA) |
|----------------------|---------------------------|-----------:|--------:|--------------------:|--------------:|------------------------------:|
| News Summarization   | xlsum                     | R-2        | 0.152   | 0.074              | 0.141         | -0.011                       |
| News Genre           | CNN_News_Articles         | Acc        | 0.940   | 0.644              | 0.915         | -0.025                       |
| News Genre           | News_Category             | Ma-F1      | 0.769   | 0.970              | 0.505         | -0.264                       |
| News Genre           | SemEval23T3-ST1           | Mi-F1      | 0.815   | 0.687              | 0.241         | -0.574                       |
| Subjectivity         | CT24_T2                   | Ma-F1      | 0.744   | 0.535              | 0.508         | -0.236                       |
| Emotion              | emotion                   | Ma-F1      | 0.790   | 0.353              | 0.878         | 0.088                        |
| Sarcasm              | News-Headlines            | Acc        | 0.897   | 0.668              | 0.956         | 0.059                        |
| Sentiment            | NewsMTSC                  | Ma-F1      | 0.817   | 0.628              | 0.627         | -0.190                       |
| Checkworthiness      | CT24_T1                   | F1_Pos     | 0.753   | 0.404              | 0.877         | 0.124                        |
| Claim                | claim-detection           | Mi-F1      | –       | 0.545              | 0.915         | –                            |
| Factuality           | News_dataset              | Acc        | 0.920   | 0.654              | 0.946         | 0.026                        |
| Factuality           | Politifact                | W-F1       | 0.490   | 0.121              | 0.290         | -0.200                       |
| Propaganda           | QProp                     | Ma-F1      | 0.667   | 0.759              | 0.851         | 0.184                        |
| Cyberbullying        | Cyberbullying             | Acc        | 0.907   | 0.175              | 0.847         | -0.060                       |
| Offensive            | Offensive_Hateful         | Mi-F1      | –       | 0.692              | 0.805         | –                            |
| Offensive            | offensive_language        | Mi-F1      | 0.994   | 0.646              | 0.884         | -0.110                       |
| Offensive & Hate     | hate-offensive-speech     | Acc        | 0.945   | 0.602              | 0.924         | -0.021                       |

---

## Hindi

| **Task**               | **Dataset**            | **Metric** | **SOTA** | **Llama-Instruct** | **LlamaLens** | **Δ** (LlamaLens - SOTA) |
|------------------------|------------------------|-----------:|--------:|--------------------:|--------------:|------------------------------:|
| NLI                    | NLI_dataset           | W-F1       | 0.646   | 0.633              | 0.655         | 0.009                        |
| News Summarization     | xlsum                 | R-2        | 0.136   | 0.078              | 0.117         | -0.019                       |
| Sentiment              | Sentiment Analysis    | Acc        | 0.697   | 0.552              | 0.669         | -0.028                       |
| Factuality             | fake-news             | Mi-F1      | –       | 0.759              | 0.713         | –                            |
| Hate Speech            | hate-speech-detection | Mi-F1      | 0.639   | 0.750              | 0.994         | 0.355                        |
| Hate Speech            | Hindi-Hostility       | W-F1       | 0.841   | 0.469              | 0.720         | -0.121                       |
| Offensive              | Offensive Speech      | Mi-F1      | 0.723   | 0.621              | 0.847         | 0.124                        |
| Cyberbullying          | MC_Hinglish1          | Acc        | 0.609   | 0.233              | 0.587         | -0.022                       |

## Paper  
For an in-depth understanding, refer to our paper: [**LlamaLens: Specialized Multilingual LLM for Analyzing News and Social Media Content**](https://arxiv.org/pdf/2410.15308).




## License
This model is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license (CC BY-NC-SA 4.0).


## Citation
Please cite [our paper](https://arxiv.org/pdf/2410.15308) when using this model:

```bibtex
@article{kmainasi2024llamalensspecializedmultilingualllm,
  title={LlamaLens: Specialized Multilingual LLM for Analyzing News and Social Media Content},
  author={Mohamed Bayan Kmainasi and Ali Ezzat Shahroor and Maram Hasanain and Sahinur Rahman Laskar and Naeemul Hassan and Firoj Alam},
  year={2024},
  journal={arXiv preprint arXiv:2410.15308},
  url={https://arxiv.org/abs/2410.15308},
  eprint={2410.15308},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```