---
license: apache-2.0
language:
- en
metrics:
- accuracy
library_name: transformers
tags:
- misinformation
- fake news
- vlm
- mllm
- llm
---

# Model Card

<!-- Provide a quick summary of what the model is/does. -->
SNIFFER is a multimodal large language model built for detecting and explaining out-of-context (OOC) misinformation.
It is obtained by two-stage instruction tuning of [InstructBLIP](https://huggingface.co/Salesforce/instructblip-vicuna-13b): news-domain alignment followed by task-specific tuning.

The full framework consists of three parts: 1) _internal checking_, which analyzes the consistency between the image and the text content; 2) _external checking_, which analyzes the relevance between the context of the retrieved image and the provided text; and 3) _composed reasoning_, which combines the two analyses to arrive at a final judgment and explanation.
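
The three parts described above can be pictured as a small pipeline. The following is only a structural sketch: `CheckResult`, `Verdict`, `detect`, and the callables passed to it are hypothetical names, and the boolean rule inside `composed_reasoning` is a stand-in, since this card does not specify how the two analyses are combined.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class CheckResult:
    is_consistent: bool  # True if this check found no mismatch
    rationale: str       # short natural-language explanation


@dataclass
class Verdict:
    label: str           # "real" or "fake" (out-of-context)
    explanation: str


# A check takes (image, caption) and returns a CheckResult.
Check = Callable[[Any, str], CheckResult]


def composed_reasoning(internal: CheckResult,
                       external: Optional[CheckResult]) -> Verdict:
    # Assumed rule, for illustration only: flag the pair as soon as
    # any available check reports an inconsistency.
    checks = [c for c in (internal, external) if c is not None]
    label = "real" if all(c.is_consistent for c in checks) else "fake"
    explanation = " ".join(c.rationale for c in checks)
    return Verdict(label, explanation)


def detect(image: Any, caption: str,
           internal_check: Check,
           external_check: Optional[Check] = None) -> Verdict:
    # 1) internal checking: image content vs. caption
    internal = internal_check(image, caption)
    # 2) external checking: retrieved image context vs. caption (optional here)
    external = external_check(image, caption) if external_check else None
    # 3) composed reasoning: merge both analyses into one judgment
    return composed_reasoning(internal, external)
```

In this sketch the external check is optional, so the internal check can be exercised on its own.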

The checkpoint provided here is used for the _internal checking_ part.
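
For orientation, the sketch below shows how the base InstructBLIP model linked above can be queried with an image and a caption through the `transformers` InstructBLIP classes. It is not the SNIFFER inference code: the prompt wording and both function names are assumptions, and this card does not state whether the SNIFFER weights load through the same interface, so the project repository remains the reference for that.

```python
def build_internal_check_prompt(caption: str) -> str:
    # Hypothetical wording: the instruction template used to tune
    # SNIFFER is not given in this card.
    return (
        "Does the following caption match the content of the image? "
        "Answer yes or no, then explain briefly.\n"
        f"Caption: {caption.strip()}"
    )


def query_base_instructblip(image, caption,
                            model_id="Salesforce/instructblip-vicuna-13b",
                            max_new_tokens=128):
    # Heavy dependencies are imported lazily so the prompt helper
    # above can be used without them installed.
    import torch
    from transformers import (InstructBlipForConditionalGeneration,
                              InstructBlipProcessor)

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    processor = InstructBlipProcessor.from_pretrained(model_id)
    model = InstructBlipForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=dtype
    ).to(device)

    # `image` is expected to be a PIL image in RGB mode.
    inputs = processor(
        images=image,
        text=build_internal_check_prompt(caption),
        return_tensors="pt",
    ).to(device, dtype)

    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.batch_decode(output_ids, skip_special_tokens=True)[0].strip()


# Example call (downloads the 13B base model, so it is left commented out):
# from PIL import Image
# image = Image.open("news_photo.jpg").convert("RGB")  # hypothetical file
# print(query_base_instructblip(image, "Flooded street after the storm."))
```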

## Model Sources

<!-- Provide the basic links for the model. -->
- **Paper:** https://arxiv.org/abs/2403.03170 (to appear in CVPR 2024)
- **Project:** https://pengqi.site/Sniffer/
- **Repository:** https://github.com/MischaQI/Sniffer

## Results

Dataset: [NewsCLIPpings](https://github.com/g-luo/news_clippings). Values are accuracy (%), reported over all samples and separately for the fake and real classes.

| Model | All | Fake | Real |
| :-------------------- | :----| :----| :----|
| SAFE | 52.8 | 54.8 | 52.0 |
| EANN | 58.1 | 61.8 | 56.2 |
| VisualBERT | 58.6 | 38.9 | 78.4 |
| CLIP | 66.0 | 64.3 | 67.7 |
| DT-Transformer | 77.1 | 78.6 | 75.6 |
| CCN | 84.7 | 84.8 | 84.5 |
| Neu-Sym detector | 68.2 | - | - |
| **SNIFFER (ours)** | **88.4** | **86.9** | **91.8** |
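
The three columns above can be reproduced from a list of predictions with a few lines of Python. This is a minimal sketch that assumes "Fake" and "Real" denote accuracy restricted to samples of that ground-truth class; the function name and the label encoding (1 = fake, 0 = real) are illustrative choices, not taken from the SNIFFER code.

```python
from typing import Dict, List, Sequence, Tuple


def accuracy_breakdown(y_true: Sequence[int],
                       y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy (%) over all samples and per ground-truth class.

    Assumed label convention: 1 = fake (out-of-context), 0 = real.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    def pct(pairs: List[Tuple[int, int]]) -> float:
        # Share of correct predictions, in percent; NaN for an empty subset.
        if not pairs:
            return float("nan")
        return 100.0 * sum(t == p for t, p in pairs) / len(pairs)

    pairs = list(zip(y_true, y_pred))
    return {
        "All": pct(pairs),
        "Fake": pct([(t, p) for t, p in pairs if t == 1]),
        "Real": pct([(t, p) for t, p in pairs if t == 0]),
    }


# Toy example: two fake and two real samples, one fake sample missed.
scores = accuracy_breakdown([1, 1, 0, 0], [1, 0, 0, 0])
print(scores)
```

On the toy input, one of the two fake samples is misclassified, so the call yields 75.0 for "All", 50.0 for "Fake", and 100.0 for "Real".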

## Citation

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
```
@inproceedings{qi2023sniffer,
  author = {Qi, Peng and Yan, Zehong and Hsu, Wynne and Lee, Mong Li},
  title = {SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year = {2024}
}
```