arxiv:2411.19466

ForgerySleuth: Empowering Multimodal Large Language Models for Image Manipulation Detection

Published on Nov 29, 2024

Abstract

Multimodal large language models (M-LLMs) have unlocked new possibilities for various multimodal tasks. However, their potential in image manipulation detection (IMD) remains unexplored. When directly applied to the IMD task, M-LLMs often produce reasoning texts that suffer from hallucinations and overthinking. To address this, we propose ForgerySleuth, which leverages M-LLMs to perform comprehensive clue fusion and generate segmentation outputs indicating the specific regions that have been tampered with. Moreover, we construct the ForgeryAnalysis dataset through the Chain-of-Clues prompt, which includes analysis and reasoning text to upgrade the IMD task. A data engine is also introduced to build a larger-scale dataset for the pre-training phase. Our extensive experiments demonstrate the effectiveness of ForgeryAnalysis and show that ForgerySleuth significantly outperforms existing methods in generalization, robustness, and explainability.
