arXiv:2309.03452

Multi-Modality Guidance Network For Missing Modality Inference

Published on Sep 7, 2023
Abstract

Multimodal models have achieved significant success in recent years. Standard multimodal approaches often assume that the modalities available during training remain available during inference. In practice, however, many scenarios violate this assumption, with modalities missing at inference time, which limits where multimodal models can be applied. Existing methods mitigate the problem by reconstructing the missing modalities, but this adds computational cost, which can be just as critical, especially for large, deployed systems. To address the problem from both sides, we propose a novel guidance network that promotes knowledge sharing during training, leveraging multimodal representations to train better single-modality models for inference. Real-world experiments on violence detection show that our proposed framework trains single-modality models that significantly outperform their traditionally trained counterparts while maintaining the same inference cost.
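
The page links no code, but the abstract's core idea, using fused multimodal representations during training to guide a single-modality model that is then deployed alone, can be illustrated with a minimal PyTorch sketch. All module names, the MSE-based guidance loss, and the weighting `alpha` below are illustrative assumptions, not the authors' actual architecture or objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SingleModalityEncoder(nn.Module):
    """Encoder for one modality (inputs assumed flattened to feature vectors)."""
    def __init__(self, in_dim, hid_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hid_dim), nn.ReLU(),
            nn.Linear(hid_dim, hid_dim),
        )

    def forward(self, x):
        return self.net(x)

class GuidedModel(nn.Module):
    """Joint model with a multimodal 'guide' branch and a single-modality branch.

    During training, single-modality features are pulled toward the fused
    multimodal features; at inference, only the single-modality branch runs.
    """
    def __init__(self, dim_a, dim_b, hid_dim=128, num_classes=2):
        super().__init__()
        self.enc_a = SingleModalityEncoder(dim_a, hid_dim)  # modality kept at inference
        self.enc_b = SingleModalityEncoder(dim_b, hid_dim)  # modality that may be missing
        self.fuse = nn.Linear(2 * hid_dim, hid_dim)         # multimodal fusion
        self.head_multi = nn.Linear(hid_dim, num_classes)   # guide classifier
        self.head_single = nn.Linear(hid_dim, num_classes)  # deployed classifier

    def forward(self, x_a, x_b):
        z_a = self.enc_a(x_a)
        z_b = self.enc_b(x_b)
        z_multi = self.fuse(torch.cat([z_a, z_b], dim=-1))
        return z_a, z_multi

def training_step(model, x_a, x_b, y, alpha=0.5):
    z_a, z_multi = model(x_a, x_b)
    # Task losses for both branches.
    loss_multi = F.cross_entropy(model.head_multi(z_multi), y)
    loss_single = F.cross_entropy(model.head_single(z_a), y)
    # Guidance loss: align single-modality features with the (detached)
    # multimodal features so knowledge flows from the fused branch.
    loss_guide = F.mse_loss(z_a, z_multi.detach())
    return loss_single + loss_multi + alpha * loss_guide

# Toy usage: batch of 8 samples, modality A is 32-dim, modality B is 16-dim.
model = GuidedModel(dim_a=32, dim_b=16)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x_a, x_b = torch.randn(8, 32), torch.randn(8, 16)
y = torch.randint(0, 2, (8,))
training_step(model, x_a, x_b, y).backward()
opt.step()

# Inference with modality B missing: only enc_a and head_single are needed,
# so the deployed model costs the same as a traditionally trained one.
with torch.no_grad():
    logits = model.head_single(model.enc_a(torch.randn(1, 32)))
```

Note how the sketch reflects the abstract's claim about inference cost: the guide branch (`enc_b`, `fuse`, `head_multi`) exists only at training time, so the deployed single-modality path is no larger than a model trained without guidance.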
