metadata
license: cc-by-4.0
datasets:
- jihyoung/M3C
language:
- en
base_model:
- Qwen/Qwen2-VL-2B-Instruct
Enabling Chatbots with Eyes and Ears: An Immersive Multimodal Conversation System for Dynamic Interactions
[Paper] [Project Page] [Dataset] [Model Weights]
Image Generated by DALL·E
TODO List
- Write documentation (README)
- Release M³C dataset
- Release dialogue module weights
- Release retrieval module weights
- Release training code
- Release inference code
- Release model self-chat code
- Launch Gradio demo for live chat
Citation
@article{jang2025enabling,
title={Enabling Chatbots with Eyes and Ears: An Immersive Multimodal Conversation System for Dynamic Interactions},
author={Jang, Jihyoung and Bae, Minwook and Kim, Minji and Hakkani-Tur, Dilek and Kim, Hyounghun},
journal={arXiv preprint arXiv:2506.00421},
year={2025}
}