---
license: cc-by-4.0
datasets:
- jihyoung/M3C
language:
- en
base_model:
- Qwen/Qwen2-VL-2B-Instruct
---
# Enabling Chatbots with Eyes and Ears: An Immersive Multimodal Conversation System for Dynamic Interactions
[\[📜 Paper\]](https://arxiv.org/abs/2506.00421) [\[🖥️ Project Page\]](https://m3c-dataset.github.io/) [\[📖 Dataset\]](https://huggingface.co/datasets/jihyoung/M3C) [\[🤗 Model Weights\]](https://huggingface.co/jihyoung/M3C-dialogue)
Image Generated by DALL·E
## ✅ TODO List
- [ ] Write documentation (README)
- [ ] Release M³C dataset
- [ ] Release dialogue module weights
- [ ] Release retrieval module weights
- [ ] Release training code
- [ ] Release inference code
- [ ] Release model self-chat code
- [ ] Launch Gradio demo for live chat
## 📚 Citation
```bibtex
@article{jang2025enabling,
  title={Enabling Chatbots with Eyes and Ears: An Immersive Multimodal Conversation System for Dynamic Interactions},
  author={Jang, Jihyoung and Bae, Minwook and Kim, Minji and Hakkani-Tur, Dilek and Kim, Hyounghun},
  journal={arXiv preprint arXiv:2506.00421},
  year={2025}
}
```