---
license: cc-by-4.0
datasets:
- jihyoung/M3C
language:
- en
base_model:
- Qwen/Qwen2-VL-2B-Instruct
---

# Enabling Chatbots with Eyes and Ears: An Immersive Multimodal Conversation System for Dynamic Interactions

[\[📜 Paper\]](https://arxiv.org/abs/2506.00421) [\[🖥️ Project Page\]](https://m3c-dataset.github.io/) [\[📖 Dataset\]](https://huggingface.co/datasets/jihyoung/M3C) [\[🤗 Model Weights\]](https://huggingface.co/jihyoung/M3C-dialogue)
*Teaser image (generated by DALL·E).*
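## 🚀 Usage (tentative)

The inference code has not been released yet (see the TODO list below), so the snippet that follows is only a minimal sketch of how the dialogue module weights might be loaded. It assumes the checkpoint follows the standard Qwen2-VL format of its base model (`Qwen/Qwen2-VL-2B-Instruct`); the repository id comes from the links above, and everything else (prompt format, processor availability) is an assumption that may change once the official code is published.

```python
# Minimal sketch (assumption): load the dialogue module as a standard
# Qwen2-VL checkpoint via Hugging Face transformers. The official inference
# pipeline is not yet released, so the actual prompting may differ.
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model_id = "jihyoung/M3C-dialogue"  # this repository

model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# If this repo does not ship a processor config, fall back to the base model's.
try:
    processor = AutoProcessor.from_pretrained(model_id)
except Exception:
    processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")

# Text-only example turn; the system also targets image/audio-grounded dialogue.
messages = [{"role": "user", "content": [{"type": "text", "text": "Hello!"}]}]
prompt = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(text=[prompt], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=128)
reply = processor.batch_decode(
    output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
)[0]
print(reply)
```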
## ✅ TODO List

- [ ] Write documentation (README)
- [ ] Release M³C dataset
- [ ] Release dialogue module weights
- [ ] Release retrieval module weights
- [ ] Release training code
- [ ] Release inference code
- [ ] Release model self-chat code
- [ ] Launch Gradio demo for live chat

## 📚 Citation

```bibtex
@article{jang2025enabling,
  title={Enabling Chatbots with Eyes and Ears: An Immersive Multimodal Conversation System for Dynamic Interactions},
  author={Jang, Jihyoung and Bae, Minwook and Kim, Minji and Hakkani-Tur, Dilek and Kim, Hyounghun},
  journal={arXiv preprint arXiv:2506.00421},
  year={2025}
}
```