---
license: cc-by-4.0
datasets:
- jihyoung/M3C
language:
- en
base_model:
- Qwen/Qwen2-VL-2B-Instruct
---

# Enabling Chatbots with Eyes and Ears: An Immersive Multimodal Conversation System for Dynamic Interactions

[\[📜 Paper\]](https://arxiv.org/abs/2506.00421) [\[🖥️ Project Page\]](https://m3c-dataset.github.io/) [\[📖 Dataset\]](https://huggingface.co/datasets/jihyoung/M3C) [\[🤗 Model Weights\]](https://huggingface.co/jihyoung/M3C-dialogue)

<div align="center">
<img width="500" alt="image" src="https://github.com/user-attachments/assets/76309007-498a-45ed-a4c2-dac43ee39bfc">
<br>
<sub>Image Generated by DALL·E</sub>
</div>

## ✅ TODO List

- [ ] Write documentation (README)
- [ ] Release M³C dataset
- [ ] Release dialogue module weight
- [ ] Release retrieval module weight
- [ ] Release training code
- [ ] Release inference code
- [ ] Release model self-chat code
- [ ] Launch Gradio demo for live chat

## 📚 Citation

```bibtex
@article{jang2025enabling,
  title={Enabling Chatbots with Eyes and Ears: An Immersive Multimodal Conversation System for Dynamic Interactions},
  author={Jang, Jihyoung and Bae, Minwook and Kim, Minji and Hakkani-Tur, Dilek and Kim, Hyounghun},
  journal={arXiv preprint arXiv:2506.00421},
  year={2025}
}
```