Model type: SESAME_minus is an open-source multimodal model trained by fine-tuning LLaVA on various instruction-based image grounding (segmentation) data. It is essentially an instruction-based segmentation model and serves as a baseline.
Primary intended uses: The primary use of SESAME is research on large multimodal models and chatbots.
Primary intended users: The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.