# Model Card for llava-phi-2-3b

This is a multimodal implementation of the Phi-2 model, inspired by LLaVA-Phi.
## Model Details
- LLM Backbone: Phi2
- Vision Tower: clip-vit-large-patch14-336
- Pretraining Dataset: LAION-CC-SBU dataset with BLIP captions (200k samples)
- Finetuning Dataset: the Instruct-150k dataset, based on COCO
- Finetuned Model: marianna13/llava-phi-2-3b
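The checkpoint above can likely be driven through the `transformers` LLaVA classes. The sketch below is a minimal, hedged example: the model id comes from this card, but the prompt template and class compatibility are assumptions; check the repository's chat template before relying on it.

```python
def build_prompt(question: str) -> str:
    # Generic single-turn LLaVA-style prompt. The exact template for this
    # checkpoint is an assumption -- verify against the model repository.
    return f"USER: <image>\n{question} ASSISTANT:"

def describe(image_path: str, question: str, max_new_tokens: int = 64) -> str:
    # Heavy dependencies are imported lazily so the prompt helper above
    # stays usable without transformers/PIL installed.
    from transformers import AutoProcessor, LlavaForConditionalGeneration
    from PIL import Image

    model_id = "marianna13/llava-phi-2-3b"  # from this model card
    processor = AutoProcessor.from_pretrained(model_id)
    model = LlavaForConditionalGeneration.from_pretrained(model_id)

    image = Image.open(image_path)
    inputs = processor(text=build_prompt(question), images=image,
                       return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(output_ids[0], skip_special_tokens=True)
```

Usage would look like `describe("photo.jpg", "What is in this image?")`; generation settings (sampling, temperature) can be passed through `model.generate` as usual.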
## Model Sources
- Original Repository: LLaVA-Phi
- Paper: LLaVA-Phi: Efficient Multi-Modal Assistant with Small Language Model
- Demo: Demo Link