AVG-LLaVA Model Card

Model details

Model type: AVG-LLaVA is an open-source large multimodal model (LMM) that adaptively selects the appropriate visual granularity based on the input image and instruction. It is an auto-regressive language model based on the transformer architecture.

Base LLM: lmsys/vicuna-7b-v1.5
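
The granularity-selection idea can be pictured with a short, purely illustrative sketch. The names below (GranularityScaler, GranularityRouter, the pooling kernel sizes) are hypothetical and only gesture at the concept of pooling visual features into several granularities and routing among them based on the instruction; see the paper for the actual architecture.

```python
# Conceptual sketch only -- hypothetical names, NOT the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GranularityScaler(nn.Module):
    """Pools ViT patch features into progressively coarser grids."""
    def __init__(self, kernel_sizes=(1, 2, 4, 8)):
        super().__init__()
        self.kernel_sizes = kernel_sizes

    def forward(self, patch_feats):  # (B, H*W, C) with a square patch grid
        b, n, c = patch_feats.shape
        h = w = int(n ** 0.5)
        grid = patch_feats.transpose(1, 2).reshape(b, c, h, w)
        feats = []
        for k in self.kernel_sizes:
            pooled = F.avg_pool2d(grid, k) if k > 1 else grid
            feats.append(pooled.flatten(2).transpose(1, 2))  # (B, (h/k)^2, C)
        return feats  # one tensor per granularity, fine to coarse

class GranularityRouter(nn.Module):
    """Scores each granularity from image and instruction summaries."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(2 * dim, 1)

    def forward(self, feats, instr_feat):  # instr_feat: (B, C)
        logits = []
        for f in feats:
            img = f.mean(dim=1)  # (B, C) summary of this granularity
            logits.append(self.score(torch.cat([img, instr_feat], dim=-1)))
        return torch.cat(logits, dim=-1)  # (B, num_granularities)

scaler = GranularityScaler()
router = GranularityRouter(dim=1024)
patches = torch.randn(1, 576, 1024)  # e.g. a 24x24 ViT patch grid
instr = torch.randn(1, 1024)
feats = scaler(patches)
choice = router(feats, instr).argmax(dim=-1)  # pick one granularity
selected = feats[choice.item()]
```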

Paper or resources for more information: https://arxiv.org/abs/2410.02745

License

Llama 2 is licensed under the LLAMA 2 Community License, Copyright (c) Meta Platforms, Inc. All Rights Reserved.

Where to send questions or comments about the model: https://github.com/DeepLearnXMU/AVG-LLaVA/issues

Intended use

Primary intended uses: The primary use of AVG-LLaVA is research on large multimodal models and chatbots.

Primary intended users: The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.
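
For researchers who want to try the model, a minimal loading sketch follows. It assumes the AVG-LLaVA repository keeps the original LLaVA codebase conventions (llava.model.builder, llava.mm_utils), and the checkpoint path is a placeholder; consult the repository README for the exact entry points.

```python
# Hedged sketch: assumes LLaVA-style loading utilities from the AVG-LLaVA repo.
from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path

model_path = "DeepLearnXMU/AVG-LLaVA"  # placeholder; verify on the repo/Hub page

# Returns the tokenizer, the multimodal model, the image processor,
# and the maximum context length.
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=model_path,
    model_base=None,
    model_name=get_model_name_from_path(model_path),
)
```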

Training dataset

  • ShareGPT4V Mix665K
  • 200K GPT4V-generated instruction data (ALLaVA)
  • 200K VQA samples drawn from various datasets

Evaluation dataset

A collection of 11 benchmarks spanning general VQA, text-oriented VQA, and general multimodal evaluation.
