MAHBUB HASSAN
mahbubchula
AI & ML interests
LLM, AI, ML
Recent Activity
replied to prithivMLmods's post · 1 day ago
Introducing https://huggingface.co/prithivMLmods/DeepCaption-VLA-7B, a multimodal VLM designed for reasoning with long-shot captions (Captioning and Vision-Language Attribution). It focuses on defining visual properties, object attributes, and scene details across a wide spectrum of images and aspect ratios, generating attribute-rich image captions. The model supports creative, artistic, and technical applications that require detailed descriptions. 🤗🔥
✦︎ Models: https://huggingface.co/prithivMLmods/DeepCaption-VLA-7B; the release also includes https://huggingface.co/prithivMLmods/DeepAttriCap-VLA-3B, an experimental model for vision-language attribution.
✦︎ Try the demo here: https://huggingface.co/spaces/prithivMLmods/VisionScope-R2
✦︎ Try it now on Google Colab, with support for T4 GPUs via a 4-bit quant_type (a loading sketch follows after this post): https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/DeepCaption-VLA-7B%5B4bit%20-%20notebook%20demo%5D/DeepCaption-VLA-7B.ipynb
✦︎ Collection: https://huggingface.co/collections/prithivMLmods/deepcaption-attr-68b041172ebcb867e45c556a
To learn more, visit the respective model cards.
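A minimal sketch of what the 4-bit T4 setup might look like, assuming DeepCaption-VLA-7B loads through transformers' AutoProcessor / AutoModelForImageTextToText classes with a bitsandbytes NF4 config; the linked Colab notebook and model card are the authoritative references, and the image URL and prompt below are placeholders:

# Hypothetical sketch: load DeepCaption-VLA-7B in 4-bit on a T4-class GPU.
# Assumes the model exposes a standard Qwen2.5-VL-style processor/model API in
# transformers; consult the model card / linked notebook for the exact recipe.
import requests
import torch
from PIL import Image
from transformers import (
    AutoModelForImageTextToText,
    AutoProcessor,
    BitsAndBytesConfig,
)

model_id = "prithivMLmods/DeepCaption-VLA-7B"

# 4-bit NF4 quantization keeps the 7B weights within a T4's 16 GB of VRAM.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Placeholder image; swap in your own.
image_url = "https://example.com/sample.jpg"
image = Image.open(requests.get(image_url, stream=True).raw)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe the image with detailed visual attributes."},
        ],
    }
]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256)

# Strip the prompt tokens and decode only the generated caption.
caption = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(caption)

On a T4, float16 compute with NF4 weights is a reasonable default, since that GPU generation does not support bfloat16.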
replied to prithivMLmods's post (the same DeepCaption-VLA-7B announcement as above) · 5 days ago
reacted to prithivMLmods's post (the same announcement) with 👀 · 5 days ago
Organizations
None yet