prithivMLmods posted an update 5 days ago
Introducing prithivMLmods/DeepCaption-VLA-7B, a vision-language model (VLM) designed for reasoning with long-shot captions (Captioning and Vision-Language Attribution). It focuses on defining visual properties, object attributes, and scene details across a wide spectrum of images and aspect ratios, generating attribute-rich image captions. The model supports creative, artistic, and technical applications that require detailed descriptions. 🤗🔥

✦︎ Models: prithivMLmods/DeepCaption-VLA-7B; the release also includes prithivMLmods/DeepAttriCap-VLA-3B, an experimental model for vision-language attribution.

✦︎ Try the demo here: prithivMLmods/VisionScope-R2

✦︎ Try it now on Google Colab, with 4-bit quantization so it fits on a T4 GPU (a minimal loading sketch follows this list): https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/DeepCaption-VLA-7B%5B4bit%20-%20notebook%20demo%5D/DeepCaption-VLA-7B.ipynb

✦︎ Collection: prithivMLmods/deepcaption-attr-68b041172ebcb867e45c556a
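For reference, here is a minimal sketch of what a 4-bit T4 setup can look like with transformers and bitsandbytes. It assumes the model exposes a Qwen2.5-VL-style processor and chat template; the quantization settings, prompt, and image path are illustrative, and the linked Colab notebook remains the authoritative recipe.

```python
# Minimal sketch: loading DeepCaption-VLA-7B in 4-bit on a T4-class GPU.
# Assumes a Qwen2.5-VL-style chat interface; settings are illustrative.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq, BitsAndBytesConfig

model_id = "prithivMLmods/DeepCaption-VLA-7B"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # assumed 4-bit quant_type
    bnb_4bit_compute_dtype=torch.float16,
)

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

image = Image.open("example.jpg")         # any local image (hypothetical path)
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image with detailed visual attributes."},
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens
print(processor.batch_decode(out[:, inputs["input_ids"].shape[1]:],
                             skip_special_tokens=True)[0])
```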


To learn more, visit the respective model cards.

Can we apply it to traffic engineering, for example to understand and detect vehicles and road scenarios, as well as the level of traffic congestion?


Yes, you can! @mahbubchula
Try the simple workflow I’ve created below:

↗️ Notebook demo: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/Behemoth-3B-070225-post0.1_Traffic_Analysis/Behemoth_3B_070225_post0_1_Traffic_Analysis.ipynb

Here, I implemented https://huggingface.co/prithivMLmods/Behemoth-3B-070225-post0.1, which is close in functionality to DeepCaption-VLA-7B. I switched to this model because it better fits the VRAM usage on a T4 Colab instance. You can adapt the model according to your use cases, requirements, and available resources.
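As a rough sketch of that workflow (the notebook above is the reference): the same loading pattern can be pointed at the 3B model with a traffic-oriented prompt. It assumes a Qwen2.5-VL-style chat interface and fp16 weights on a T4; the prompt wording and image path are illustrative, not copied from the notebook.

```python
# Minimal sketch of a traffic-analysis query with the 3B model.
# Assumes a Qwen2.5-VL-style chat interface; prompt and paths are illustrative.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "prithivMLmods/Behemoth-3B-070225-post0.1"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"  # fits a T4 in fp16
)

image = Image.open("traffic_cam.jpg")   # hypothetical traffic-camera frame
question = (
    "Describe the road scene: vehicle types and approximate counts, "
    "lane occupancy, and the congestion level (free-flow, moderate, heavy jam)."
)
messages = [{"role": "user",
             "content": [{"type": "image"}, {"type": "text", "text": question}]}]

prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=256)

print(processor.batch_decode(out[:, inputs["input_ids"].shape[1]:],
                             skip_special_tokens=True)[0])
```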

For detection functionality, refer to the following app: https://huggingface.co/spaces/sergiopaniego/vlm_object_understanding

(Screenshot: demo UI, image inference)

~ prithivsakthiur