prithivMLmods posted an update 5 days ago
Introducing prithivMLmods/DeepCaption-VLA-7B, a vision-language model (VLM) designed for reasoning with long-shot captions (Captioning and Vision-Language Attribution). It focuses on defining visual properties, object attributes, and scene details across a wide spectrum of images and aspect ratios, generating attribute-rich image captions. The model supports creative, artistic, and technical applications that require detailed descriptions. 🤗🔥

✦︎ Models: prithivMLmods/DeepCaption-VLA-7B; the release also includes prithivMLmods/DeepAttriCap-VLA-3B, an experimental model for vision-language attribution.

✦︎ Try the demo here: prithivMLmods/VisionScope-R2

✦︎ Try it now on Google Colab, with 4-bit quantization so it fits on a T4 GPU (a minimal loading sketch follows this list): https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/DeepCaption-VLA-7B%5B4bit%20-%20notebook%20demo%5D/DeepCaption-VLA-7B.ipynb

✦︎ Collection: prithivMLmods/deepcaption-attr-68b041172ebcb867e45c556a
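For reference, here is a minimal sketch of what a 4-bit T4 setup can look like with transformers and bitsandbytes. It assumes the model exposes a Qwen2.5-VL-style processor and chat template; the quantization settings, prompt, and image path are illustrative, and the linked Colab notebook remains the authoritative recipe.

```python
# Minimal sketch: loading DeepCaption-VLA-7B in 4-bit on a T4-class GPU.
# Assumes a Qwen2.5-VL-style chat interface; settings are illustrative.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq, BitsAndBytesConfig

model_id = "prithivMLmods/DeepCaption-VLA-7B"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # assumed 4-bit quant_type
    bnb_4bit_compute_dtype=torch.float16,
)

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

image = Image.open("example.jpg")         # any local image (hypothetical path)
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image with detailed visual attributes."},
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens
print(processor.batch_decode(out[:, inputs["input_ids"].shape[1]:],
                             skip_special_tokens=True)[0])
```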


To learn more, visit the respective model cards.

Can we apply it to traffic engineering, for example to understand and detect vehicles and road scenarios, as well as the level of traffic congestion?


Yes, you can! @mahbubchula
Try the simple workflow I’ve created below:

↗️ Notebook demo: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/Behemoth-3B-070225-post0.1_Traffic_Analysis/Behemoth_3B_070225_post0_1_Traffic_Analysis.ipynb

Here, I implemented https://huggingface.co/prithivMLmods/Behemoth-3B-070225-post0.1, which is close in functionality to DeepCaption-VLA-7B. I switched to this model because it better fits the VRAM usage on a T4 Colab instance. You can adapt the model according to your use cases, requirements, and available resources.
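As a rough sketch of that workflow (the notebook above is the reference): the same loading pattern can be pointed at the 3B model with a traffic-oriented prompt. It assumes a Qwen2.5-VL-style chat interface and fp16 weights on a T4; the prompt wording and image path are illustrative, not copied from the notebook.

```python
# Minimal sketch of a traffic-analysis query with the 3B model.
# Assumes a Qwen2.5-VL-style chat interface; prompt and paths are illustrative.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "prithivMLmods/Behemoth-3B-070225-post0.1"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"  # fits a T4 in fp16
)

image = Image.open("traffic_cam.jpg")   # hypothetical traffic-camera frame
question = (
    "Describe the road scene: vehicle types and approximate counts, "
    "lane occupancy, and the congestion level (free-flow, moderate, heavy jam)."
)
messages = [{"role": "user",
             "content": [{"type": "image"}, {"type": "text", "text": question}]}]

prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=256)

print(processor.batch_decode(out[:, inputs["input_ids"].shape[1]:],
                             skip_special_tokens=True)[0])
```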

For detection functionality, refer to the following app: https://huggingface.co/spaces/sergiopaniego/vlm_object_understanding

(Screenshot: demo UI, image inference)

~ prithivsakthiur