metadata

base_model: unsloth/llama-3.2-11b-vision-instruct-unsloth-bnb-4bit
tags:
  - text-generation-inference
  - transformers
  - unsloth
  - mllama
  - trl
license: apache-2.0
language:
  - en

Product Captioining Model

Given a product-image, this model can create accurate description about the image, describing the following criterions:

Surface where object is located
Surrounding objects
Background
Lighting
Overall mood

The model was trained with a custom dataset tailored for this usecase.

Examples

Generated prompt: Professional photo of an object on a stone podium which is on a marble table, a wall in the background, a palm leaf in the corner, a harsh shadow from the left side, a concrete wall in the background, minimalist mood

Generated prompt: Professional photo of an object on a wooden table, bokeh background, soft daylight

Generated prompt: Professional photo of an object on a marble podium which is on a jungle clearing, surrounded by palm trees and lush greenery, a misty mountain range in the background, a cloudy sky

Uploaded model

Developed by: Vimax97
License: apache-2.0
Finetuned from model : unsloth/llama-3.2-11b-vision-instruct-unsloth-bnb-4bit

This mllama model was trained 2x faster with Unsloth and Huggingface's TRL library.