Llama Joycaption Alpha Two hf Llava FP8 Dynamic
An FP8 compression of the Llama JoyCaption Alpha Two model made by fancyfeast, using llm-compressor and compatible with vllm.
The model has been tested personally, unfortunately without the proper method, but it works well with my usage pattern.
All credit to fancyfeast and on the official model page you can see more details.
How to Get Started with the Model:
The same as the Llama JoyCaption Alpha Two:
You need the compressed-tensors lib to run the code below with FP8.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration
IMAGE_PATH = "image.jpg"
PROMPT = "Write a long descriptive caption for this image in a formal tone."
MODEL_NAME = "JKCHSTR/llama-joycaption-alpha-two-hf-llava-FP8-Dynamic"
# Load JoyCaption
# bfloat16 is the native dtype of the LLM used in JoyCaption (Llama 3.1)
# device_map=0 loads the model into the first GPU
processor = AutoProcessor.from_pretrained(MODEL_NAME)
llava_model = LlavaForConditionalGeneration.from_pretrained(MODEL_NAME, torch_dtype="bfloat16", device_map=0)
llava_model.eval()
with torch.no_grad():
# Load image
image = Image.open(IMAGE_PATH)
# Build the conversation
convo = [
{
"role": "system",
"content": "You are a helpful image captioner.",
},
{
"role": "user",
"content": PROMPT,
},
]
# Format the conversation
# WARNING: HF's handling of chat's on Llava models is very fragile. This specific combination of processor.apply_chat_template(), and processor() works
# but if using other combinations always inspect the final input_ids to ensure they are correct. Often times you will end up with multiple <bos> tokens
# if not careful, which can make the model perform poorly.
convo_string = processor.apply_chat_template(convo, tokenize = False, add_generation_prompt = True)
assert isinstance(convo_string, str)
# Process the inputs
inputs = processor(text=[convo_string], images=[image], return_tensors="pt").to('cuda')
inputs['pixel_values'] = inputs['pixel_values'].to(torch.bfloat16)
# Generate the captions
generate_ids = llava_model.generate(
**inputs,
max_new_tokens=300,
do_sample=True,
suppress_tokens=None,
use_cache=True,
temperature=0.6,
top_k=None,
top_p=0.9,
)[0]
# Trim off the prompt
generate_ids = generate_ids[inputs['input_ids'].shape[1]:]
# Decode the caption
caption = processor.tokenizer.decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)
caption = caption.strip()
print(caption)
- Downloads last month
- 349
Inference Providers
NEW
This model is not currently available via any of the supported Inference Providers.
The model cannot be deployed to the HF Inference API:
The model has no library tag.
Model tree for JKCHSTR/llama-joycaption-alpha-two-hf-llava-FP8-Dynamic
Base model
google/siglip-so400m-patch14-384