---
language: en
license: apache-2.0
tags:
- vision
- vqa
- 16bit
- quantized
---

# Paligemma-3b-ft-vizwizvqa-224 (16-bit Quantized)

This is a 16-bit quantized version of google/paligemma-3b-ft-vizwizvqa-224, fine-tuned for visual question answering on the VizWiz dataset.

## Usage

```python
from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image

processor = AutoProcessor.from_pretrained("akazen/paligemma-3b-ft-vizwizvqa-16bit")
model = AutoModelForImageTextToText.from_pretrained(
    "akazen/paligemma-3b-ft-vizwizvqa-16bit",
    torch_dtype="auto",  # load weights in the checkpoint's saved (16-bit) precision
    device_map="auto",
)

# Load and preprocess the input image
image = Image.open("your_image.jpg").convert("RGB")
question = "What's in this image?"
prompt = f"\nQuestion: {question}\nAnswer:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)

# Decode only the newly generated tokens; the raw output also contains the prompt
answer = processor.decode(
    outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
```
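To see why 16-bit weights matter for a model of this size, here is a rough back-of-envelope sketch of weight memory at different precisions. The 3B parameter count is an approximation taken from the model name, not an exact figure:

```python
def approx_weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed to hold the weights, in GiB."""
    return num_params * bits_per_param / 8 / (1024 ** 3)

# ~3B parameters (approximate; covers the vision tower plus the language model)
params = 3e9
print(f"fp32: {approx_weight_memory_gib(params, 32):.1f} GiB")  # ~11.2 GiB
print(f"fp16: {approx_weight_memory_gib(params, 16):.1f} GiB")  # ~5.6 GiB
```

Halving the precision roughly halves the memory footprint, which is what makes the 16-bit variant easier to fit on a single consumer GPU.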