Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)

llama2_tifa_question_generation - GGUF
- Model creator: https://huggingface.co/tifa-benchmark/
- Original model: https://huggingface.co/tifa-benchmark/llama2_tifa_question_generation/

| Name | Quant method | Size |
| ---- | ---- | ---- |
| [llama2_tifa_question_generation.Q2_K.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q2_K.gguf) | Q2_K | 2.36GB |
| [llama2_tifa_question_generation.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.IQ3_XS.gguf) | IQ3_XS | 2.6GB |
| [llama2_tifa_question_generation.IQ3_S.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.IQ3_S.gguf) | IQ3_S | 2.75GB |
| [llama2_tifa_question_generation.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q3_K_S.gguf) | Q3_K_S | 2.75GB |
| [llama2_tifa_question_generation.IQ3_M.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.IQ3_M.gguf) | IQ3_M | 2.9GB |
| [llama2_tifa_question_generation.Q3_K.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q3_K.gguf) | Q3_K | 3.07GB |
| [llama2_tifa_question_generation.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q3_K_M.gguf) | Q3_K_M | 3.07GB |
| [llama2_tifa_question_generation.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q3_K_L.gguf) | Q3_K_L | 3.35GB |
| [llama2_tifa_question_generation.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.IQ4_XS.gguf) | IQ4_XS | 3.4GB |
| [llama2_tifa_question_generation.Q4_0.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q4_0.gguf) | Q4_0 | 3.56GB |
| [llama2_tifa_question_generation.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.IQ4_NL.gguf) | IQ4_NL | 3.58GB |
| [llama2_tifa_question_generation.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q4_K_S.gguf) | Q4_K_S | 3.59GB |
| [llama2_tifa_question_generation.Q4_K.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q4_K.gguf) | Q4_K | 3.8GB |
| [llama2_tifa_question_generation.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q4_K_M.gguf) | Q4_K_M | 3.8GB |
| [llama2_tifa_question_generation.Q4_1.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q4_1.gguf) | Q4_1 | 3.95GB |
| [llama2_tifa_question_generation.Q5_0.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q5_0.gguf) | Q5_0 | 4.33GB |
| [llama2_tifa_question_generation.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q5_K_S.gguf) | Q5_K_S | 4.33GB |
| [llama2_tifa_question_generation.Q5_K.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q5_K.gguf) | Q5_K | 4.45GB |
| [llama2_tifa_question_generation.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q5_K_M.gguf) | Q5_K_M | 4.45GB |
| [llama2_tifa_question_generation.Q5_1.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q5_1.gguf) | Q5_1 | 4.72GB |
| [llama2_tifa_question_generation.Q6_K.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q6_K.gguf) | Q6_K | 5.15GB |
| [llama2_tifa_question_generation.Q8_0.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q8_0.gguf) | Q8_0 | 6.67GB |

Original model description:
---
license: apache-2.0
inference: true
widget:
- text: "[INST] <<SYS>>\nGiven an image description, generate one or two multiple-choice questions that verifies if the image description is correct.\nClassify each concept into a type (object, human, animal, food, activity, attribute, counting, color, material, spatial, location, shape, other), and then generate a question for each type.\n\n<</SYS>>\n\nDescription: a blue rabbit and a red plane [/INST] Entities:"
pipeline_tag: text-generation
tags:
- text-generation-inference
- llama2
- text-to-image
datasets:
- TIFA
language:
- en
---
Project page:

This is the text parsing and question generation model for the ICCV 2023 paper [TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering](https://arxiv.org/abs/2303.11897)

We introduce TIFA (Text-to-Image Faithfulness evaluation with question Answering), an automatic evaluation metric that measures the faithfulness of a generated image to its text input via visual question answering (VQA). Given a text input, we automatically generate several question-answer pairs using a language model. We then calculate image faithfulness by checking whether existing VQA models can answer these questions using the generated image.

This fine-tuned LLaMA 2 model is the substitute for the GPT-3 model used in the paper. It can parse an arbitrary prompt into visual entities, attributes, relations, etc., and generate question-answer tuples for each of them. See the examples below.

# QuickStart

All code is from the TIFA repository. Clone that repo to use this model together with the other modules (e.g. VQA) provided in TIFA.

Follow the prompt format exactly; it gives the best performance.
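The prompt follows the LLaMA 2 chat template: the instruction sits between `<<SYS>>` and `<</SYS>>`, and the caption comes after it inside `[INST] ... [/INST]`. For the GGUF files in this repository, a minimal sketch of the same flow with `llama-cpp-python` could look like the following. This is not part of the original model card: the helper names `build_prompt` and `generate_with_gguf`, the context size, and the use of greedy decoding are all assumptions. llama.cpp does not offer the beam search used in the transformers example, so outputs may differ slightly.

```python
# Instruction text copied from the model card's prompt.
SYSTEM_PROMPT = (
    "Given an image description, generate one or two multiple-choice questions "
    "that verifies if the image description is correct.\n"
    "Classify each concept into a type (object, human, animal, food, activity, "
    "attribute, counting, color, material, spatial, location, shape, other), "
    "and then generate a question for each type.\n"
)


def build_prompt(caption: str) -> str:
    """Wrap a caption in the LLaMA 2 chat template (hypothetical helper)."""
    return (
        f"[INST] <<SYS>>\n{SYSTEM_PROMPT}\n<</SYS>>\n\n"
        f"Description: {caption} [/INST] Entities:"
    )


def generate_with_gguf(model_path: str, caption: str, max_tokens: int = 512) -> str:
    """Run one greedy completion against a local GGUF file (hypothetical helper).

    Assumes `pip install llama-cpp-python`. The import is done lazily so that
    build_prompt() can be used without the library installed.
    """
    from llama_cpp import Llama

    # n_ctx=1024 is an assumed value, large enough for the prompt plus 512 tokens.
    llm = Llama(model_path=model_path, n_ctx=1024, verbose=False)
    result = llm(build_prompt(caption), max_tokens=max_tokens, temperature=0.0)
    # Keep only the text before the first blank line.
    return result["choices"][0]["text"].split("\n\n")[0]
```

With one of the files from the table downloaded locally, a call such as `generate_with_gguf("llama2_tifa_question_generation.Q4_K_M.gguf", "a blue rabbit and a red plane")` should return the entity list and question blocks as plain text. Lower-bit quantizations may produce less reliable output.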
```python
import torch
import transformers

# prepare the LLaMA 2 model
model_name = "tifa-benchmark/llama2_tifa_question_generation"
pipeline = transformers.pipeline(
    "text-generation",
    model=model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)


# format the prompt following the LLaMA 2 chat style
def create_qg_prompt(caption):
    INTRO_BLURB = "Given an image description, generate one or two multiple-choice questions that verifies if the image description is correct.\nClassify each concept into a type (object, human, animal, food, activity, attribute, counting, color, material, spatial, location, shape, other), and then generate a question for each type.\n"
    formatted_prompt = f"[INST] <<SYS>>\n{INTRO_BLURB}\n<</SYS>>\n\n"
    formatted_prompt += f"Description: {caption} [/INST] Entities:"
    return formatted_prompt


test_caption = "a blue rabbit and a red plane"

# create prompt
prompt = create_qg_prompt(test_caption)

# text completion (beam search, no sampling)
sequences = pipeline(
    prompt, do_sample=False, num_beams=5, num_return_sequences=1, max_length=512)
# drop the echoed prompt, then keep only the text before the first blank line
output = sequences[0]['generated_text'][len(prompt):]
output = output.split('\n\n')[0]

# output
print(output)

#### Expected output ###
# rabbit, plane
# Activites:
# Colors: blue, red
# Counting:
# Other attributes:
# About rabbit (animal):
# Q: is this a rabbit?
# Choices: yes, no
# A: yes
# About rabbit (animal):
# Q: what animal is in the picture?
# Choices: rabbit, dog, cat, fish
# A: rabbit
# About plane (object):
# Q: is this a plane?
# Choices: yes, no
# A: yes
# About plane (object):
# Q: what type of vehicle is this?
# Choices: plane, car, motorcycle, bus
# A: plane
# About blue (color):
# Q: is the rabbit blue?
# Choices: yes, no
# A: yes
# About blue (color):
# Q: what color is the rabbit?
# Choices: blue, red, yellow, green
# A: blue
# About red (color):
# Q: is the plane red?
# Choices: yes, no
# A: yes
# About red (color):
# Q: what color is the plane?
# Choices: red, blue, yellow, green
# A: red
```

# Use this LM under tifascore package

tifascore provides extra functions to parse this output and more. First install tifascore according to its installation instructions. Usage is then as follows:

```python
from tifascore import get_llama2_pipeline, get_llama2_question_and_answers

pipeline = get_llama2_pipeline("tifa-benchmark/llama2_tifa_question_generation")

print(get_llama2_question_and_answers(pipeline, "a blue rabbit and a red plane"))

#### Expected output ###
# [{'caption': 'a blue rabbit and a red plane', 'element': 'rabbit', 'question': 'what animal is in the picture?', 'choices': ['rabbit', 'dog', 'cat', 'fish'], 'answer': 'rabbit', 'element_type': 'animal/human'},
#  {'caption': 'a blue rabbit and a red plane', 'element': 'plane', 'question': 'is this a plane?', 'choices': ['yes', 'no'], 'answer': 'yes', 'element_type': 'object'},
#  {'caption': 'a blue rabbit and a red plane', 'element': 'plane', 'question': 'what type of vehicle is this?', 'choices': ['plane', 'car', 'motorcycle', 'bus'], 'answer': 'plane', 'element_type': 'object'},
#  {'caption': 'a blue rabbit and a red plane', 'element': 'blue', 'question': 'is the rabbit blue?', 'choices': ['yes', 'no'], 'answer': 'yes', 'element_type': 'color'},
#  {'caption': 'a blue rabbit and a red plane', 'element': 'blue', 'question': 'what color is the rabbit?', 'choices': ['blue', 'red', 'yellow', 'green'], 'answer': 'blue', 'element_type': 'color'},
#  {'caption': 'a blue rabbit and a red plane', 'element': 'red', 'question': 'is the plane red?', 'choices': ['yes', 'no'], 'answer': 'yes', 'element_type': 'color'},
#  {'caption': 'a blue rabbit and a red plane', 'element': 'red', 'question': 'what color is the plane?', 'choices': ['red', 'blue', 'yellow', 'green'], 'answer': 'red', 'element_type': 'color'}]
```

## Bibtex

```
@article{hu2023tifa,
  title={Tifa: Accurate and interpretable text-to-image faithfulness evaluation with question answering},
  author={Hu, Yushi and Liu, Benlin and Kasai, Jungo and Wang, Yizhong and Ostendorf, Mari and Krishna, Ranjay and Smith, Noah A},
  journal={arXiv preprint arXiv:2303.11897},
  year={2023}
}
```
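
# Parsing the raw output without tifascore

When tifascore is not installed, the raw text shown in the QuickStart expected output can be turned into records by hand. The sketch below is an illustration, not tifascore's implementation: `parse_questions` is a hypothetical helper that assumes every question block has the `About <element> (<type>):` / `Q:` / `Choices:` / `A:` layout. It keeps the model's raw type labels and does not reproduce tifascore's own post-processing (for example, tifascore reports `animal/human` where the raw output says `animal`).

```python
import re

# Matches block headers such as "About rabbit (animal):".
ABOUT_RE = re.compile(r"^About (?P<element>.+) \((?P<type>[^()]+)\):$")


def parse_questions(raw: str, caption: str) -> list[dict]:
    """Parse the model's raw text into a list of question records."""
    questions = []
    current = None
    for line in (ln.strip() for ln in raw.splitlines()):
        header = ABOUT_RE.match(line)
        if header:
            # Start a new record at each "About ..." header.
            current = {
                "caption": caption,
                "element": header["element"],
                "element_type": header["type"],
            }
        elif current is None:
            # Skip the entity summary lines that precede the first block.
            continue
        elif line.startswith("Q: "):
            current["question"] = line[len("Q: "):]
        elif line.startswith("Choices: "):
            choices = line[len("Choices: "):].split(",")
            current["choices"] = [c.strip() for c in choices]
        elif line.startswith("A: "):
            current["answer"] = line[len("A: "):]
            # Keep only complete records whose answer is one of the choices.
            complete = "question" in current and "choices" in current
            if complete and current["answer"] in current["choices"]:
                questions.append(current)
            current = None
    return questions
```

Calling `parse_questions(output, test_caption)` on the QuickStart output gives one dictionary per question block. Blocks that are truncated by the generation length limit are dropped rather than returned half-filled.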