Commit 0e55637 · update README
1 Parent(s): 6cf649a
README.md CHANGED
@@ -13,22 +13,22 @@ Note that this model is primarily aimed at being fine-tuned on tasks like multi-

### How to use❓

You will need to clone the model repository from [here](https://github.com/gchhablani/multilingual-image-captioning), since the `model.flax_clip_vision_mbart` module imported below lives there. An example of usage is shown below:

```python
from torchvision.io import read_image
import numpy as np
import wget
from transformers import CLIPProcessor, MBart50TokenizerFast
from model.flax_clip_vision_mbart.modeling_clip_vision_mbart import FlaxCLIPVisionMBartForConditionalGeneration

# Download and read the example image as a (C, H, W) tensor.
img_path = wget.download("http://images.cocodataset.org/val2017/000000397133.jpg")
img = read_image(img_path)

# Preprocess the image with the CLIP processor.
clip_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
clip_outputs = clip_processor(images=img)
clip_outputs["pixel_values"][0] = clip_outputs["pixel_values"][0].transpose(1, 2, 0)  # the model expects channel-last images

tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50")
model = FlaxCLIPVisionMBartForConditionalGeneration.from_pretrained("flax-community/clip-vit-base-patch32_mbart-large-50")

# forced_bos_token_id selects the language in which you want the caption.
# en_XX: English, fr_XX: French, es_XX: Spanish, de_DE: German
pixel_values = np.array(clip_outputs["pixel_values"])
output_ids = model.generate(pixel_values, forced_bos_token_id=tokenizer.lang_code_to_id["es_XX"], num_beams=4, max_length=64).sequences

output_string = tokenizer.batch_decode(output_ids.reshape(-1, 64), skip_special_tokens=True)
print(output_string)  # ['Un restaurante u otro lugar para comer en el Hotel']
```
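To caption the same image in another language, only the `forced_bos_token_id` passed to `generate` needs to change. Below is a minimal sketch that reuses the `model`, `tokenizer`, and `pixel_values` objects from the example above and requests a French caption; the variable names `french_ids` and `french_caption` are illustrative.

```python
# Illustrative follow-up: reuse model, tokenizer, and pixel_values from the example above
# and switch the forced BOS token to the French language code (fr_XX).
french_ids = model.generate(
    pixel_values,
    forced_bos_token_id=tokenizer.lang_code_to_id["fr_XX"],
    num_beams=4,
    max_length=64,
).sequences
french_caption = tokenizer.batch_decode(french_ids.reshape(-1, 64), skip_special_tokens=True)
```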
## Training data 🏋🏻♂️