sujitpal committed on
Commit 33f33f4 · 1 Parent(s): f5a3b7e

fix: added more details

Files changed (1)
  1. README.md +52 -1
README.md CHANGED
@@ -1 +1,52 @@
- [OpenAI CLIP model](https://openai.com/blog/clip/) fine-tuned using image-caption pairs from the [Caption Prediction dataset](https://www.imageclef.org/2017/caption) provided for the ImageCLEF 2017 competition. The model was evaluated using before and after fine-tuning, the MRR@10 were 0.57 and 0.88 respectively.
+ ---
+ language:
+ - en
+ thumbnail:
+ tags:
+ - multimodal
+ - language
+ - vision
+ - image-search
+ license:
+ - mit
+ metrics:
+ - MRR
+ ---
+
+ ### Model Card: clip-imageclef
+
+ ### Model Details
+
+ [OpenAI CLIP model](https://openai.com/blog/clip/) fine-tuned using image-caption pairs from the [Caption Prediction dataset](https://www.imageclef.org/2017/caption) provided for the ImageCLEF 2017 competition. The model was evaluated before and after fine-tuning; the MRR@10 scores were 0.57 and 0.88 respectively.
+
+ ### Model Date
+
+ September 6, 2021
+
+ ### Model Type
+
+ The base model is the OpenAI CLIP model. It uses a ViT-B/32 Transformer architecture as its image encoder and a masked self-attention Transformer as its text encoder. These encoders are trained to maximize the similarity of (image, text) pairs via a contrastive loss.
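+
+ As a rough illustration of that objective (a minimal sketch, not OpenAI's actual training code; `image_embeds`, `text_embeds` and the temperature value are illustrative), the loss can be written as a symmetric cross-entropy over the pairwise similarity matrix:
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+
+ def clip_contrastive_loss(image_embeds, text_embeds, temperature=0.07):
+     # image_embeds, text_embeds: L2-normalized (batch, dim) tensors from the two encoders
+     logits = image_embeds @ text_embeds.t() / temperature          # pairwise similarities
+     targets = torch.arange(logits.size(0), device=logits.device)  # matching pairs lie on the diagonal
+     loss_images = F.cross_entropy(logits, targets)                 # image -> text direction
+     loss_texts = F.cross_entropy(logits.t(), targets)              # text -> image direction
+     return (loss_images + loss_texts) / 2
+ ```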
+
+ ### Fine-tuning
+
+ The fine-tuning can be reproduced using code from the GitHub repository [elsevierlabs-os/clip-image-search](https://github.com/elsevierlabs-os/clip-image-search#fine-tuning).
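+
+ For a sense of what a single fine-tuning step looks like (a minimal sketch, not the repository's actual training code; the captions and images below are dummy stand-ins for ImageCLEF image-caption pairs), Hugging Face's `CLIPModel` can return the contrastive loss directly:
+
+ ```python
+ import torch
+ from PIL import Image
+ from transformers import CLIPModel, CLIPProcessor
+
+ model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
+ processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
+ optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
+
+ # dummy stand-ins for one batch of ImageCLEF image-caption pairs
+ captions = ["chest x-ray showing clear lungs", "axial CT scan of the abdomen"]
+ images = [Image.new("RGB", (224, 224)) for _ in captions]
+
+ inputs = processor(text=captions, images=images, return_tensors="pt", padding=True)
+ outputs = model(**inputs, return_loss=True)   # built-in symmetric contrastive loss
+ outputs.loss.backward()
+ optimizer.step()
+ optimizer.zero_grad()
+ ```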
+
+ ### Usage
+
+ ```python
+ from transformers import CLIPModel, CLIPProcessor
+
+ model = CLIPModel.from_pretrained("sujitpal/clip-imageclef")
+ processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
+
+ # captions: list of strings, images: list of PIL images
+ inputs = processor(text=captions, images=images,
+                    return_tensors="pt", padding=True)
+ output = model(**inputs)
+ ```
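+
+ The returned `output` includes image-text similarity scores, so (continuing the snippet above, with `captions` as the text queries) the supplied images can be ranked per caption, for example:
+
+ ```python
+ # logits_per_text: (num_captions, num_images) similarity scores
+ probs = output.logits_per_text.softmax(dim=-1)
+ best_image_per_caption = probs.argmax(dim=-1)
+ ```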
+
+ ### Performance
+
+ | Model name                   | MRR@1 | MRR@3 | MRR@5 | MRR@10 | MRR@20 |
+ | ---------------------------- | ----- | ----- | ----- | ------ | ------ |
+ | zero-shot CLIP (baseline)    | 0.426 | 0.534 | 0.558 | 0.573  | 0.578  |
+ | clip-imageclef (this model)  | 0.802 | 0.872 | 0.877 | 0.879  | 0.880  |
+
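+ For reference, MRR@k averages the reciprocal rank of the first correct image over all caption queries, counting a query as 0 when the correct image is not in the top k. A small illustrative helper (not the evaluation code behind the table above):
+
+ ```python
+ def mrr_at_k(ranks, k):
+     """ranks: 1-based rank of the correct image for each caption query."""
+     return sum(1.0 / r if r <= k else 0.0 for r in ranks) / len(ranks)
+
+ # example: correct image ranked 1st, 3rd and 12th for three queries
+ print(mrr_at_k([1, 3, 12], k=10))  # (1 + 1/3 + 0) / 3 ~= 0.444
+ ```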