# LLMEyeCap: Giving Eyes to Large Language Models

## Model Description

LLMEyeCap is a novel object captioning model designed to extend the capabilities of Large Language Models with vision. It uses a combination of state-of-the-art models and techniques to not only detect objects within images but also generate meaningful captions for them.

### Features

- **Novel object captioning with bounding boxes**
- **ResNet50 backbone**
- **Customized DETR model for bounding-box detection**
- **BERT tokenizer and GPT-2 for text generation**
- **Classification layers replaced with Transformer-decoder object-captioning layers** (see the sketch after this list)

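The features above describe the overall wiring: a ResNet50 backbone feeds a DETR-style transformer, and the usual classification head is replaced by a captioning head. Below is a minimal, hypothetical sketch of that wiring, not this repository's actual code: the class name, `d_model=256`, `num_queries=25`, and the one-logit-per-query caption head are simplifying assumptions (the real model decodes a token sequence per query).

```python
import torch
import torch.nn as nn
import torchvision

class CaptionDETRSketch(nn.Module):
    """Hypothetical sketch: ResNet50 backbone + DETR-style transformer whose
    classification head is swapped for a per-query captioning head."""

    def __init__(self, num_queries=25, vocab_size=30522, d_model=256):
        super().__init__()
        # vocab_size=30522 matches the bert-base-uncased tokenizer.
        resnet = torchvision.models.resnet50(weights=None)
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])  # drop avgpool/fc
        self.input_proj = nn.Conv2d(2048, d_model, kernel_size=1)
        self.transformer = nn.Transformer(d_model, nhead=8, batch_first=True)
        self.query_embed = nn.Embedding(num_queries, d_model)
        self.bbox_head = nn.Linear(d_model, 4)              # normalized (cx, cy, w, h)
        self.caption_head = nn.Linear(d_model, vocab_size)  # caption logits per query

    def forward(self, images):                             # images: (B, 3, H, W)
        feats = self.input_proj(self.backbone(images))     # (B, d_model, H/32, W/32)
        src = feats.flatten(2).transpose(1, 2)             # (B, HW/1024, d_model)
        queries = self.query_embed.weight.unsqueeze(0).expand(images.size(0), -1, -1)
        hs = self.transformer(src, queries)                # (B, num_queries, d_model)
        return self.bbox_head(hs).sigmoid(), self.caption_head(hs)
```

Under these assumptions, a forward pass on a dummy batch such as `CaptionDETRSketch()(torch.randn(2, 3, 512, 512))` returns a `(2, 25, 4)` box tensor and a `(2, 25, 30522)` logit tensor.
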
## Training Data

The model was trained on the following datasets:

- VOC dataset
- COCO 80 (80-class label set)
- COCO 91 (91-class label set)

Training was carried out for 30 epochs.

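For reference, here is a minimal way to load COCO-style detection data with torchvision. Only the image directory appears elsewhere in this README; the annotation path is a placeholder assumption, and the repository's actual data pipeline may differ.

```python
import torchvision

# Minimal COCO loading sketch (requires pycocotools). The annFile path is a
# placeholder; point it at wherever the COCO annotations live on your machine.
dataset = torchvision.datasets.CocoDetection(
    root="../data/coco91/train2017",
    annFile="../data/coco91/annotations/instances_train2017.json",
)
image, targets = dataset[0]  # PIL image plus a list of annotation dicts
```
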
## Usage

Here's how to use this model for object captioning (`LLMEyeCapModel`, `tokenizer`, `vocab_size`, `NUM_QUERIES`, `PAD_TOKEN`, `PAD_SOS`, and `device` are defined in the repository code):

```python
import torch
from PIL import Image
import matplotlib.pyplot as plt
import matplotlib.patches as patches

# Load the trained weights.
model = LLMEyeCapModel(num_queries=NUM_QUERIES, vocab_size=vocab_size, pad_token=PAD_TOKEN)
model = model.to(device)
state_dict = torch.load("LLMEyeCap_01.bin")
model.load_state_dict(state_dict)

def display_image_ds(image_path, bb, cc):
    """Draw predicted boxes (bb) and decoded captions (cc) on the image."""
    image = Image.open(image_path).convert('RGB')

    fig, ax = plt.subplots(1, 1, figsize=(12, 20))
    ax.imshow(image)

    # Draw bounding boxes and labels.
    for box, label in zip(bb[0], cc[0]):
        (x, y, w, h) = box
        if x == 0 and y == 0 and w == 0 and h == 0:
            continue  # empty query slot
        # Boxes are normalized (cx, cy, w, h); scale to pixel coordinates.
        x *= image.width
        y *= image.height
        w *= image.width
        h *= image.height
        label_str = tokenizer.decode(label, skip_special_tokens=True)
        if label_str == 'na':  # 'na' marks a query with no object
            continue
        rect = patches.Rectangle((x - w / 2, y - h / 2), w, h,
                                 linewidth=2, edgecolor='r', facecolor='none')
        ax.add_patch(rect)
        ax.text(x - w / 2, y - h / 2, label_str, color='r',
                bbox=dict(facecolor='white', edgecolor='r', pad=2), fontsize=18)
    plt.show()

image_paths = ["../data/coco91/train2017/000000291557.jpg",
               "../data/coco91/train2017/000000436027.jpg"]
for im in image_paths:
    bb, cc = model.generate_caption(im, tokenizer, max_length=20, pad_sos=PAD_SOS)
    display_image_ds(im, bb.to('cpu'), cc.to('cpu'))
```

### Results

See the tuto.ipynb notebook for example results.

## Limitations and Future Work

This 0.1 version is a stand-alone model for captioning objects in images. It can be used as-is or trained on new objects without "catastrophic forgetting".
Version 0.2, which will add a latent space connecting the model to the hidden dimensions of LLMs, is coming (a hypothetical sketch of that idea follows).

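To make the 0.2 direction concrete, here is a purely hypothetical sketch of what a connector between the captioner's latent space and an LLM's hidden dimension could look like. Version 0.2 is unreleased, so every name and dimension below is an assumption (`d_llm=768` matches GPT-2 base).

```python
import torch
import torch.nn as nn

# Hypothetical connector: project per-query vision features (d_vision) into
# an LLM's hidden size (d_llm) so they could be consumed as soft tokens.
class VisionToLLMConnector(nn.Module):
    def __init__(self, d_vision=256, d_llm=768):
        super().__init__()
        self.proj = nn.Linear(d_vision, d_llm)

    def forward(self, query_features):        # (B, num_queries, d_vision)
        return self.proj(query_features)      # (B, num_queries, d_llm)

# Example: 25 query embeddings mapped to GPT-2-sized hidden states.
soft_tokens = VisionToLLMConnector()(torch.randn(1, 25, 256))  # (1, 25, 768)
```
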
## Authors

Imed MAGROUNE