Bijon Guha committed
Commit 97933dd · 1 Parent(s): 1e9b4a1

file upload
README.md CHANGED
@@ -1,7 +1,7 @@
  ---
- title: Yolov3 Voc Era1
- emoji: 🏃
- colorFrom: purple
+ title: YoloV3 GradCam
+ emoji: 🦀
+ colorFrom: pink
  colorTo: yellow
  sdk: gradio
  sdk_version: 3.40.1
@@ -9,5 +9,53 @@ app_file: app.py
  pinned: false
  license: mit
  ---
+ # YOLOv3 Object Detection and GradCAM Visualization
+ 
+ ## How to Use the App
+ 
+ <br>
+ 
+ 1. The app has two tabs:
+ 
+    - **YoloV3 Object Detection**: Upload your own image (ideally 416 x 416 pixels) or choose one of the provided example images to detect PASCAL VOC objects. You can adjust the confidence threshold for the plotted boxes and the IoU threshold used for non-max suppression.
+    - **GradCam Visualization**: Visualize class activation maps (GradCAM) for an uploaded image or one of the provided examples. You can select the target network layer, control the transparency of the overlay, and show or hide the GradCAM overlay.
+ 
+ <br>
+ 
+ 2. **YoloV3 Object Detection**
+    - **Input Image**: Upload your own image (ideally 416 x 416 pixels) or select one of the example images given below.
+    - **Threshold**: Controls the confidence threshold for the boxes plotted on the image. The default value is 0.5.
+    - **IOU Threshold**: Controls the IoU threshold used by non-max suppression. The default value is 0.5.
+ 
+ <br>
+ 
+ 3. **GradCam Visualization**
+    - **Input Image**: Upload your own image (ideally 416 x 416 pixels) or select one of the example images given below.
+    - **Network Layer**: Selects the target layer for the GradCAM visualization. The values range from -5 to -2 and the default value is -3.
+    - **GradCAM**: Shows the GradCAM overlay on the input image. Unchecking it shows the original image.
+    - **Transparency**: Controls the transparency of the GradCAM overlay. The default value is 0.5.
+ 
+ <br>
+ 
+ 4. After adjusting the parameters, click the `Submit` button to see the results.
+ 
+ <br>
+ 
+ 5. To reset the parameters back to their defaults, click the `Clear` button.
+ 
+ ## Training code
+ 
+ The PyTorch Lightning code used to train and validate the model can be viewed at https://github.com/TharunSivamani/ERA-V1/blob/main/Session%2013/S13.ipynb
+ 
+ ## License
+ 
+ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
+ 
+ ## Credits
+ 
+ - This app is built with the Gradio library ([https://www.gradio.app/](https://www.gradio.app/)) for the interactive model interface.
+ - The PASCAL VOC dataset ([https://www.kaggle.com/datasets/aladdinpersson/pascal-voc-dataset-used-in-yolov3-video](https://www.kaggle.com/datasets/aladdinpersson/pascal-voc-dataset-used-in-yolov3-video)) is used for training and evaluation.
+ - The PyTorch library ([https://pytorch.org/](https://pytorch.org/)) is used for the deep learning model and the GradCAM visualization.
+ - The PyTorch Lightning framework ([https://lightning.ai/](https://lightning.ai/)) is used for training and related steps.
  
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
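
The controls described above are the Gradio UI; a running Space can also be queried programmatically with `gradio_client`. The sketch below is illustrative only: the Space id and endpoint name are assumptions, so use `client.view_api()` to confirm the actual signature.

```python
# Illustrative sketch, not part of this commit: querying the Space through gradio_client.
# The Space id below is a placeholder, and the api_name is an assumption --
# run client.view_api() to see the real endpoint names and argument order.
from gradio_client import Client

client = Client("<user>/<space-name>")  # hypothetical Space id
client.view_api()                       # prints the available endpoints and their parameters

# Hypothetical call mirroring the "YoloV3 Object Detection" tab
# (image path, Threshold, IOU Threshold):
# result = client.predict("examples/1.jpg", 0.6, 0.5, api_name="/predict")
```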
app.py CHANGED
@@ -1,7 +1,80 @@
  import gradio as gr
+ import numpy as np
+ import config
+ from utils import *
+ from pytorch_grad_cam.utils.image import show_cam_on_image
+ from yolov3 import YOLOv3LightningModel
  
- def greet(name):
-     return "Hello " + name + "!!"
+ ex1 = [[f'examples/{i}.jpg'] for i in range(1, 8)]
+ ex2 = [[f'examples/{i}.jpg'] for i in range(8, 15)]
+ scaled_anchors = config.scaled_anchors
  
- iface = gr.Interface(fn=greet, inputs="text", outputs="text")
- iface.launch()
+ model = YOLOv3LightningModel()
+ model.load_state_dict(torch.load("yolov3.pth", map_location="cpu"), strict=False)
+ model.eval()
+ 
+ @torch.inference_mode()
+ def YoloV3_classifier(image, thresh=0.5, iou_thresh=0.5):
+     transformed_image = config.transforms(image=image)["image"].unsqueeze(0)
+     output = model(transformed_image)
+ 
+     bboxes = [[] for _ in range(1)]
+     for i in range(3):
+         batch_size, A, S, _, _ = output[i].shape
+         anchor = scaled_anchors[i]
+         boxes_scale_i = cells_to_bboxes(
+             output[i], anchor, S=S, is_preds=True
+         )
+         for idx, box in enumerate(boxes_scale_i):
+             bboxes[idx] += box
+ 
+     nms_boxes = non_max_suppression(
+         bboxes[0], iou_threshold=iou_thresh, threshold=thresh, box_format="midpoint",
+     )
+     plot_img = draw_bounding_boxes(image.copy(), nms_boxes, class_labels=config.PASCAL_CLASSES)
+ 
+     return plot_img
+ 
+ window1 = gr.Interface(
+     YoloV3_classifier,
+     inputs=[
+         gr.Image(label="Input Image"),
+         gr.Slider(0, 1, value=0.5, step=0.1, label="Threshold", info="Set Threshold value"),
+         gr.Slider(0, 1, value=0.5, step=0.1, label="IOU Threshold", info="Set IOU Threshold value"),
+     ],
+     outputs=[
+         gr.Image(label="YoloV3 Object Detection"),
+     ],
+     examples=ex1,
+ )
+ 
+ 
+ def visualize_gradCam(image, target_layer=-5, show_cam=True, transparency=0.5):
+     if show_cam:
+         cam = YoloCAM(model=model, target_layers=[model.layers[target_layer]], use_cuda=False)
+         transformed_image = config.transforms(image=image)["image"].unsqueeze(0)
+         grayscale_cam = cam(transformed_image, scaled_anchors)[0, :, :]
+         img = cv2.resize(image, (416, 416))
+         img = np.float32(img) / 255
+         cam_image = show_cam_on_image(img, grayscale_cam, use_rgb=True, image_weight=transparency)
+     else:
+         cam_image = image
+ 
+     return cam_image
+ 
+ window2 = gr.Interface(
+     visualize_gradCam,
+     inputs=[
+         gr.Image(label="Input Image"),
+         gr.Slider(-5, -2, value=-3, step=1, label="Network Layer", info="GRAD-CAM layer to visualize"),
+         gr.Checkbox(label="GradCAM", value=True, info="Visualize Class Activation Maps?"),
+         gr.Slider(0, 1, value=0.5, step=0.1, label="Transparency", info="Set transparency of GRAD-CAMs"),
+     ],
+     outputs=[
+         gr.Image(label="Grad-CAM Visualization"),
+     ],
+     examples=ex2,
+ )
+ 
+ app = gr.TabbedInterface([window1, window2], ["YOLO V3 Detection", "GradCAM Visualization"])
+ app.launch()
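
Because `YoloV3_classifier` and `visualize_gradCam` are plain functions, they can be smoke-tested without the Gradio UI. A minimal sketch, assuming the functions are in scope (e.g. run just before `app.launch()`) and that `examples/1.jpg` and `yolov3.pth` from this commit are present:

```python
# Minimal local smoke test for the two functions above (sketch, not part of the commit).
import cv2

# gr.Image hands the functions an RGB numpy array, so mimic that here.
img = cv2.cvtColor(cv2.imread("examples/1.jpg"), cv2.COLOR_BGR2RGB)

detections = YoloV3_classifier(img, thresh=0.6, iou_thresh=0.5)
cam_overlay = visualize_gradCam(img, target_layer=-3, show_cam=True, transparency=0.5)

cv2.imwrite("detections.jpg", cv2.cvtColor(detections, cv2.COLOR_RGB2BGR))
cv2.imwrite("gradcam.jpg", cv2.cvtColor(cam_overlay, cv2.COLOR_RGB2BGR))
```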
config.py ADDED
@@ -0,0 +1,58 @@
+ import albumentations as A
+ import cv2
+ import torch
+ 
+ from albumentations.pytorch import ToTensorV2
+ 
+ 
+ DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
+ 
+ 
+ IMAGE_SIZE = 416
+ transforms = A.Compose(
+     [
+         A.LongestMaxSize(max_size=IMAGE_SIZE),
+         A.PadIfNeeded(
+             min_height=IMAGE_SIZE, min_width=IMAGE_SIZE, border_mode=cv2.BORDER_CONSTANT
+         ),
+         A.Normalize(mean=[0, 0, 0], std=[1, 1, 1], max_pixel_value=255),
+         ToTensorV2(),
+     ],
+ )
+ 
+ 
+ ANCHORS = [
+     [(0.28, 0.22), (0.38, 0.48), (0.9, 0.78)],
+     [(0.07, 0.15), (0.15, 0.11), (0.14, 0.29)],
+     [(0.02, 0.03), (0.04, 0.07), (0.08, 0.06)],
+ ]  # Note these have been rescaled to be between [0, 1]
+ 
+ S = [IMAGE_SIZE // 32, IMAGE_SIZE // 16, IMAGE_SIZE // 8]
+ 
+ scaled_anchors = (
+     torch.tensor(ANCHORS)
+     * torch.tensor(S).unsqueeze(1).unsqueeze(1).repeat(1, 3, 2)
+ ).to(DEVICE)
+ 
+ PASCAL_CLASSES = [
+     "aeroplane",
+     "bicycle",
+     "bird",
+     "boat",
+     "bottle",
+     "bus",
+     "car",
+     "cat",
+     "chair",
+     "cow",
+     "diningtable",
+     "dog",
+     "horse",
+     "motorbike",
+     "person",
+     "pottedplant",
+     "sheep",
+     "sofa",
+     "train",
+     "tvmonitor",
+ ]
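
`scaled_anchors` converts the anchor sizes above, given as fractions of the image, into grid-cell units for each of the three prediction scales. A small worked example (a sketch, not part of the commit):

```python
# Worked example of the scaled_anchors arithmetic in config.py (sketch).
import torch

IMAGE_SIZE = 416
S = [IMAGE_SIZE // 32, IMAGE_SIZE // 16, IMAGE_SIZE // 8]  # grid sizes [13, 26, 52]
first_anchor = torch.tensor([0.28, 0.22])                  # width/height as image fractions

# On the coarsest 13 x 13 grid this anchor spans about 3.64 x 2.86 cells,
# which is the unit cells_to_bboxes expects when decoding box width/height.
print(first_anchor * S[0])  # tensor([3.6400, 2.8600])
```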
examples/1.jpg ADDED
examples/10.jpg ADDED
examples/11.jpg ADDED
examples/12.jpg ADDED
examples/13.jpg ADDED
examples/14.jpg ADDED
examples/15.jpg ADDED
examples/2.jpg ADDED
examples/3.jpg ADDED
examples/4.jpg ADDED
examples/5.jpg ADDED
examples/6.jpg ADDED
examples/7.jpg ADDED
examples/8.jpg ADDED
examples/9.jpg ADDED
requirements.txt ADDED
@@ -0,0 +1,9 @@
+ torch
+ torchvision
+ torch_lr_finder
+ gradio
+ grad-cam
+ pillow
+ opencv-python
+ albumentations
+ pytorch-lightning
utils.py ADDED
@@ -0,0 +1,265 @@
+ from typing import List
+ import torch
+ import numpy as np
+ import cv2
+ import random
+ 
+ from pytorch_grad_cam.base_cam import BaseCAM
+ from pytorch_grad_cam.utils.svd_on_activations import get_2d_projection
+ from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget
+ 
+ 
+ def cells_to_bboxes(predictions, anchors, S, is_preds=True):
+     """
+     Scales the predictions coming from the model to be relative to the
+     entire image, so that they can later be plotted or evaluated.
+     INPUT:
+     predictions: tensor of size (N, 3, S, S, num_classes+5)
+     anchors: the anchors used for the predictions
+     S: the number of cells the image is divided in on the width (and height)
+     is_preds: whether the input is predictions or the true bounding boxes
+     OUTPUT:
+     converted_bboxes: the converted boxes of size (N, num_anchors * S * S, 6) with class index,
+                       object score, bounding box coordinates
+     """
+     BATCH_SIZE = predictions.shape[0]
+     num_anchors = len(anchors)
+     box_predictions = predictions[..., 1:5]
+     if is_preds:
+         anchors = anchors.reshape(1, len(anchors), 1, 1, 2)
+         box_predictions[..., 0:2] = torch.sigmoid(box_predictions[..., 0:2])
+         box_predictions[..., 2:] = torch.exp(box_predictions[..., 2:]) * anchors
+         scores = torch.sigmoid(predictions[..., 0:1])
+         best_class = torch.argmax(predictions[..., 5:], dim=-1).unsqueeze(-1)
+     else:
+         scores = predictions[..., 0:1]
+         best_class = predictions[..., 5:6]
+ 
+     cell_indices = (
+         torch.arange(S)
+         .repeat(predictions.shape[0], 3, S, 1)
+         .unsqueeze(-1)
+         .to(predictions.device)
+     )
+     x = 1 / S * (box_predictions[..., 0:1] + cell_indices)
+     y = 1 / S * (box_predictions[..., 1:2] + cell_indices.permute(0, 1, 3, 2, 4))
+     w_h = 1 / S * box_predictions[..., 2:4]
+     converted_bboxes = torch.cat((best_class, scores, x, y, w_h), dim=-1).reshape(BATCH_SIZE, num_anchors * S * S, 6)
+     return converted_bboxes.tolist()
+ 
+ 
+ def intersection_over_union(boxes_preds, boxes_labels, box_format="midpoint"):
+     """
+     Video explanation of this function:
+     https://youtu.be/XXYG5ZWtjj0
+ 
+     This function calculates intersection over union (IoU) given pred boxes
+     and target boxes.
+ 
+     Parameters:
+         boxes_preds (tensor): Predictions of Bounding Boxes (BATCH_SIZE, 4)
+         boxes_labels (tensor): Correct labels of Bounding Boxes (BATCH_SIZE, 4)
+         box_format (str): midpoint/corners, if boxes (x,y,w,h) or (x1,y1,x2,y2)
+ 
+     Returns:
+         tensor: Intersection over union for all examples
+     """
+ 
+     if box_format == "midpoint":
+         box1_x1 = boxes_preds[..., 0:1] - boxes_preds[..., 2:3] / 2
+         box1_y1 = boxes_preds[..., 1:2] - boxes_preds[..., 3:4] / 2
+         box1_x2 = boxes_preds[..., 0:1] + boxes_preds[..., 2:3] / 2
+         box1_y2 = boxes_preds[..., 1:2] + boxes_preds[..., 3:4] / 2
+         box2_x1 = boxes_labels[..., 0:1] - boxes_labels[..., 2:3] / 2
+         box2_y1 = boxes_labels[..., 1:2] - boxes_labels[..., 3:4] / 2
+         box2_x2 = boxes_labels[..., 0:1] + boxes_labels[..., 2:3] / 2
+         box2_y2 = boxes_labels[..., 1:2] + boxes_labels[..., 3:4] / 2
+ 
+     if box_format == "corners":
+         box1_x1 = boxes_preds[..., 0:1]
+         box1_y1 = boxes_preds[..., 1:2]
+         box1_x2 = boxes_preds[..., 2:3]
+         box1_y2 = boxes_preds[..., 3:4]
+         box2_x1 = boxes_labels[..., 0:1]
+         box2_y1 = boxes_labels[..., 1:2]
+         box2_x2 = boxes_labels[..., 2:3]
+         box2_y2 = boxes_labels[..., 3:4]
+ 
+     x1 = torch.max(box1_x1, box2_x1)
+     y1 = torch.max(box1_y1, box2_y1)
+     x2 = torch.min(box1_x2, box2_x2)
+     y2 = torch.min(box1_y2, box2_y2)
+ 
+     intersection = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
+     box1_area = abs((box1_x2 - box1_x1) * (box1_y2 - box1_y1))
+     box2_area = abs((box2_x2 - box2_x1) * (box2_y2 - box2_y1))
+ 
+     return intersection / (box1_area + box2_area - intersection + 1e-6)
+ 
+ 
+ def non_max_suppression(bboxes, iou_threshold, threshold, box_format="corners"):
+     """
+     Video explanation of this function:
+     https://youtu.be/YDkjWEN8jNA
+ 
+     Does Non Max Suppression given bboxes.
+ 
+     Parameters:
+         bboxes (list): list of lists containing all bboxes, with each bbox
+             specified as [class_pred, prob_score, x1, y1, x2, y2]
+         iou_threshold (float): threshold where predicted bboxes are considered correct
+         threshold (float): threshold to remove predicted bboxes (independent of IoU)
+         box_format (str): "midpoint" or "corners", used to specify bboxes
+ 
+     Returns:
+         list: bboxes after performing NMS given a specific IoU threshold
+     """
+ 
+     assert type(bboxes) == list
+ 
+     bboxes = [box for box in bboxes if box[1] > threshold]
+     bboxes = sorted(bboxes, key=lambda x: x[1], reverse=True)
+     bboxes_after_nms = []
+ 
+     while bboxes:
+         chosen_box = bboxes.pop(0)
+ 
+         bboxes = [
+             box
+             for box in bboxes
+             if box[0] != chosen_box[0]
+             or intersection_over_union(
+                 torch.tensor(chosen_box[2:]),
+                 torch.tensor(box[2:]),
+                 box_format=box_format,
+             )
+             < iou_threshold
+         ]
+ 
+         bboxes_after_nms.append(chosen_box)
+ 
+     return bboxes_after_nms
+ 
+ 
+ def draw_bounding_boxes(image, boxes, class_labels):
+ 
+     colors = [[random.randint(0, 255) for _ in range(3)] for name in class_labels]
+ 
+     im = np.array(image)
+     height, width, _ = im.shape
+     bbox_thick = int(0.6 * (height + width) / 600)
+ 
+     # Create a rectangle patch for each box
+     for box in boxes:
+         assert len(box) == 6, "box should contain class pred, confidence, x, y, width, height"
+         class_pred = box[0]
+         conf = box[1]
+         box = box[2:]
+         upper_left_x = box[0] - box[2] / 2
+         upper_left_y = box[1] - box[3] / 2
+ 
+         x1 = int(upper_left_x * width)
+         y1 = int(upper_left_y * height)
+ 
+         x2 = x1 + int(box[2] * width)
+         y2 = y1 + int(box[3] * height)
+ 
+         cv2.rectangle(
+             image,
+             (x1, y1), (x2, y2),
+             color=colors[int(class_pred)],
+             thickness=bbox_thick
+         )
+         text = f"{class_labels[int(class_pred)]}: {conf:.2f}"
+         t_size = cv2.getTextSize(text, 0, 0.7, thickness=bbox_thick // 2)[0]
+         c3 = (x1 + t_size[0], y1 - t_size[1] - 3)
+ 
+         cv2.rectangle(image, (x1, y1), c3, colors[int(class_pred)], -1)
+         cv2.putText(
+             image,
+             text,
+             (x1, y1 - 2),
+             cv2.FONT_HERSHEY_SIMPLEX,
+             0.7,
+             (0, 0, 0),
+             bbox_thick // 2,
+             lineType=cv2.LINE_AA,
+         )
+ 
+     return image
+ 
+ 
+ class YoloCAM(BaseCAM):
+     def __init__(self, model, target_layers, use_cuda=False,
+                  reshape_transform=None):
+         super(YoloCAM, self).__init__(model,
+                                       target_layers,
+                                       use_cuda,
+                                       reshape_transform,
+                                       uses_gradients=False)
+ 
+     def forward(self,
+                 input_tensor: torch.Tensor,
+                 scaled_anchors: torch.Tensor,
+                 targets: List[torch.nn.Module],
+                 eigen_smooth: bool = False) -> np.ndarray:
+ 
+         if self.cuda:
+             input_tensor = input_tensor.cuda()
+ 
+         if self.compute_input_gradient:
+             input_tensor = torch.autograd.Variable(input_tensor,
+                                                    requires_grad=True)
+ 
+         outputs = self.activations_and_grads(input_tensor)
+         if targets is None:
+             bboxes = [[] for _ in range(1)]
+             for i in range(3):
+                 batch_size, A, S, _, _ = outputs[i].shape
+                 anchor = scaled_anchors[i]
+                 boxes_scale_i = cells_to_bboxes(
+                     outputs[i], anchor, S=S, is_preds=True
+                 )
+                 for idx, box in enumerate(boxes_scale_i):
+                     bboxes[idx] += box
+ 
+             nms_boxes = non_max_suppression(
+                 bboxes[0], iou_threshold=0.5, threshold=0.4, box_format="midpoint",
+             )
+             # target_categories = np.argmax(outputs.cpu().data.numpy(), axis=-1)
+             target_categories = [box[0] for box in nms_boxes]
+             targets = [ClassifierOutputTarget(
+                 category) for category in target_categories]
+ 
+         if self.uses_gradients:
+             self.model.zero_grad()
+             loss = sum([target(output)
+                         for target, output in zip(targets, outputs)])
+             loss.backward(retain_graph=True)
+ 
+         # In most of the saliency attribution papers, the saliency is
+         # computed with a single target layer.
+         # Commonly it is the last convolutional layer.
+         # Here we support passing a list with multiple target layers.
+         # It will compute the saliency image for every image,
+         # and then aggregate them (with a default mean aggregation).
+         # This gives you more flexibility in case you just want to
+         # use all conv layers for example, all Batchnorm layers,
+         # or something else.
+         cam_per_layer = self.compute_cam_per_layer(input_tensor,
+                                                    targets,
+                                                    eigen_smooth)
+         return self.aggregate_multi_layers(cam_per_layer)
+ 
+     def get_cam_image(self,
+                       input_tensor,
+                       target_layer,
+                       target_category,
+                       activations,
+                       grads,
+                       eigen_smooth):
+         return get_2d_projection(activations)
+ 
+ 
+ 
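
`non_max_suppression` expects boxes in the `[class_pred, prob_score, x, y, w, h]` layout that `cells_to_bboxes` produces. A toy example of its behaviour (a sketch, not part of the commit):

```python
# Toy NMS example for the helpers above (sketch, not part of the commit).
from utils import non_max_suppression

boxes = [
    [0, 0.9, 0.50, 0.50, 0.40, 0.40],  # class 0, high confidence
    [0, 0.8, 0.52, 0.50, 0.40, 0.40],  # near-duplicate of the box above -> suppressed
    [1, 0.7, 0.20, 0.20, 0.10, 0.10],  # different class -> kept
    [0, 0.3, 0.80, 0.80, 0.10, 0.10],  # below the confidence threshold -> dropped
]

kept = non_max_suppression(boxes, iou_threshold=0.5, threshold=0.4, box_format="midpoint")
print(len(kept))  # 2: the 0.9 box and the class-1 box survive
```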
yolov3.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8816c44660d8b8f77225422081adf109deb727f9a84fe897b7f2726074308252
+ size 246877637
yolov3.py ADDED
@@ -0,0 +1,171 @@
+ import torch
+ import torch.nn as nn
+ import pytorch_lightning as pl
+ import config as cfg
+ 
+ """
+ Information about the architecture config:
+ A tuple is structured as (filters, kernel_size, stride).
+ Every conv is a same convolution.
+ A list is structured as "B" (residual block) followed by the number of repeats.
+ "S" is a scale prediction block, used for computing the YOLO loss.
+ "U" upsamples the feature map and concatenates it with a previous layer.
+ """
+ config = [
+     (32, 3, 1),
+     (64, 3, 2),
+     ["B", 1],
+     (128, 3, 2),
+     ["B", 2],
+     (256, 3, 2),
+     ["B", 8],
+     (512, 3, 2),
+     ["B", 8],
+     (1024, 3, 2),
+     ["B", 4],  # To this point is Darknet-53
+     (512, 1, 1),
+     (1024, 3, 1),
+     "S",
+     (256, 1, 1),
+     "U",
+     (256, 1, 1),
+     (512, 3, 1),
+     "S",
+     (128, 1, 1),
+     "U",
+     (128, 1, 1),
+     (256, 3, 1),
+     "S",
+ ]
+ 
+ 
+ class CNNBlock(nn.Module):
+     def __init__(self, in_channels, out_channels, bn_act=True, **kwargs):
+         super().__init__()
+         self.conv = nn.Conv2d(in_channels, out_channels, bias=not bn_act, **kwargs)
+         self.bn = nn.BatchNorm2d(out_channels)
+         self.leaky = nn.LeakyReLU(0.1)
+         self.use_bn_act = bn_act
+ 
+     def forward(self, x):
+         if self.use_bn_act:
+             return self.leaky(self.bn(self.conv(x)))
+         else:
+             return self.conv(x)
+ 
+ 
+ class ResidualBlock(nn.Module):
+     def __init__(self, channels, use_residual=True, num_repeats=1):
+         super().__init__()
+         self.layers = nn.ModuleList()
+         for repeat in range(num_repeats):
+             self.layers += [
+                 nn.Sequential(
+                     CNNBlock(channels, channels // 2, kernel_size=1),
+                     CNNBlock(channels // 2, channels, kernel_size=3, padding=1),
+                 )
+             ]
+ 
+         self.use_residual = use_residual
+         self.num_repeats = num_repeats
+ 
+     def forward(self, x):
+         for layer in self.layers:
+             if self.use_residual:
+                 x = x + layer(x)
+             else:
+                 x = layer(x)
+ 
+         return x
+ 
+ 
+ class ScalePrediction(nn.Module):
+     def __init__(self, in_channels, num_classes):
+         super().__init__()
+         self.pred = nn.Sequential(
+             CNNBlock(in_channels, 2 * in_channels, kernel_size=3, padding=1),
+             CNNBlock(
+                 2 * in_channels, (num_classes + 5) * 3, bn_act=False, kernel_size=1
+             ),
+         )
+         self.num_classes = num_classes
+ 
+     def forward(self, x):
+         return (
+             self.pred(x)
+             .reshape(x.shape[0], 3, self.num_classes + 5, x.shape[2], x.shape[3])
+             .permute(0, 1, 3, 4, 2)
+         )
+ 
+ 
+ class YOLOv3LightningModel(pl.LightningModule):
+     def __init__(self, in_channels=3, num_classes=20):
+         super().__init__()
+         self.num_classes = num_classes
+         self.in_channels = in_channels
+         self.layers = self._create_conv_layers()
+ 
+     def forward(self, x):
+         outputs = []  # one output per scale
+         route_connections = []
+         for layer in self.layers:
+             if isinstance(layer, ScalePrediction):
+                 outputs.append(layer(x))
+                 continue
+ 
+             x = layer(x)
+ 
+             if isinstance(layer, ResidualBlock) and layer.num_repeats == 8:
+                 route_connections.append(x)
+ 
+             elif isinstance(layer, nn.Upsample):
+                 x = torch.cat([x, route_connections[-1]], dim=1)
+                 route_connections.pop()
+ 
+         return outputs
+ 
+     def _create_conv_layers(self):
+         layers = nn.ModuleList()
+         in_channels = self.in_channels
+ 
+         for module in config:
+             if isinstance(module, tuple):
+                 out_channels, kernel_size, stride = module
+                 layers.append(
+                     CNNBlock(
+                         in_channels,
+                         out_channels,
+                         kernel_size=kernel_size,
+                         stride=stride,
+                         padding=1 if kernel_size == 3 else 0,
+                     )
+                 )
+                 in_channels = out_channels
+ 
+             elif isinstance(module, list):
+                 num_repeats = module[1]
+                 layers.append(ResidualBlock(in_channels, num_repeats=num_repeats))
+ 
+             elif isinstance(module, str):
+                 if module == "S":
+                     layers += [
+                         ResidualBlock(in_channels, use_residual=False, num_repeats=1),
+                         CNNBlock(in_channels, in_channels // 2, kernel_size=1),
+                         ScalePrediction(in_channels // 2, num_classes=self.num_classes),
+                     ]
+                     in_channels = in_channels // 2
+ 
+                 elif module == "U":
+                     layers.append(nn.Upsample(scale_factor=2))
+                     in_channels = in_channels * 3
+ 
+         return layers
+ 
+ 
+ def sanity_check(model):
+     x = torch.randn((2, 3, cfg.IMAGE_SIZE, cfg.IMAGE_SIZE))
+     out = model(x)  # reuse this forward pass; cfg defines no NUM_CLASSES, so use model.num_classes
+     assert out[0].shape == (2, 3, cfg.IMAGE_SIZE // 32, cfg.IMAGE_SIZE // 32, model.num_classes + 5)
+     assert out[1].shape == (2, 3, cfg.IMAGE_SIZE // 16, cfg.IMAGE_SIZE // 16, model.num_classes + 5)
+     assert out[2].shape == (2, 3, cfg.IMAGE_SIZE // 8, cfg.IMAGE_SIZE // 8, model.num_classes + 5)
+     print("Success!")