---
library_name: keras-hub
---
### Model Overview
A Keras model implementing the RetinaNet meta-architecture.

Implements the RetinaNet architecture for object detection. The constructor
requires `num_classes`, `bounding_box_format`, and a backbone. Optionally,
a custom label encoder and prediction decoder may be provided.


__Arguments__


- __num_classes__: The number of classes in your dataset, excluding the
    background class. Classes should be represented by integers in the
    range `[0, num_classes)`.
- __bounding_box_format__: The format of bounding boxes in the input dataset.
    Refer to the
    [keras.io docs](https://keras.io/api/keras_cv/bounding_box/formats/)
    for more details on supported bounding box formats.
- __backbone__: `keras.Model`. If the default `feature_pyramid` is used,
    must implement the `pyramid_level_inputs` property with keys "P3", "P4",
    and "P5" and layer names as values. A sensible backbone for many use
    cases is
    `keras_cv.models.ResNetBackbone.from_preset("resnet50_imagenet")`.
- __anchor_generator__: (Optional) A `keras_cv.layers.AnchorGenerator`. If
    provided, the anchor generator will be passed to both the
    `label_encoder` and the `prediction_decoder`. Only to be used when
    both `label_encoder` and `prediction_decoder` are `None`.
    Defaults to an anchor generator with the parameterization:
    `strides=[2**i for i in range(3, 8)]`,
    `scales=[2**x for x in [0, 1 / 3, 2 / 3]]`,
    `sizes=[32.0, 64.0, 128.0, 256.0, 512.0]`,
    and `aspect_ratios=[0.5, 1.0, 2.0]`.
- __label_encoder__: (Optional) A `keras.layers.Layer` that accepts an image
    Tensor, a bounding box Tensor, and a bounding box class Tensor in its
    `call()` method, and returns RetinaNet training targets. By default, a
    KerasCV standard `RetinaNetLabelEncoder` is created and used.
    Results of this object's `call()` method are passed to the `box_loss`
    and `classification_loss` objects as the `y_true` argument.
- __prediction_decoder__: (Optional) A `keras.layers.Layer` responsible for
    transforming RetinaNet predictions into usable bounding box Tensors. If
    not provided, a `keras_cv.layers.MultiClassNonMaxSuppression` layer is
    used by default, which prunes boxes with non-max suppression.
- __feature_pyramid__: (Optional) A `keras.layers.Layer` that produces
    a list of 4D feature maps (batch dimension included)
    when called on the pyramid-level outputs of the `backbone`.
    If not provided, the reference implementation from the paper will be used.
- __classification_head__: (Optional) A `keras.Layer` that performs
    classification of the bounding boxes. If not provided, a simple
    ConvNet with 3 layers will be used.
- __box_head__: (Optional) A `keras.Layer` that performs regression of the
    bounding boxes. If not provided, a simple ConvNet with 3 layers
    will be used.
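
In practice, conversion between the bounding box formats accepted by
`bounding_box_format` is handled by utilities such as
`keras_cv.bounding_box.convert_format`. As a minimal illustration of what such
a conversion does, here is a NumPy sketch; the `xyxy_to_xywh` helper is
hypothetical, written only for this example:

```python
import numpy as np

# Hypothetical helper for illustration; in practice use
# keras_cv.bounding_box.convert_format.
def xyxy_to_xywh(boxes):
    """Convert [x_min, y_min, x_max, y_max] boxes to [x, y, width, height]."""
    boxes = np.asarray(boxes, dtype="float32")
    x_min, y_min, x_max, y_max = np.split(boxes, 4, axis=-1)
    return np.concatenate([x_min, y_min, x_max - x_min, y_max - y_min], axis=-1)

boxes_xyxy = np.array([[10.0, 20.0, 110.0, 220.0]])
out = xyxy_to_xywh(boxes_xyxy)
print(out)  # -> [[ 10.  20. 100. 200.]]
```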

## Example Usage
## Pretrained RetinaNet model
```python
import numpy as np
import keras_hub

object_detector = keras_hub.models.ImageObjectDetector.from_preset(
    "retinanet_resnet50_fpn_coco"
)

input_data = np.random.uniform(0, 1, size=(2, 224, 224, 3))
object_detector(input_data)
```
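
For intuition about the default anchor parameterization listed in the
arguments above, the following pure-Python sketch counts the anchors it would
place on a 224x224 input, assuming each pyramid level has
`ceil(image_size / stride)` cells per side (the exact rounding can vary with
the backbone):

```python
import math

# Default anchor parameterization from the arguments above.
strides = [2**i for i in range(3, 8)]       # [8, 16, 32, 64, 128]
scales = [2**x for x in [0, 1 / 3, 2 / 3]]
aspect_ratios = [0.5, 1.0, 2.0]

# One anchor per (scale, aspect ratio) pair at every feature map location.
anchors_per_location = len(scales) * len(aspect_ratios)  # 9

image_size = 224
total_anchors = sum(
    math.ceil(image_size / s) ** 2 * anchors_per_location for s in strides
)
print(total_anchors)  # -> 9441
```

This is why RetinaNet relies on focal loss: the overwhelming majority of
these anchors are background.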

## Fine-tune the pre-trained model
```python
backbone = keras_hub.models.Backbone.from_preset(
    "retinanet_resnet50_fpn_coco"
)
preprocessor = keras_hub.models.RetinaNetObjectDetectorPreprocessor.from_preset(
    "retinanet_resnet50_fpn_coco"
)
model = RetinaNetObjectDetector(
    backbone=backbone,
    num_classes=len(CLASSES),
    preprocessor=preprocessor
)
```

## Custom training the model
```python
image_converter = keras_hub.layers.RetinaNetImageConverter(
    scale=1/255
)

preprocessor = keras_hub.models.RetinaNetObjectDetectorPreprocessor(
    image_converter=image_converter
)
# Load a pre-trained ResNet50 model. 
# This will serve as the base for extracting image features.
image_encoder = keras_hub.models.Backbone.from_preset(
    "resnet_50_imagenet" 
)

# Build the RetinaNet Feature Pyramid Network (FPN) on top of the ResNet50 
# backbone. The FPN creates multi-scale feature maps for better object detection
# at different sizes.
backbone = keras_hub.models.RetinaNetBackbone(
    image_encoder=image_encoder,
    min_level=3,
    max_level=5,
    use_p5=False 
)
model = RetinaNetObjectDetector(
    backbone=backbone,
    num_classes=len(CLASSES),
    preprocessor=preprocessor
)
```

## Example Usage with Hugging Face URI

## Pretrained RetinaNet model
```python
import numpy as np
import keras_hub

object_detector = keras_hub.models.ImageObjectDetector.from_preset(
    "hf://keras/retinanet_resnet50_fpn_coco"
)

input_data = np.random.uniform(0, 1, size=(2, 224, 224, 3))
object_detector(input_data)
```

## Fine-tune the pre-trained model
```python
backbone = keras_hub.models.Backbone.from_preset(
    "hf://keras/retinanet_resnet50_fpn_coco"
)
preprocessor = keras_hub.models.RetinaNetObjectDetectorPreprocessor.from_preset(
    "hf://keras/retinanet_resnet50_fpn_coco"
)
model = RetinaNetObjectDetector(
    backbone=backbone,
    num_classes=len(CLASSES),
    preprocessor=preprocessor
)
```

## Custom training the model
```python
image_converter = keras_hub.layers.RetinaNetImageConverter(
    scale=1/255
)

preprocessor = keras_hub.models.RetinaNetObjectDetectorPreprocessor(
    image_converter=image_converter
)
# Load a pre-trained ResNet50 model. 
# This will serve as the base for extracting image features.
image_encoder = keras_hub.models.Backbone.from_preset(
    "resnet_50_imagenet" 
)

# Build the RetinaNet Feature Pyramid Network (FPN) on top of the ResNet50 
# backbone. The FPN creates multi-scale feature maps for better object detection
# at different sizes.
backbone = keras_hub.models.RetinaNetBackbone(
    image_encoder=image_encoder,
    min_level=3,
    max_level=5,
    use_p5=False 
)
model = RetinaNetObjectDetector(
    backbone=backbone,
    num_classes=len(CLASSES),
    preprocessor=preprocessor
)
```