---
library_name: keras-hub
---
### Model Overview
A Keras model implementing the RetinaNet meta-architecture for object
detection. The constructor requires `num_classes`, `bounding_box_format`, and
a backbone. Optionally, a custom label encoder and prediction decoder may be
provided.
__Arguments__
- __num_classes__: The number of classes in your dataset, excluding the
    background class. Classes should be represented by integers in the
    range [0, num_classes).
- __bounding_box_format__: The format of bounding boxes in the input dataset.
Refer
[to the keras.io docs](https://keras.io/api/keras_cv/bounding_box/formats/)
for more details on supported bounding box formats.
- __backbone__: `keras.Model`. If the default `feature_pyramid` is used,
    must implement the `pyramid_level_inputs` property with keys "P3", "P4",
    and "P5" and layer names as values. A sensible backbone to use in many
    cases is
    `keras_cv.models.ResNetBackbone.from_preset("resnet50_imagenet")`.
- __anchor_generator__: (Optional) a `keras_cv.layers.AnchorGenerator`. If
provided, the anchor generator will be passed to both the
    `label_encoder` and the `prediction_decoder`. Should only be used when
    both `label_encoder` and `prediction_decoder` are `None`.
Defaults to an anchor generator with the parameterization:
`strides=[2**i for i in range(3, 8)]`,
`scales=[2**x for x in [0, 1 / 3, 2 / 3]]`,
`sizes=[32.0, 64.0, 128.0, 256.0, 512.0]`,
and `aspect_ratios=[0.5, 1.0, 2.0]`.
- __label_encoder__: (Optional) a `keras.Layer` that accepts an image Tensor,
    a bounding box Tensor, and a bounding box class Tensor in its `call()`
    method, and returns RetinaNet training targets. By default, a
    KerasCV standard `RetinaNetLabelEncoder` is created and used.
    Results of this object's `call()` method are passed to the `loss`
    object as the `y_true` argument for both `box_loss` and
    `classification_loss`.
- __prediction_decoder__: (Optional) A `keras.layers.Layer` that is
responsible for transforming RetinaNet predictions into usable
    bounding box Tensors. If not provided, a default layer is used. The
    default `prediction_decoder` layer is a
    `keras_cv.layers.MultiClassNonMaxSuppression` layer, which uses
    Non-Max Suppression for box pruning (see the sketch after this list).
- __feature_pyramid__: (Optional) A `keras.layers.Layer` that produces
a list of 4D feature maps (batch dimension included)
when called on the pyramid-level outputs of the `backbone`.
If not provided, the reference implementation from the paper will be used.
- __classification_head__: (Optional) A `keras.Layer` that performs
classification of the bounding boxes. If not provided, a simple
ConvNet with 3 layers will be used.
- __box_head__: (Optional) A `keras.Layer` that performs regression of the
bounding boxes. If not provided, a simple ConvNet with 3 layers
will be used.
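The arguments above mirror the KerasCV `RetinaNet` constructor. Below is a
minimal sketch of overriding the default `prediction_decoder`, assuming
`keras_cv` is installed and using illustrative values for `num_classes` and
the box format:
```python3
import keras_cv

# Backbone suggested in the arguments above.
backbone = keras_cv.models.ResNetBackbone.from_preset("resnet50_imagenet")

# Replace the default decoder with an explicitly configured NMS layer.
prediction_decoder = keras_cv.layers.MultiClassNonMaxSuppression(
    bounding_box_format="xywh",
    from_logits=True,
    iou_threshold=0.5,
    confidence_threshold=0.5,
)

model = keras_cv.models.RetinaNet(
    num_classes=20,
    bounding_box_format="xywh",
    backbone=backbone,
    prediction_decoder=prediction_decoder,
)
```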
## Example Usage
## Pretrained RetinaNet model
```python3
import numpy as np
import keras_hub

object_detector = keras_hub.models.ImageObjectDetector.from_preset(
    "retinanet_resnet50_fpn_coco"
)
input_data = np.random.uniform(0, 1, size=(2, 224, 224, 3))
object_detector(input_data)
```
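For larger batches, the standard Keras `predict()` call can be used instead of
calling the task directly; a brief usage sketch (the exact structure of the
decoded outputs may vary across keras-hub versions):
```python3
# Batched inference over the same inputs; returns decoded detections.
predictions = object_detector.predict(input_data)
```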
## Fine-tune the pre-trained model
```python3
import keras_hub

backbone = keras_hub.models.Backbone.from_preset(
    "retinanet_resnet50_fpn_coco"
)
preprocessor = keras_hub.models.RetinaNetObjectDetectorPreprocessor.from_preset(
"retinanet_resnet50_fpn_coco"
)
model = RetinaNetObjectDetector(
backbone=backbone,
num_classes=len(CLASSES),
preprocessor=preprocessor
)
```
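A minimal sketch of running the fine-tuning itself, continuing from the block
above. The synthetic batch, the label dictionary keys (`"boxes"`, `"classes"`),
and the reliance on the task's default detection losses are assumptions; adapt
them to your dataset and keras-hub version.
```python3
import numpy as np
import keras

# Tiny synthetic batch purely to illustrate the call shapes -- not meaningful
# training data. Assumed label structure: per-image boxes plus integer class
# ids, matching the detector's bounding box format.
images = np.random.uniform(0, 255, size=(4, 640, 640, 3)).astype("float32")
labels = {
    "boxes": np.random.uniform(0, 640, size=(4, 5, 4)).astype("float32"),
    "classes": np.random.randint(0, len(CLASSES), size=(4, 5)).astype("int32"),
}

# Assumes the task's compile() configures default box/classification losses;
# pass them explicitly if your keras-hub version requires it.
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-4))
model.fit(images, labels, epochs=1, batch_size=2)
```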
## Custom training the model
```python3
import keras_hub

image_converter = keras_hub.layers.RetinaNetImageConverter(
    scale=1/255
)
preprocessor = keras_hub.models.RetinaNetObjectDetectorPreprocessor(
image_converter=image_converter
)
# Load a pre-trained ResNet50 model.
# This will serve as the base for extracting image features.
image_encoder = keras_hub.models.Backbone.from_preset(
"resnet_50_imagenet"
)
# Build the RetinaNet Feature Pyramid Network (FPN) on top of the ResNet50
# backbone. The FPN creates multi-scale feature maps for better object detection
# at different sizes.
backbone = keras_hub.models.RetinaNetBackbone(
image_encoder=image_encoder,
min_level=3,
max_level=5,
use_p5=False
)
model = RetinaNetObjectDetector(
backbone=backbone,
num_classes=len(CLASSES),
preprocessor=preprocessor
)
```
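After training, the detector can be exported as a local preset and reloaded
like the built-in presets; a brief sketch, with an illustrative directory
name:
```python3
# Save weights, config, and the preprocessor as a reusable preset directory.
model.save_to_preset("./retinanet_custom_preset")

# Reload later exactly like a built-in preset.
restored_detector = keras_hub.models.ImageObjectDetector.from_preset(
    "./retinanet_custom_preset"
)
```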
## Example Usage with Hugging Face URI
## Pretrained RetinaNet model
```python3
import numpy as np
import keras_hub

object_detector = keras_hub.models.ImageObjectDetector.from_preset(
    "hf://keras/retinanet_resnet50_fpn_coco"
)
input_data = np.random.uniform(0, 1, size=(2, 224, 224, 3))
object_detector(input_data)
```
## Fine-tune the pre-trained model
```python3
import keras_hub

backbone = keras_hub.models.Backbone.from_preset(
    "hf://keras/retinanet_resnet50_fpn_coco"
)
preprocessor = keras_hub.models.RetinaNetObjectDetectorPreprocessor.from_preset(
"hf://keras/retinanet_resnet50_fpn_coco"
)
model = RetinaNetObjectDetector(
backbone=backbone,
num_classes=len(CLASSES),
preprocessor=preprocessor
)
```