---
library_name: keras-hub
---

### Model Overview

A Keras model implementing the RetinaNet meta-architecture for object
detection. The constructor requires `num_classes`, `bounding_box_format`,
and a backbone. Optionally, a custom label encoder and prediction decoder
may be provided.

__Arguments__

- __num_classes__: The number of classes in your dataset, excluding the
    background class. Classes should be represented by integers in the
    range `[0, num_classes)`.
- __bounding_box_format__: The format of the bounding boxes in the input
    dataset. Refer to the
    [keras.io docs](https://keras.io/api/keras_cv/bounding_box/formats/)
    for more details on supported bounding box formats.
- __backbone__: `keras.Model`. If the default `feature_pyramid` is used,
    the backbone must implement the `pyramid_level_inputs` property with
    keys "P3", "P4", and "P5" and layer names as values. A sensible
    backbone to use in many cases is
    `keras_cv.models.ResNetBackbone.from_preset("resnet50_imagenet")`.
- __anchor_generator__: (Optional) A `keras_cv.layers.AnchorGenerator`. If
    provided, the anchor generator is passed to both the `label_encoder`
    and the `prediction_decoder`. Use it only when `label_encoder` and
    `prediction_decoder` are both `None`.
    Defaults to an anchor generator with the parameterization:
    `strides=[2**i for i in range(3, 8)]`,
    `scales=[2**x for x in [0, 1 / 3, 2 / 3]]`,
    `sizes=[32.0, 64.0, 128.0, 256.0, 512.0]`,
    and `aspect_ratios=[0.5, 1.0, 2.0]`.
- __label_encoder__: (Optional) A `keras.Layer` that accepts an image
    Tensor, a bounding box Tensor, and a bounding box class Tensor in its
    `call()` method and returns RetinaNet training targets. By default, a
    standard KerasCV `RetinaNetLabelEncoder` is created and used.
    The results of this layer's `call()` method are passed to the `loss`
    object for `box_loss` and `classification_loss` as the `y_true`
    argument.
- __prediction_decoder__: (Optional) A `keras.layers.Layer` that is
    responsible for transforming RetinaNet predictions into usable
    bounding box Tensors. If not provided, the default is a
    `keras_cv.layers.MultiClassNonMaxSuppression` layer, which uses
    non-max suppression to prune boxes.
- __feature_pyramid__: (Optional) A `keras.layers.Layer` that produces
    a list of 4D feature maps (batch dimension included)
    when called on the pyramid-level outputs of the `backbone`.
    If not provided, the reference implementation from the paper is used.
- __classification_head__: (Optional) A `keras.Layer` that performs
    classification of the bounding boxes. If not provided, a simple
    ConvNet with 3 layers is used.
- __box_head__: (Optional) A `keras.Layer` that performs regression of the
    bounding boxes. If not provided, a simple ConvNet with 3 layers
    is used.

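To make the default `anchor_generator` parameterization listed above more concrete, the following sketch expands it in plain Python. It is an illustration only, not the library's implementation: the helper `anchor_shapes` is hypothetical, and it assumes that an aspect ratio means width divided by height and that each anchor keeps an area of `(size * scale) ** 2`.

```python
import math

# Default parameterization quoted from the `anchor_generator` argument.
strides = [2**i for i in range(3, 8)]
scales = [2**x for x in [0, 1 / 3, 2 / 3]]
sizes = [32.0, 64.0, 128.0, 256.0, 512.0]
aspect_ratios = [0.5, 1.0, 2.0]


def anchor_shapes(size, scales, aspect_ratios):
    """Return (width, height) pairs for one pyramid level.

    Assumption (hypothetical helper): ratio = width / height and the
    anchor area is (size * scale) ** 2.
    """
    shapes = []
    for scale in scales:
        area = (size * scale) ** 2
        for ratio in aspect_ratios:
            width = math.sqrt(area * ratio)
            height = math.sqrt(area / ratio)
            shapes.append((width, height))
    return shapes


# One anchor per (scale, aspect ratio) pair at every feature-map location.
anchors_per_location = len(scales) * len(aspect_ratios)

# Anchor shapes keyed by the stride of the pyramid level they belong to.
level_shapes = {
    stride: anchor_shapes(size, scales, aspect_ratios)
    for stride, size in zip(strides, sizes)
}
```

Under these assumptions, each of the five pyramid levels carries nine anchor shapes per location, one for every combination of scale and aspect ratio.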
## Example Usage

## Pretrained RetinaNet model

```python3
import keras_hub
import numpy as np

# Load the detector with its pretrained COCO weights.
object_detector = keras_hub.models.ImageObjectDetector.from_preset(
    "retinanet_resnet50_fpn_coco"
)

# Run the detector on a random batch of two 224x224 RGB images.
input_data = np.random.uniform(0, 1, size=(2, 224, 224, 3))
object_detector(input_data)
```

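Detections are usually filtered by confidence before they are used. The sketch below shows one way to do that with NumPy on synthetic data. It assumes the decoded predictions are available as a dictionary with `boxes`, `confidence`, and `labels` arrays, with unused slots padded with `-1`; these key names and the padding value are assumptions, so check the actual output of your model. `filter_detections` is a hypothetical helper, not part of KerasHub.

```python
import numpy as np


def filter_detections(predictions, score_threshold=0.5):
    """Keep detections at or above `score_threshold`, image by image.

    Assumes `predictions` holds batched `boxes`, `confidence`, and
    `labels` arrays (assumed key names) padded with -1.
    """
    results = []
    for boxes, scores, labels in zip(
        predictions["boxes"],
        predictions["confidence"],
        predictions["labels"],
    ):
        keep = scores >= score_threshold  # padded slots (-1) drop out too
        results.append(
            {
                "boxes": boxes[keep],
                "confidence": scores[keep],
                "labels": labels[keep],
            }
        )
    return results


# Synthetic stand-in for a decoded model output: one image, three slots.
predictions = {
    "boxes": np.array(
        [[[10, 20, 50, 80], [0, 0, 5, 5], [-1, -1, -1, -1]]], dtype="float32"
    ),
    "confidence": np.array([[0.9, 0.3, -1.0]], dtype="float32"),
    "labels": np.array([[1, 7, -1]]),
}
filtered = filter_detections(predictions, score_threshold=0.5)
```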
## Fine-tune the pre-trained model

```python3
import keras_hub

# Load the pretrained RetinaNet backbone and its matching preprocessor.
backbone = keras_hub.models.Backbone.from_preset(
    "retinanet_resnet50_fpn_coco"
)
preprocessor = keras_hub.models.RetinaNetObjectDetectorPreprocessor.from_preset(
    "retinanet_resnet50_fpn_coco"
)

# `CLASSES` is a placeholder for the list of class names in your dataset.
model = keras_hub.models.RetinaNetObjectDetector(
    backbone=backbone,
    num_classes=len(CLASSES),
    preprocessor=preprocessor,
)
```

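The fine-tuning snippet relies on a `CLASSES` list and on labels in the range `[0, num_classes)`. As a minimal sketch, the code below defines a hypothetical `CLASSES` list, maps class names to integer ids, and pads a variable number of boxes per image into dense arrays. The class names, the `pad_annotations` helper, the `boxes`/`labels` key names, and the `-1` padding value are all illustrative assumptions rather than a documented KerasHub input format.

```python
import numpy as np

# Hypothetical class names; replace with the classes of your dataset.
CLASSES = ["cat", "dog", "bird"]

# Integer ids in the range [0, num_classes), as the model expects.
class_to_id = {name: index for index, name in enumerate(CLASSES)}


def pad_annotations(annotations, max_boxes):
    """Pack per-image (box, class_name) pairs into dense, padded arrays.

    Unused slots are filled with -1 (an assumed padding convention).
    """
    batch_size = len(annotations)
    boxes = np.full((batch_size, max_boxes, 4), -1.0, dtype="float32")
    labels = np.full((batch_size, max_boxes), -1, dtype="int32")
    for i, image_annotations in enumerate(annotations):
        for j, (box, name) in enumerate(image_annotations[:max_boxes]):
            boxes[i, j] = box
            labels[i, j] = class_to_id[name]
    return {"boxes": boxes, "labels": labels}


# Two images: the first has two objects, the second has one.
annotations = [
    [((10, 20, 50, 80), "cat"), ((30, 30, 90, 120), "dog")],
    [((5, 5, 40, 40), "bird")],
]
targets = pad_annotations(annotations, max_boxes=3)
```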
## Custom training the model

```python3
import keras_hub

# Rescale pixel values from [0, 255] to [0, 1].
image_converter = keras_hub.layers.RetinaNetImageConverter(
    scale=1/255
)

preprocessor = keras_hub.models.RetinaNetObjectDetectorPreprocessor(
    image_converter=image_converter
)

# Load a pre-trained ResNet50 model.
# This will serve as the base for extracting image features.
image_encoder = keras_hub.models.Backbone.from_preset(
    "resnet_50_imagenet"
)

# Build the RetinaNet Feature Pyramid Network (FPN) on top of the ResNet50
# backbone. The FPN creates multi-scale feature maps for better object detection
# at different sizes.
backbone = keras_hub.models.RetinaNetBackbone(
    image_encoder=image_encoder,
    min_level=3,
    max_level=5,
    use_p5=False
)

# `CLASSES` is a placeholder for the list of class names in your dataset.
model = keras_hub.models.RetinaNetObjectDetector(
    backbone=backbone,
    num_classes=len(CLASSES),
    preprocessor=preprocessor
)
```

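The `min_level` and `max_level` arguments select which pyramid levels the FPN produces. As a rough illustration, the sketch below computes the stride and feature-map size of each level for a square input. It assumes that level `l` has a stride of `2**l` and that sizes round up; `pyramid_feature_sizes` is a hypothetical helper, and the figure of 9 anchors per location comes from the default 3 scales and 3 aspect ratios listed under Arguments.

```python
def pyramid_feature_sizes(image_size, min_level, max_level):
    """Map each pyramid level to its stride and feature-map side length.

    Assumption: level `l` downsamples the input by 2**l, rounding up.
    """
    levels = {}
    for level in range(min_level, max_level + 1):
        stride = 2**level
        side = -(-image_size // stride)  # ceiling division
        levels[f"P{level}"] = {"stride": stride, "size": side}
    return levels


# Levels matching `min_level=3, max_level=5` for a 224x224 input.
levels = pyramid_feature_sizes(224, min_level=3, max_level=5)

# Total anchors, assuming 9 anchors at every feature-map location.
anchors_per_location = 9
total_anchors = sum(
    anchors_per_location * info["size"] ** 2 for info in levels.values()
)
```

For a 224x224 input this gives feature maps of side 28, 14, and 7 for P3, P4, and P5, so the number of candidate boxes grows quickly with the input resolution.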
## Example Usage with Hugging Face URI

## Pretrained RetinaNet model

```python3
import keras_hub
import numpy as np

# Load the detector with its pretrained COCO weights from Hugging Face.
object_detector = keras_hub.models.ImageObjectDetector.from_preset(
    "hf://keras/retinanet_resnet50_fpn_coco"
)

# Run the detector on a random batch of two 224x224 RGB images.
input_data = np.random.uniform(0, 1, size=(2, 224, 224, 3))
object_detector(input_data)
```

## Fine-tune the pre-trained model

```python3
import keras_hub

# Load the pretrained RetinaNet backbone and its matching preprocessor.
backbone = keras_hub.models.Backbone.from_preset(
    "hf://keras/retinanet_resnet50_fpn_coco"
)
preprocessor = keras_hub.models.RetinaNetObjectDetectorPreprocessor.from_preset(
    "hf://keras/retinanet_resnet50_fpn_coco"
)

# `CLASSES` is a placeholder for the list of class names in your dataset.
model = keras_hub.models.RetinaNetObjectDetector(
    backbone=backbone,
    num_classes=len(CLASSES),
    preprocessor=preprocessor,
)
```

## Custom training the model

```python3
import keras_hub

# Rescale pixel values from [0, 255] to [0, 1].
image_converter = keras_hub.layers.RetinaNetImageConverter(
    scale=1/255
)

preprocessor = keras_hub.models.RetinaNetObjectDetectorPreprocessor(
    image_converter=image_converter
)

# Load a pre-trained ResNet50 model.
# This will serve as the base for extracting image features.
image_encoder = keras_hub.models.Backbone.from_preset(
    "resnet_50_imagenet"
)

# Build the RetinaNet Feature Pyramid Network (FPN) on top of the ResNet50
# backbone. The FPN creates multi-scale feature maps for better object detection
# at different sizes.
backbone = keras_hub.models.RetinaNetBackbone(
    image_encoder=image_encoder,
    min_level=3,
    max_level=5,
    use_p5=False
)

# `CLASSES` is a placeholder for the list of class names in your dataset.
model = keras_hub.models.RetinaNetObjectDetector(
    backbone=backbone,
    num_classes=len(CLASSES),
    preprocessor=preprocessor
)
```