---
library_name: keras-hub
---
### Model Overview
A Keras model implementing the RetinaNet meta-architecture.

Implements the RetinaNet architecture for object detection. The constructor
requires `num_classes`, `bounding_box_format`, and a backbone. Optionally,
a custom label encoder and prediction decoder may be provided.


__Arguments__


- __num_classes__: The number of classes in your dataset, excluding the
    background class. Classes should be represented by integers in the
    range `[0, num_classes)`.
- __bounding_box_format__: The format of bounding boxes in the input dataset.
    Refer to the
    [keras.io docs](https://keras.io/api/keras_cv/bounding_box/formats/)
    for more details on supported bounding box formats.
- __backbone__: `keras.Model`. If the default `feature_pyramid` is used,
    must implement the `pyramid_level_inputs` property with keys "P3", "P4",
    and "P5" and layer names as values. A sensible backbone for many use
    cases is
    `keras_cv.models.ResNetBackbone.from_preset("resnet50_imagenet")`.
- __anchor_generator__: (Optional) A `keras_cv.layers.AnchorGenerator`. If
    provided, the anchor generator will be passed to both the
    `label_encoder` and the `prediction_decoder`. Only to be used when
    both `label_encoder` and `prediction_decoder` are `None`.
    Defaults to an anchor generator with the parameterization:
    `strides=[2**i for i in range(3, 8)]`,
    `scales=[2**x for x in [0, 1 / 3, 2 / 3]]`,
    `sizes=[32.0, 64.0, 128.0, 256.0, 512.0]`,
    and `aspect_ratios=[0.5, 1.0, 2.0]`.
- __label_encoder__: (Optional) A `keras.layers.Layer` that accepts an image
    Tensor, a bounding box Tensor, and a bounding box class Tensor in its
    `call()` method, and returns RetinaNet training targets. By default, a
    KerasCV standard `RetinaNetLabelEncoder` is created and used.
    Results of this object's `call()` method are passed to the `box_loss`
    and `classification_loss` objects as the `y_true` argument.
- __prediction_decoder__: (Optional) A `keras.layers.Layer` responsible for
    transforming RetinaNet predictions into usable bounding box Tensors. If
    not provided, a `keras_cv.layers.MultiClassNonMaxSuppression` layer is
    used by default, which prunes boxes with non-max suppression.
- __feature_pyramid__: (Optional) A `keras.layers.Layer` that produces
    a list of 4D feature maps (batch dimension included)
    when called on the pyramid-level outputs of the `backbone`.
    If not provided, the reference implementation from the paper will be used.
- __classification_head__: (Optional) A `keras.Layer` that performs
    classification of the bounding boxes. If not provided, a simple
    ConvNet with 3 layers will be used.
- __box_head__: (Optional) A `keras.Layer` that performs regression of the
    bounding boxes. If not provided, a simple ConvNet with 3 layers
    will be used.
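
In practice, conversion between the bounding box formats accepted by
`bounding_box_format` is handled by utilities such as
`keras_cv.bounding_box.convert_format`. As a minimal illustration of what such
a conversion does, here is a NumPy sketch; the `xyxy_to_xywh` helper is
hypothetical, written only for this example:

```python
import numpy as np

# Hypothetical helper for illustration; in practice use
# keras_cv.bounding_box.convert_format.
def xyxy_to_xywh(boxes):
    """Convert [x_min, y_min, x_max, y_max] boxes to [x, y, width, height]."""
    boxes = np.asarray(boxes, dtype="float32")
    x_min, y_min, x_max, y_max = np.split(boxes, 4, axis=-1)
    return np.concatenate([x_min, y_min, x_max - x_min, y_max - y_min], axis=-1)

boxes_xyxy = np.array([[10.0, 20.0, 110.0, 220.0]])
out = xyxy_to_xywh(boxes_xyxy)
print(out)  # -> [[ 10.  20. 100. 200.]]
```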

## Example Usage
## Pretrained RetinaNet model
```python
import numpy as np
import keras_hub

object_detector = keras_hub.models.ImageObjectDetector.from_preset(
    "retinanet_resnet50_fpn_coco"
)

input_data = np.random.uniform(0, 1, size=(2, 224, 224, 3))
object_detector(input_data)
```
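
For intuition about the default anchor parameterization listed in the
arguments above, the following pure-Python sketch counts the anchors it would
place on a 224x224 input, assuming each pyramid level has
`ceil(image_size / stride)` cells per side (the exact rounding can vary with
the backbone):

```python
import math

# Default anchor parameterization from the arguments above.
strides = [2**i for i in range(3, 8)]       # [8, 16, 32, 64, 128]
scales = [2**x for x in [0, 1 / 3, 2 / 3]]
aspect_ratios = [0.5, 1.0, 2.0]

# One anchor per (scale, aspect ratio) pair at every feature map location.
anchors_per_location = len(scales) * len(aspect_ratios)  # 9

image_size = 224
total_anchors = sum(
    math.ceil(image_size / s) ** 2 * anchors_per_location for s in strides
)
print(total_anchors)  # -> 9441
```

This is why RetinaNet relies on focal loss: the overwhelming majority of
these anchors are background.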

## Fine-tune the pre-trained model
```python
backbone = keras_hub.models.Backbone.from_preset(
    "retinanet_resnet50_fpn_coco"
)
preprocessor = keras_hub.models.RetinaNetObjectDetectorPreprocessor.from_preset(
    "retinanet_resnet50_fpn_coco"
)
model = RetinaNetObjectDetector(
    backbone=backbone,
    num_classes=len(CLASSES),
    preprocessor=preprocessor
)
```

## Custom training the model
```python
image_converter = keras_hub.layers.RetinaNetImageConverter(
    scale=1/255
)

preprocessor = keras_hub.models.RetinaNetObjectDetectorPreprocessor(
    image_converter=image_converter
)
# Load a pre-trained ResNet50 model. 
# This will serve as the base for extracting image features.
image_encoder = keras_hub.models.Backbone.from_preset(
    "resnet_50_imagenet" 
)

# Build the RetinaNet Feature Pyramid Network (FPN) on top of the ResNet50 
# backbone. The FPN creates multi-scale feature maps for better object detection
# at different sizes.
backbone = keras_hub.models.RetinaNetBackbone(
    image_encoder=image_encoder,
    min_level=3,
    max_level=5,
    use_p5=False 
)
model = RetinaNetObjectDetector(
    backbone=backbone,
    num_classes=len(CLASSES),
    preprocessor=preprocessor
)
```

## Example Usage with Hugging Face URI

## Pretrained RetinaNet model
```python
import numpy as np
import keras_hub

object_detector = keras_hub.models.ImageObjectDetector.from_preset(
    "hf://keras/retinanet_resnet50_fpn_coco"
)

input_data = np.random.uniform(0, 1, size=(2, 224, 224, 3))
object_detector(input_data)
```

## Fine-tune the pre-trained model
```python
backbone = keras_hub.models.Backbone.from_preset(
    "hf://keras/retinanet_resnet50_fpn_coco"
)
preprocessor = keras_hub.models.RetinaNetObjectDetectorPreprocessor.from_preset(
    "hf://keras/retinanet_resnet50_fpn_coco"
)
model = RetinaNetObjectDetector(
    backbone=backbone,
    num_classes=len(CLASSES),
    preprocessor=preprocessor
)
```

## Custom training the model
```python
image_converter = keras_hub.layers.RetinaNetImageConverter(
    scale=1/255
)

preprocessor = keras_hub.models.RetinaNetObjectDetectorPreprocessor(
    image_converter=image_converter
)
# Load a pre-trained ResNet50 model. 
# This will serve as the base for extracting image features.
image_encoder = keras_hub.models.Backbone.from_preset(
    "resnet_50_imagenet" 
)

# Build the RetinaNet Feature Pyramid Network (FPN) on top of the ResNet50 
# backbone. The FPN creates multi-scale feature maps for better object detection
# at different sizes.
backbone = keras_hub.models.RetinaNetBackbone(
    image_encoder=image_encoder,
    min_level=3,
    max_level=5,
    use_p5=False 
)
model = RetinaNetObjectDetector(
    backbone=backbone,
    num_classes=len(CLASSES),
    preprocessor=preprocessor
)
```