Model Overview

VISTA3D is trained on over 20 partially annotated datasets with extensive data processing. This model is a Hugging Face refactored version of the MONAI VISTA3D bundle, and it provides a pipeline compatible with the transformers library interface. For more details about the original model, please visit the MONAI Model Zoo.

Run pipeline:

To run the pipeline, VISTA3D requires at least one prompt for segmentation. It supports a label prompt, i.e. the class index for automatic segmentation, and point-click prompts for binary interactive segmentation. Both prompt types can be provided at the same time.

The following code snippet shows how to run inference with this model.

import os
import tempfile

import torch
from hugging_face_pipeline import HuggingFacePipelineHelper


FILE_PATH = os.path.dirname(__file__)
with tempfile.TemporaryDirectory() as tmp_dir:
    output_dir = os.path.join(tmp_dir, "output_dir")
    # Initialize the VISTA3D pipeline from the locally stored pretrained model.
    pipeline_helper = HuggingFacePipelineHelper("vista3d")
    pipeline = pipeline_helper.init_pipeline(
        os.path.join(FILE_PATH, "vista3d_pretrained_model"),
        device=torch.device("cuda:0"),
    )
    # Each input pairs an image with a prompt; label_prompt [3] requests
    # automatic segmentation of class index 3.
    inputs = [
        {
            "image": "/data/Task09_Spleen/imagesTs/spleen_1.nii.gz",
            "label_prompt": [3],
        },
        {
            "image": "/data/Task09_Spleen/imagesTs/spleen_11.nii.gz",
            "label_prompt": [3],
        },
    ]
    # Segmentation results are written to output_dir.
    pipeline(inputs, output_dir=output_dir)
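
The same pipeline also accepts point-click prompts for interactive segmentation. Below is a minimal sketch reusing the pipeline and output_dir initialized above; the file path and voxel coordinates are illustrative placeholders:

inputs = [
    {
        "image": "/data/Task09_Spleen/imagesTs/spleen_15.nii.gz",
        # Point coordinates are given in the original image space; point_labels
        # marks each click as foreground (1) or background (0).
        "points": [[138, 245, 18], [271, 343, 27]],
        "point_labels": [1, 0],
    },
]
pipeline(inputs, output_dir=output_dir)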

Each element of inputs defines the image to segment and the prompt(s) for segmentation:

inputs = {'image': '/data/Task09_Spleen/imagesTs/spleen_15.nii.gz', 'label_prompt': [1]}
inputs = {'image': '/data/Task09_Spleen/imagesTs/spleen_15.nii.gz', 'points': [[138,245,18], [271,343,27]], 'point_labels': [1,0]}
  • The inputs must include the key image, which contains the absolute path to the NIfTI image file, plus one or more of the prompt keys label_prompt, points, and point_labels.
  • The label_prompt is a list of length B that segments B foreground objects, e.g. [2,3,4,5]. If B > 1, point prompts must NOT be provided.
  • The points is of shape [N, 3], e.g. [[x1,y1,z1],[x2,y2,z2],...,[xN,yN,zN]], representing N point coordinates IN THE ORIGINAL IMAGE SPACE of a single foreground object. point_labels is a list of length N, e.g. [1,1,0,-1,...], matching the points: 0 means background, 1 means foreground, and -1 means the point is ignored. points and point_labels must be provided together and match in length.
  • B must be 1 if label_prompt and points are provided together; the inferer only supports SINGLE-OBJECT point-click segmentation.
  • If no prompt is provided, the model will use everything_labels to segment 117 classes:
list(set([i+1 for i in range(132)]) - set([2,16,18,20,21,23,24,25,26,27,128,129,130,131,132]))
  • Point prompts combined with label_prompt for "Kidney", "Lung", or "Bone" (class indices [2, 20, 21]) are not allowed, since those classes are divided into sub-categories (e.g. left kidney and right kidney). Use points with the sub-category classes defined in the inference.json.
  • To specify a new class for zero-shot segmentation, set the label_prompt to a value between 133 and 254. Ensure that points and point_labels are also provided; otherwise, the inference result will be a tensor of zeros (see the sketch after this list).
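
As referenced above, the following sketch assembles inputs that respect these rules; the image path, class indices, and coordinates are illustrative placeholders:

# With no prompt, the model falls back to everything_labels (117 classes).
everything_labels = list(
    set(i + 1 for i in range(132))
    - {2, 16, 18, 20, 21, 23, 24, 25, 26, 27, 128, 129, 130, 131, 132}
)
assert len(everything_labels) == 117

# Multi-class automatic segmentation: B > 1, so no point prompts are allowed.
auto_input = {
    "image": "/data/Task09_Spleen/imagesTs/spleen_15.nii.gz",
    "label_prompt": [2, 3, 4, 5],
}

# Zero-shot segmentation: a novel class index in [133, 254] must be paired
# with points, otherwise the result is a tensor of zeros.
zero_shot_input = {
    "image": "/data/Task09_Spleen/imagesTs/spleen_15.nii.gz",
    "label_prompt": [133],
    "points": [[138, 245, 18]],
    "point_labels": [1],
}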

License

Code License

This project includes code licensed under the Apache License 2.0. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Model Weights License

The model weights included in this project are licensed under the NCLS v1 License.

Both licenses' full texts have been combined into a single LICENSE file. Please refer to this LICENSE file for more details about the terms and conditions of both licenses.
