---
tags:
  - image-feature-extraction
  - timm
  - pathology
  - histology
  - medical imaging
  - self-supervised learning
  - vision transformer
  - foundation model
library_name: timm
license: cc-by-nc-nd-4.0
extra_gated_prompt: >-
  - This model and associated code are released under the CC-BY-NC-ND 4.0
  license and may only be used for non-commercial, academic research purposes
  with proper attribution. 

  - Any commercial use, sale, or other monetization of the H0-mini model and its
  derivatives, which include models trained on outputs from the H0-mini model or
  datasets created from the H0-mini model, is prohibited and requires prior
  approval. 

  - Please note that the primary email used to sign up for your Hugging Face
  account must match your institutional email to receive approval. By
  downloading the model, you attest that all information (affiliation, research
  use) is correct and up-to-date. Downloading the model requires prior
  registration on Hugging Face and agreeing to the terms of use. By downloading
  this model, you agree not to distribute, publish or reproduce a copy of the
  model. If another user within your organization wishes to use the H0-mini
  model, they must register as an individual user and agree to comply with the
  terms of use. Users may not attempt to re-identify the deidentified data used
  to develop the underlying model. 

  - This model is provided “as-is” without warranties of any kind, express or
  implied. This model has not been reviewed, certified, or approved by any
  regulatory body, including but not limited to the FDA (U.S.), EMA (Europe),
  MHRA (UK), or other medical device authorities. Any application of this model
  in healthcare or biomedical settings must comply with relevant regulatory
  requirements and undergo independent validation. Users assume full
  responsibility for how they use this model and any resulting consequences. The
  authors, contributors, and distributors disclaim any liability for damages,
  direct or indirect, resulting from model use. Users are responsible for
  ensuring compliance with data protection regulations (e.g., GDPR, HIPAA) when
  using it in research that involves patient data.

  - If you are a commercial entity, please contact us at hello [at]
  bioptimus.com to discuss licensing options.
extra_gated_fields:
  Full name (first and last): text
  Current affiliation (no abbreviations): text
  Type of Affiliation:
    type: select
    options:
      - Academia
      - Industry
      - label: Other
        value: other
  Current and official institutional email (**this must match your primary email in your Hugging Face account, @gmail/@hotmail/@qq email domains will be denied**): text
  Main use-case:
    type: select
    options:
      - Models benchmarking on various tasks
      - Biomarker Discovery
      - Diagnostics
      - Pathology workflows acceleration (cell & tissue segmentation etc)
      - label: Other
        value: other
  Please add information on your intended research use: text
  I agree to all terms outlined above: checkbox
  I agree not to distribute the model; if another user within your organization wishes to use the H0-mini model, they must register as an individual user: checkbox
  I agree to use this model for non-commercial, academic purposes only: checkbox
  I am interested in receiving updates from Bioptimus:
    type: checkbox
    optional: true

---

# Model card for H0-mini

H0-mini is a lightweight foundation model for histology developed by Owkin and Bioptimus.

The model is a Vision Transformer Base/14 (ViT-B/14) distilled from H-optimus-0 [1] (a ViT-g/14) using the DINOv2 [2] self-supervised distillation method on PanCancer40M, a set of 43 million histology tiles extracted from 6,093 histology slides from TCGA.
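To give a concrete picture of the objective, here is a minimal sketch of a DINO-style distillation loss, where a ViT-B/14 student is trained to match the prototype distribution of the H-optimus-0 teacher. The temperatures, centering, and prototype count below are illustrative placeholders, not the actual training recipe; see the DINOv2 paper [2] and the arXiv preprint for the exact details.

```python
# A minimal sketch of a DINO-style distillation loss. All hyperparameters
# here are illustrative placeholders, not the actual H0-mini training recipe.
import torch
import torch.nn.functional as F


def dino_distillation_loss(
    student_logits: torch.Tensor,  # (batch, num_prototypes) from the ViT-B/14 student
    teacher_logits: torch.Tensor,  # (batch, num_prototypes) from the H-optimus-0 teacher
    student_temp: float = 0.1,
    teacher_temp: float = 0.04,
    center: float = 0.0,  # in practice, a running mean of teacher logits
) -> torch.Tensor:
    # Teacher targets: centered and sharpened softmax over prototype scores.
    targets = F.softmax((teacher_logits - center) / teacher_temp, dim=-1)
    # Cross-entropy between the teacher targets and the student distribution.
    log_probs = F.log_softmax(student_logits / student_temp, dim=-1)
    return -(targets * log_probs).sum(dim=-1).mean()


# Example with a batch of 8 views and 4,096 prototypes.
loss = dino_distillation_loss(torch.randn(8, 4096), torch.randn(8, 4096))
```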

H0-mini achieves performance comparable to current histology foundation models at a significantly reduced inference cost. It also demonstrates strong robustness to variations in staining and scanning protocols. Please refer to the [arXiv preprint](https://arxiv.org/abs/2501.16239) for additional details.

*Figure: Assessment of model robustness to staining and scanning conditions on the PLISM dataset [3]. Median top-10 accuracy vs. mean cosine similarity, computed for each feature extractor over 4,095 slide pairs. For both axes, higher values indicate more robust models.*

## How to use it to extract features

H0-mini can be used, with or without fine-tuning, for various downstream applications, such as slide-level classification using multiple-instance learning algorithms (e.g. ABMIL [4]; a minimal sketch is given at the end of this section).

The following code snippet allows you to extract features from histology images with H0-mini.

We recommend using the CLS token (`cls_features`) as input features for downstream tasks. Concatenating the CLS token features with the average of the patch token features (`concatenated_features`) may bring improvements on some tasks.

```python
from huggingface_hub import login
import torch
import timm
from timm.data import resolve_data_config
from timm.data.transforms_factory import create_transform
from torchvision import transforms


# Log in to the Hugging Face Hub, using your user access token that can be
# found here: https://huggingface.co/settings/tokens.
login()

model = timm.create_model(
    "hf-hub:bioptimus/H0-mini",
    pretrained=True,
    mlp_layer=timm.layers.SwiGLUPacked,
    act_layer=torch.nn.SiLU,
)
model.to("cuda")
model.eval()

transform = create_transform(**resolve_data_config(model.pretrained_cfg, model=model))

# A random image stands in for a real histology tile.
image = transforms.ToPILImage()(torch.rand(3, 224, 224))

# We recommend using mixed precision for faster inference.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    with torch.inference_mode():
        output = model(transform(image).unsqueeze(0).to("cuda"))  # (1, 261, 768)
        # CLS token features (1, 768):
        cls_features = output[:, 0]
        # Patch token features (1, 256, 768):
        patch_token_features = output[:, model.num_prefix_tokens:]
        # Concatenate the CLS token features with the mean of the patch token
        # features (1, 1536):
        concatenated_features = torch.cat(
            [cls_features, patch_token_features.mean(1)], dim=-1
        )

assert cls_features.shape == (1, 768)
assert patch_token_features.shape == (1, 256, 768)
assert concatenated_features.shape == (1, 1536)
```

These features can then be used for downstream applications such as ROI classification (via linear or k-NN probing), slide classification (via multiple-instance learning, as sketched below), and segmentation (via ViT-Adapter, for instance).
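As an illustration, here is a minimal sketch of attention-based MIL pooling in the spirit of ABMIL [4], operating on a bag of H0-mini CLS features from one slide. The hidden size, number of classes, and bag size are illustrative placeholders, not values from the paper.

```python
# A minimal sketch of attention-based MIL pooling (ABMIL [4]) over H0-mini
# tile features. Hyperparameters below are illustrative, not from the paper.
import torch
import torch.nn as nn


class ABMIL(nn.Module):
    def __init__(self, in_dim: int = 768, hidden_dim: int = 128, num_classes: int = 2):
        super().__init__()
        # Attention network: assigns a scalar score to each tile embedding.
        self.attention = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )
        self.classifier = nn.Linear(in_dim, num_classes)

    def forward(self, bag: torch.Tensor) -> torch.Tensor:
        # bag: (num_tiles, in_dim) CLS features extracted with H0-mini.
        weights = torch.softmax(self.attention(bag), dim=0)  # (num_tiles, 1)
        slide_embedding = (weights * bag).sum(dim=0)         # (in_dim,)
        return self.classifier(slide_embedding)              # (num_classes,)


# Example: classify a slide represented by 1,000 tile embeddings.
logits = ABMIL()(torch.randn(1000, 768))
```

A side benefit of this formulation is that the learned attention weights indicate which tiles drive the slide-level prediction, which can help with interpretability.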

## Software dependencies

The feature-extraction snippet above relies on `torch`, `timm`, `torchvision`, and `huggingface_hub`.

## Citation

If you are using this model, please cite our work:

```bibtex
@misc{filiot2025distillingfoundationmodelsrobust,
      title={Distilling foundation models for robust and efficient models in digital pathology},
      author={Alexandre Filiot and Nicolas Dop and Oussama Tchita and Auriane Riou and Thomas Peeters and Daria Valter and Marin Scalbert and Charlie Saillard and Geneviève Robin and Antoine Olivier},
      year={2025},
      eprint={2501.16239},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2501.16239},
}
```

## Acknowledgements

### Computing resources

This work was granted access to the High-Performance Computing (HPC) resources of IDRIS under the allocations 2023-A0141012519, 2024-A0161012519 and 2024-GC011015442 made by GENCI.

### Code

H0-mini was built upon the [DINOv2](https://github.com/facebookresearch/dinov2) repository (Apache License 2.0).

### Datasets

The results published here are partly based upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga.

## References

  1. Saillard, C., Jenatton, R., Llinares-López, F., Mariet, Z., Cahané, D., Durand, E., Vert, J.-P., 2024. H-optimus-0.
  2. Oquab, M., Darcet, T., Moutakanni, T., Vo, H., Szafraniec, M., Khalidov, V., ... & Bojanowski, P. (2023). DINOv2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193.
  3. Ochi, M., Komura, D., Onoyama, T., Shinbo, K., Endo, H., Odaka, H., ... & Ishikawa, S. (2024). Registered multi-device/staining histology image dataset for domain-agnostic machine learning models. Scientific Data, 11(1), 330.
  4. Ilse, M., Tomczak, J., & Welling, M. (2018, July). Attention-based deep multiple instance learning. In International conference on machine learning (pp. 2127-2136). PMLR.