arxiv:2408.11227

OCTCube-M: A 3D multimodal optical coherence tomography foundation model for retinal and systemic diseases with cross-cohort and cross-device validation

Published on Aug 20, 2024

Abstract

We present OCTCube-M, a 3D OCT-based multi-modal foundation model for jointly analyzing OCT and en face images. We first develop OCTCube, a 3D foundation model pre-trained on 26,685 3D OCT volumes encompassing 1.62 million 2D OCT images. We then use COEP, a novel multi-modal contrastive learning framework, to integrate other retinal imaging modalities, such as fundus autofluorescence and infrared retinal imaging, into OCTCube, efficiently extending it into multi-modal foundation models. OCTCube achieves the best performance in predicting 8 retinal diseases and generalizes well in cross-cohort, cross-device and cross-modality prediction. OCTCube can also predict cross-organ nodule malignancy (CT), low cardiac ejection fraction, and systemic diseases such as diabetes and hypertension, showing its applicability beyond retinal diseases. We further develop OCTCube-IR using COEP with 26,685 OCT and IR image pairs. OCTCube-IR accurately retrieves matching images between OCT and IR, allowing joint analysis of 3D and 2D retinal imaging modalities. Finally, we train a tri-modal foundation model, OCTCube-EF, on 4 million 2D OCT images and 400K en face retinal images. OCTCube-EF attains the best performance in predicting the growth rate of geographic atrophy (GA) across datasets collected from 6 multi-center global trials conducted in 23 countries. This improvement is statistically equivalent to running a clinical trial more than double the size of the original study. An analysis of another retrospective case study shows that OCTCube-EF, through accurate treatment effect estimation on Phase-II results, can help avoid false-positive Phase-III results.
In sum, OCTCube-M is a 3D multi-modal foundation model framework that integrates OCT and other retinal imaging modalities, yielding substantial diagnostic and prognostic benefits.
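
The abstract does not spell out the COEP objective, so the following is only a minimal sketch of a generic CLIP-style symmetric contrastive (InfoNCE) loss over paired embeddings, together with the nearest-neighbor retrieval that such training enables. It is not the authors' implementation. The function names (`symmetric_info_nce`, `retrieve`), the temperature of 0.07, and the toy 2-D embeddings are all assumptions made for illustration; in practice the embeddings would come from an OCT volume encoder and an IR image encoder.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def symmetric_info_nce(oct_emb, ir_emb, temperature=0.07):
    """Generic symmetric InfoNCE loss (hypothetical stand-in, not COEP itself).

    Row i of oct_emb and row i of ir_emb are assumed to be a matched pair;
    every other row in the batch serves as a negative.
    """
    a = [l2_normalize(v) for v in oct_emb]
    b = [l2_normalize(v) for v in ir_emb]
    n = len(a)
    # Cosine similarities scaled by the temperature.
    logits = [[dot(a[i], b[j]) / temperature for j in range(n)] for i in range(n)]

    def diagonal_cross_entropy(rows):
        # Mean cross-entropy where the correct class for row i is column i.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    transposed = [list(col) for col in zip(*logits)]
    # Average the OCT-to-IR and IR-to-OCT directions.
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(transposed))


def retrieve(query, gallery):
    """Return the index of the gallery embedding most similar to the query."""
    q = l2_normalize(query)
    sims = [dot(q, l2_normalize(g)) for g in gallery]
    return max(range(len(sims)), key=sims.__getitem__)


# Toy embeddings (made up for illustration only).
oct_batch = [[1.0, 0.0], [0.0, 1.0]]
ir_aligned = [[1.0, 0.0], [0.0, 1.0]]
ir_swapped = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = symmetric_info_nce(oct_batch, ir_aligned)
loss_swapped = symmetric_info_nce(oct_batch, ir_swapped)
best_match = retrieve([1.0, 0.1], ir_aligned)
```

In this toy setting the loss is small when matched pairs point in the same direction and large when they are swapped, which is the property that makes cross-modal retrieval by cosine similarity possible after training.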