The Patho-CLIP-B model and its associated materials are released under the CC-BY-NC-ND 4.0 license. Access is restricted to non-commercial, academic research purposes only, and proper citation is required. Any commercial use, redistribution, or derivative work (including training new models on top of this one or generating datasets from its outputs) is strictly prohibited without prior written approval.
Users must register with an official institutional email address (generic domains such as @gmail, @qq, or @hotmail will not be accepted). By requesting access, you confirm that your information is accurate and current and that you agree to comply with all terms listed herein. If other members of your organization wish to use the model, they must register independently and agree to the same terms.
Introduction📝
To bridge the gap between fine-grained tissue morphology and clinical semantic understanding in pathology, we present Patho-CLIP-B, a vision-language model tailored for high-resolution cross-modal representation learning in pathological diagnosis.
Patho-CLIP-B is built on the OpenAI-CLIP-B architecture and trained through a two-stage progressive paradigm:
Stage I: Contrastive pretraining on PathGen-1.6M, focusing on cell morphology and tissue organization to embed high-resolution visual priors (a schematic of this contrastive objective follows the list)
Stage II: Joint training on a 3.5M composite corpus comprising PathGen-1.6M, Quilt-1M, PathCap, and a textbook-derived dataset, to integrate domain-specific semantics with morphological features
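For intuition, the sketch below shows the standard symmetric InfoNCE objective used in CLIP-style contrastive pretraining. It is a generic illustration of the technique named in Stage I, not the authors' exact training code; the function name and the fixed temperature value are our own choices for the example.

```python
# Schematic of the symmetric image-text contrastive (InfoNCE) loss
# used in CLIP-style pretraining. Generic illustration only.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_features, text_features, temperature=0.07):
    """Symmetric contrastive loss over a batch of matched image-text pairs."""
    # Normalize embeddings so dot products become cosine similarities.
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)

    # Pairwise similarity logits; matched pairs lie on the diagonal.
    logits = image_features @ text_features.T / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Cross-entropy in both directions (image->text and text->image).
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.T, targets)
    return (loss_i2t + loss_t2i) / 2
```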
This strategy enables Patho-CLIP-B to achieve strong performance in semantic alignment, cross-modal retrieval, and tissue-level discrimination, offering a robust foundation for downstream pathology tasks.
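As a starting point for downstream use, here is a minimal zero-shot inference sketch. It assumes the released checkpoint loads through OpenCLIP's `hf-hub:` interface; the repository id `org-name/Patho-CLIP-B`, the image path, and the example prompts are illustrative placeholders, not official identifiers.

```python
# Minimal zero-shot classification sketch with OpenCLIP.
# Assumes an open_clip-compatible checkpoint; the hub path is a placeholder.
import torch
import open_clip
from PIL import Image

# Hypothetical repository id -- replace with the actual checkpoint location.
model, _, preprocess = open_clip.create_model_and_transforms(
    "hf-hub:org-name/Patho-CLIP-B"
)
tokenizer = open_clip.get_tokenizer("hf-hub:org-name/Patho-CLIP-B")
model.eval()

image = preprocess(Image.open("patch.png")).unsqueeze(0)
texts = tokenizer([
    "an H&E image of lung adenocarcinoma",
    "an H&E image of normal lung tissue",
])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(texts)
    # L2-normalize before computing cosine similarities.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(probs)  # zero-shot probabilities over the two text prompts
```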
Acknowledgements🎖
We gratefully acknowledge the OpenCLIP project for providing an efficient and extensible implementation of CLIP models. Its flexible training pipeline, model support, and strong community contributions significantly facilitated the development and training of our Patho-CLIP-B model.
We thank the authors and maintainers for their excellent work.
Citation❤️
If you find our work helpful, a citation would be greatly appreciated:
```bibtex
@article{zhang2025patho,
  title={Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner},
  author={Zhang, Wenchuan and Zhang, Penghao and Guo, Jingru and Cheng, Tao and Chen, Jie and Zhang, Shuwan and Zhang, Zhang and Yi, Yuhao and Bu, Hong},
  journal={arXiv preprint arXiv:2505.11404},
  year={2025}
}
```