---
title: README
emoji: 🐨
colorFrom: purple
colorTo: blue
sdk: static
pinned: true
license: bsd-3-clause
short_description: Ensemble of experts for cell-type annotation
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/63d7697f2e397d9f8e30e677/tvABibiml6K2sccfXLybG.png
---
# **popV**
Welcome to the **popV** framework. popV provides state-of-the-art performance in cell-type label transfer using an ensemble-of-experts approach. Here we provide pre-trained
models for transferring cell-type labels to your own query dataset. Defining cell types is a tedious process, and using reference data can accelerate it significantly.
By combining several label-transfer tools, we provide a well-calibrated certainty score that makes it possible to detect cell types for which automatic annotation is
highly uncertain. We recommend manually checking transferred cell-type labels by plotting marker genes or differentially expressed genes before trusting them.
This is an open-science initiative. Please contribute your own models so that the single-cell community can leverage your reference datasets: ask in our [GitHub
repository](https://github.com/YosefLab/popV) to add your dataset.

---
## **Model Overview**
popV trains up to nine different algorithms for automatic label transfer, computes a consensus score across them, and produces an automatic report. To learn how to apply
popV to your own dataset, please refer to our [tutorial]().
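
To make the idea of a consensus score concrete, here is a minimal sketch that assumes plain majority voting over per-method predictions. It is an illustration only, not popV's actual implementation; the function name `consensus_label`, the method keys, and the example labels are all hypothetical.

```python
from collections import Counter


def consensus_label(predictions):
    """Return (label, score, agreement) for a single cell.

    `predictions` maps a method name to the label that method predicted.
    `score` is the number of methods that voted for the winning label and
    `agreement` is that number divided by the total number of methods.
    """
    votes = Counter(predictions.values())
    label, score = votes.most_common(1)[0]
    return label, score, score / len(predictions)


# Hypothetical predictions for one cell from five methods (illustrative only).
cell_predictions = {
    "knn_on_scvi": "T cell",
    "knn_on_harmony": "T cell",
    "random_forest": "T cell",
    "svm": "NK cell",
    "celltypist": "T cell",
}

label, score, agreement = consensus_label(cell_predictions)
print(label, score, agreement)  # T cell 4 0.8
```

In this toy setting, a cell on which few methods agree receives a low score and would be a natural candidate for manual inspection of marker genes.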
### Algorithms
Currently implemented algorithms are:
- K-nearest neighbor classification after dataset integration with [BBKNN](https://github.com/Teichlab/bbknn)
- K-nearest neighbor classification after dataset integration with [Scanorama](https://github.com/brianhie/scanorama)
- K-nearest neighbor classification after dataset integration with [scVI](https://github.com/scverse/scvi-tools)
- K-nearest neighbor classification after dataset integration with [Harmony](https://github.com/lilab-bcb/harmony-pytorch)
- Random forest classification
- Support vector machine classification
- [OnClass](https://github.com/wangshenguiuc/OnClass) cell type classification
- [scANVI](https://github.com/scverse/scvi-tools) label transfer
- [CellTypist](https://www.celltypist.org) cell type classification
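
Several of the methods above share one pattern: reference and query cells are first integrated into a common embedding, and labels are then transferred by k-nearest-neighbor classification. The sketch below illustrates only the second step on a toy 2-D embedding, using the Python standard library. The function `knn_transfer`, the coordinates, and the labels are hypothetical, and the integration step itself (BBKNN, Scanorama, scVI, or Harmony) is assumed to have been run already.

```python
import math
from collections import Counter


def knn_transfer(ref_embedding, ref_labels, query_embedding, k=3):
    """Give each query cell the majority label of its k nearest reference cells."""
    transferred = []
    for q in query_embedding:
        # Rank reference cells by Euclidean distance to the query cell
        # and keep the indices of the k closest ones.
        nearest = sorted(
            range(len(ref_embedding)),
            key=lambda i: math.dist(q, ref_embedding[i]),
        )[:k]
        votes = Counter(ref_labels[i] for i in nearest)
        transferred.append(votes.most_common(1)[0][0])
    return transferred


# Toy coordinates standing in for an integrated embedding (illustrative only).
ref_embedding = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
ref_labels = ["B cell", "B cell", "B cell", "monocyte", "monocyte", "monocyte"]
query_embedding = [(0.15, 0.1), (5.05, 5.0)]

print(knn_transfer(ref_embedding, ref_labels, query_embedding))
# ['B cell', 'monocyte']
```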
---
## **Key Applications**
The purpose of these models is to perform cell-type label transfer.
We provide models [with cuML support](collection) for large-scale reference mapping and [without cuML support](collection) for use when no GPU is available. Without a GPU,
popV scales well to 100k cells. popV offers three prediction modes that differ in computational cost:
- `retrain` trains all classifiers from scratch. For 50k cells, this takes up to an hour of computing time on a GPU.
- `inference` uses pretrained classifiers to annotate both query and reference cells and constructs a joint embedding using all of the integration methods listed above. For 50k cells, this takes up to half an hour of computing time on a GPU in our experience.
- `fast` uses only methods with pretrained classifiers and annotates only query cells. For 50k cells, this takes 5 minutes without a GPU (without computing a UMAP embedding).
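
The trade-off between the three modes can be summarized as a small decision helper. This is a sketch under stated assumptions: `choose_prediction_mode` and its arguments are hypothetical and not part of popV; only the mode names `retrain`, `inference`, and `fast` come from the description above.

```python
def choose_prediction_mode(retrain_classifiers, need_joint_embedding):
    """Pick the cheapest mode that satisfies the stated requirements.

    retrain_classifiers: True if the classifiers must be trained from scratch.
    need_joint_embedding: True if reference cells should also be annotated and
        a joint query/reference embedding is wanted.
    """
    if retrain_classifiers:
        return "retrain"  # most expensive: trains every classifier
    if need_joint_embedding:
        return "inference"  # pretrained classifiers plus joint embedding
    return "fast"  # pretrained classifiers, query cells only


print(choose_prediction_mode(False, False))  # fast
print(choose_prediction_mode(False, True))   # inference
print(choose_prediction_mode(True, True))    # retrain
```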
---
## **Publications**
- **[Original popV paper](https://www.nature.com/articles/s41588-024-01993-3)**: Published in *Nature Genetics*, this paper introduces popV and benchmarks it.
## **Contact**
- GitHub: [https://github.com/YosefLab/popV](https://github.com/YosefLab/popV)
- User questions: [Discourse](https://discourse.scverse.org)