ctheodoris/Geneformer · Different N_Detections for OE or deletion of same gene with same dataset

ag2022

Feb 6

Hello, I'm a bit confused by the output of InSilicoPerturberStats in "aggregate_gene_shifts" mode. I'm looking at in silico perturbations in the same dataset for overexpression or deletion of the same gene. My understanding was that N_Detections is the number of cells in the input dataset in which the affected gene was observed. Is that correct? (Looking here: https://geneformer.readthedocs.io/en/latest/geneformer.in_silico_perturber_stats.html)

When I run deletion mode, it completes very quickly relative to overexpress mode, and N_Detections is very low (max 67) and not really representative of my dataset. The results from overexpress mode look more accurate to me in terms of expected gene expression (max N_Detections 16504). I know I'm running the exact same dataset because I define it once at the beginning of my script and then run both modes for the same gene underneath.

Any thoughts on what's going on? Thank you!!

ispstats = InSilicoPerturberStats(mode="aggregate_gene_shifts", genes_perturbed=[gene_id], combos=0, anchor_gene=None)

ag2022

Feb 7

Ohh wait, is it the number of cells that the affected gene is ALTERED in? So if my perturbed gene is lowly expressed, there are only a few cells it can be deleted from, which caps the number of detections where any other gene is altered?

ctheodoris

Owner Feb 7

Yes, deletion can only remove the gene from the cells in which it is detected. Overexpression can add the gene to all cells.
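
To make the asymmetry concrete, here is a minimal toy sketch in plain Python. It does not use Geneformer or reproduce its internals: the cells, the gene names, and the helpers `perturbable_cells` and `max_detections` are all hypothetical, and each cell is simply modeled as a list of detected genes. It only illustrates the counting argument above, assuming N_Detections for an affected gene cannot exceed the number of perturbed cells in which that gene is present.

```python
def perturbable_cells(cells, gene, mode):
    """Return the cells in which `gene` can be perturbed (toy model, not Geneformer code)."""
    if mode == "delete":
        # A gene can only be deleted from cells where it is detected.
        return [cell for cell in cells if gene in cell]
    if mode == "overexpress":
        # A gene can be added to any cell, whether or not it was detected.
        return list(cells)
    raise ValueError(f"unknown mode: {mode!r}")


def max_detections(cells, perturbed_gene, affected_gene, mode):
    """Upper bound on detections of `affected_gene` among the perturbed cells."""
    perturbed = perturbable_cells(cells, perturbed_gene, mode)
    return sum(affected_gene in cell for cell in perturbed)


# Hypothetical dataset: five cells, each a list of detected genes.
cells = [
    ["GENE_A", "GENE_B", "GENE_C"],
    ["GENE_B", "GENE_C"],
    ["GENE_B"],
    ["GENE_A", "GENE_C"],
    ["GENE_C", "GENE_B"],
]

n_delete = max_detections(cells, "GENE_A", "GENE_B", "delete")
n_overexpress = max_detections(cells, "GENE_A", "GENE_B", "overexpress")
print(n_delete, n_overexpress)  # prints: 1 4
```

In this toy dataset GENE_A is detected in only two of five cells, so deletion perturbs two cells and GENE_B can be counted in at most one of them, while overexpression perturbs all five cells and GENE_B can be counted in four. A lowly expressed perturbed gene would shrink the deletion counts (and the run time) in the same way.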

ctheodoris changed discussion status to closed Feb 7