Embedding exploration
This folder contains all the data and code needed to run embedding exploration (Fig. S3).
Data download
To help select transcription factor (TF)- and kinase-containing fusions for investigation (Fig. S3a), Supplementary Table 3 from Salokas et al. 2020 was downloaded as a reference list of transcription factors and kinases.
benchmarking/
└── embedding_exploration/
    └── data/
        ├── salokas_2020_tableS3.csv
        ├── tf_and_kinase_fusions.csv
        └── top_genes.csv
data/salokas_2020_tableS3.csv: Supplementary Table 3 from Salokas et al. 2020.
data/tf_and_kinase_fusions.csv: set of TF::TF and Kinase::Kinase fusion oncoproteins from the FusOn-DB database. Curated in plot.py.
data/top_genes.csv: fusion oncoproteins (and their head and tail components) visualized in Fig. S3b. Sequences for head and tail components were pulled from the best-aligned sequences in fuson_plm/data/blast/blast_outputs/best_htg_alignments_swissprot_seqs.pkl.
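Before plotting, it can help to confirm that the three data files are present and to look at their headers. The following is a minimal sketch using only the Python standard library. The helper name `summarize_csvs` is hypothetical and not part of this repository, and no column names are assumed.

```python
import csv
from pathlib import Path

# File names as listed above; the helper itself is illustrative only.
EXPECTED_FILES = [
    "salokas_2020_tableS3.csv",
    "tf_and_kinase_fusions.csv",
    "top_genes.csv",
]


def summarize_csvs(data_dir, filenames=EXPECTED_FILES):
    """Return {filename: (header, n_data_rows)} for each file that exists.

    Missing files are mapped to None so they are easy to spot.
    """
    summary = {}
    for name in filenames:
        path = Path(data_dir) / name
        if not path.is_file():
            summary[name] = None
            continue
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            header = next(reader, [])  # first row is treated as the header
            n_rows = sum(1 for _ in reader)  # remaining rows are data
        summary[name] = (header, n_rows)
    return summary


if __name__ == "__main__":
    # Assumes the script is run from embedding_exploration/.
    for name, info in summarize_csvs("data").items():
        print(name, "MISSING" if info is None else info)
```

Any file reported as missing would need to be restored before running plot.py.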
Plotting
Run plot.py to regenerate the plots in Figure S3, using the following configuration settings:
# Dictionary: key = run name, values = epochs (use this option if you've trained your own model),
# or "FusOn-pLM" to use the official model
FUSON_PLM_CKPT = "FusOn-pLM"
# Type of dim reduction
PLOT_UMAP = True
PLOT_TSNE = False
# Overwriting configs
PERMISSION_TO_OVERWRITE = False # if False, script will halt if it believes these embeddings have already been made.
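The overwrite setting can be pictured as a guard that runs before anything is written. The sketch below is an assumption about how such a check might look, not the logic in plot.py: the function name `check_can_write` is hypothetical, and the real script may decide that embeddings "have already been made" in a different way.

```python
from pathlib import Path

PERMISSION_TO_OVERWRITE = False  # mirrors the setting shown above


def check_can_write(output_path, permission_to_overwrite=PERMISSION_TO_OVERWRITE):
    """Return output_path as a Path, or halt if it exists and overwriting is off.

    Hypothetical helper: raises FileExistsError instead of silently replacing
    results from an earlier run.
    """
    path = Path(output_path)
    if path.exists() and not permission_to_overwrite:
        raise FileExistsError(
            f"{path} already exists; set PERMISSION_TO_OVERWRITE = True to replace it."
        )
    return path
```

Raising an exception rather than skipping the step keeps the halt visible in plot.err when the script is run in the background.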
To run, use:
nohup python plot.py > plot.out 2> plot.err &
- All results are stored in embedding_exploration/results/<timestamp>, where timestamp is a unique string encoding the date and time when you started the run.
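A timestamped results folder of this kind can be created as in the sketch below. The function name `make_results_dir` and the timestamp format are assumptions made for illustration; the format actually used by plot.py is not stated here.

```python
from datetime import datetime
from pathlib import Path


def make_results_dir(base_dir="results", now=None, fmt="%Y-%m-%d_%H-%M-%S"):
    """Create and return base_dir/<timestamp>.

    `fmt` is an assumed format; any string that encodes date and time
    to the second gives a folder name unique to each run.
    """
    now = now or datetime.now()
    out_dir = Path(base_dir) / now.strftime(fmt)
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir
```

Passing `now` explicitly makes the helper easy to test, while leaving it unset uses the current time at the start of the run.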
Below are the FusOn-pLM paper results in results/final/umap_plots/fuson_plm/best/:
benchmarking/
└── embedding_exploration/
    └── results/final/umap_plots/fuson_plm/best/
        ├── favorites/
        │   ├── umap_favorites_source_data.csv
        │   └── umap_favorites_visualization.png
        └── tf_and_kinase/
            ├── umap_tf_and_kinase_fusions_source_data.csv
            └── umap_tf_and_kinase_fusions_visualization.png
favorites/umap_favorites_visualization.png: Fig. S3b, with the plotted data stored in favorites/umap_favorites_source_data.csv.
tf_and_kinase/umap_tf_and_kinase_fusions_visualization.png: Fig. S3a, with the plotted data stored in tf_and_kinase/umap_tf_and_kinase_fusions_source_data.csv.