tags:
  - bertopic
library_name: bertopic
pipeline_tag: text-classification


This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.


To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

from bertopic import BERTopic
topic_model = BERTopic.load("djordan/am25_abstract_topic_model")
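
Once the model is loaded, the topics can be inspected and new documents can be assigned to them. The following is a minimal sketch, not part of the original card: `summarize_topic` and `inspect_model` are hypothetical helper names, and `transform` only works if an embedding model is available (if none was stored with the model, one can be passed through the `embedding_model` argument of `BERTopic.load`).

```python
from typing import List, Optional, Sequence, Tuple


def summarize_topic(topic_id: int,
                    words_and_scores: Sequence[Tuple[str, float]],
                    top_n: int = 5) -> str:
    """Format (word, score) pairs, as returned by get_topic(), on one line."""
    words = [word for word, _ in words_and_scores[:top_n]]
    return f"Topic {topic_id}: " + ", ".join(words)


def inspect_model(repo_id: str = "djordan/am25_abstract_topic_model",
                  docs: Optional[List[str]] = None) -> None:
    """Load the model and print a short overview (hypothetical helper)."""
    # Imported lazily so this sketch can be read without BERTopic installed.
    from bertopic import BERTopic

    topic_model = BERTopic.load(repo_id)

    # One row per topic: Topic, Count, Name, Representation, ...
    print(topic_model.get_topic_info().head())

    # Top keywords of topic 0 as (word, score) pairs.
    print(summarize_topic(0, topic_model.get_topic(0)))

    if docs:
        # Assign each new document to its most similar topic.
        topics, _ = topic_model.transform(docs)
        for doc, topic in zip(docs, topics):
            print(topic, doc[:60])


if __name__ == "__main__":
    inspect_model()
```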


Topic overview

  • Number of topics: 171
  • Number of training documents: 7863
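Overview of all topics. Each row lists, in order: Topic ID, Topic Keywords (separated by hyphens), Topic Frequency (number of documents assigned to the topic), and Label. Topic -1 collects outlier documents.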
-1 and - the - of - in - to 5 -1_and_the_of_in
0 adc - adcs - dxd - payload - her2 3431 0_adc_adcs_dxd_payload
1 kras - ras - g12c - mutant - g12d 196 1_kras_ras_g12c_mutant
2 ici - patients - immune - response - responders 179 2_ici_patients_immune_response
3 health - care - women - among - black 165 3_health_care_women_among
4 aml - leukemia - myeloid - venetoclax - acute 150 4_aml_leukemia_myeloid_venetoclax
5 gbm - glioblastoma - brain - glioma - tmz 140 5_gbm_glioblastoma_brain_glioma
6 pd - l1 - anti - ccr8 - antibody 135 6_pd_l1_anti_ccr8
7 parpi - parp - usp1 - dna - repair 100 7_parpi_parp_usp1_dna
8 car - cells - cd19 - antigen - cell 93 8_car_cells_cd19_antigen
9 egfr - osimertinib - resistance - tkis - tki 92 9_egfr_osimertinib_resistance_tkis
10 tnbc - breast - triple - mda - negative 86 10_tnbc_breast_triple_mda
11 pdos - organoids - organoid - drug - 3d 84 11_pdos_organoids_organoid_drug
12 pdac - pancreatic - basal - ductal - classical 77 12_pdac_pancreatic_basal_ductal
13 cfdna - methylation - samples - detection - urine 62 13_cfdna_methylation_samples_detection
14 ar - enzalutamide - prostate - androgen - resistant 60 14_ar_enzalutamide_prostate_androgen
15 dose - pts - mg - safety - pk 58 15_dose_pts_mg_safety
16 glucose - glutamine - mitochondrial - metabolism - metabolic 56 16_glucose_glutamine_mitochondrial_metabolism
17 hcc - liver - sorafenib - hepatocellular - lenvatinib 52 17_hcc_liver_sorafenib_hepatocellular
18 variants - ffpe - sequencing - variant - samples 49 18_variants_ffpe_sequencing_variant
19 microbiome - microbial - bacterial - bacteria - microbiota 49 19_microbiome_microbial_bacterial_bacteria
20 spatial - tissue - imaging - plex - image 48 20_spatial_tissue_imaging_plex
21 sclc - elavl4 - hnf4a - lung - ne 48 21_sclc_elavl4_hnf4a_lung
22 mice - human - humanized - mouse - engraftment 46 22_mice_human_humanized_mouse
23 bone - os - metastasis - osteosarcoma - metastatic 45 23_bone_os_metastasis_osteosarcoma
24 braf - tead - raf - melanoma - mek 44 24_braf_tead_raf_melanoma
25 pca - prostate - psa - gleason - men 44 25_pca_prostate_psa_gleason
26 cldn18 - cldn6 - claudin - cldn1 - cldn3 41 26_cldn18_cldn6_claudin_cldn1
27 psma - 177lu - fap - uptake - 68ga 41 27_psma_177lu_fap_uptake
28 mtap - prmt5 - mta - deleted - cooperative 40 28_mtap_prmt5_mta_deleted
29 mm - myeloma - bone - cd38 - cst6 40 29_mm_myeloma_bone_cd38
30 ecdna - somatic - skin - genome - mutational 37 30_ecdna_somatic_skin_genome
31 incidence - risk - cancer - exposure - lifestyle 37 31_incidence_risk_cancer_exposure
32 tce - cd3 - tces - gd - engagers 36 32_tce_cd3_tces_gd
33 ctdna - mrd - recurrence - patients - months 36 33_ctdna_mrd_recurrence_patients
34 ctcs - ctc - blood - biopsy - v7 35 34_ctcs_ctc_blood_biopsy
35 capsaicin - vialinin - apoptosis - apoptotic - compounds 35 35_capsaicin_vialinin_apoptosis_apoptotic
36 crc - apc - wnt - colonic - intestinal 34 36_crc_apc_wnt_colonic
37 ebv - npc - hpv - nasopharyngeal - hnscc 34 37_ebv_npc_hpv_nasopharyngeal
38 pdac - pancreatic - nets - tme - immunosuppressive 34 38_pdac_pancreatic_nets_tme
39 spatial - resolution - transcriptomics - tissue - xenium 34 39_spatial_resolution_transcriptomics_tissue
40 cafs - caf - fibroblasts - axl - gc 33 40_cafs_caf_fibroblasts_axl
41 test - abstract - text - you - your 33 41_test_abstract_text_you
42 p53 - y220c - ddr - dna - repair 33 42_p53_y220c_ddr_dna
43 data - ai - 500 - datasets - research 33 43_data_ai_500_datasets
44 luad - lung - xage1 - znf687 - lusc 32 44_luad_lung_xage1_znf687
45 variants - brca1 - chek2 - bc - germline 29 45_variants_brca1_chek2_bc
46 sting - agonist - cgas - interferon - activation 29 46_sting_agonist_cgas_interferon
47 pdt - light - elp - nanoparticles - ph 28 47_pdt_light_elp_nanoparticles
48 il - 12 - obp - 702 - tumor 28 48_il_12_obp_702
49 ccrcc - rcc - renal - vhl - carcinoma 28 49_ccrcc_rcc_renal_vhl
50 vaccines - vaccine - neoantigen - mrna - peptides 26 50_vaccines_vaccine_neoantigen_mrna
51 notch4 - dormancy - evs - e7011 - exosomes 26 51_notch4_dormancy_evs_e7011
52 slides - images - model - wsi - slide 26 52_slides_images_model_wsi
53 smarca4 - smarca2 - smarca1 - 3236 - smd 26 53_smarca4_smarca2_smarca1_3236
54 cytof - spectral - cytometry - flow - xt 24 54_cytof_spectral_cytometry_flow
55 mb - medulloblastoma - shh - nesc - tert 23 55_mb_medulloblastoma_shh_nesc
56 cca - cholangiocarcinoma - bile - postn - duct 22 56_cca_cholangiocarcinoma_bile_postn
57 ezh2 - ezh1 - fads2 - prc2 - h3k27me3 22 57_ezh2_ezh1_fads2_prc2
58 wrn - msi - helicase - gsk4418959 - hro761 22 58_wrn_msi_helicase_gsk4418959
59 ews - fli1 - ewing - ewsr1 - sarcoma 22 59_ews_fli1_ewing_ewsr1
60 cdk4 - 6i - resistant - resistance - er 21 60_cdk4_6i_resistant_resistance
61 pdac - gemcitabine - pikfyve - pancreatic - metabolic 21 61_pdac_gemcitabine_pikfyve_pancreatic
62 nb - mycn - neuroblastoma - 17q - gd2 21 62_nb_mycn_neuroblastoma_17q
63 macrophages - m1 - m2 - macrophage - tams 20 63_macrophages_m1_m2_macrophage
64 egfr - bispecific - her3 - cmet - adc 20 64_egfr_bispecific_her3_cmet
65 discovery - drug - library - covalent - hit 20 65_discovery_drug_library_covalent
66 ferroptosis - gpx4 - peroxidation - ferroptotic - lipid 20 66_ferroptosis_gpx4_peroxidation_ferroptotic
67 hnscc - hpv - fst - oscc - cyh33 19 67_hnscc_hpv_fst_oscc
68 drug - predictive - drugs - framework - enlight 19 68_drug_predictive_drugs_framework
69 bcma - car - gprc5d - mm - cel 19 69_bcma_car_gprc5d_mm
70 ackr1 - extravasation - metastatic - niche - endothelial 18 70_ackr1_extravasation_metastatic_niche
71 gut - microbiome - microbiota - fmt - ici 18 71_gut_microbiome_microbiota_fmt
72 cachexia - muscle - senescent - fisetin - gdf15 17 72_cachexia_muscle_senescent_fisetin
73 egfr - nsclc - egfrm - tki - mutations 17 73_egfr_nsclc_egfrm_tki
74 icg - imaging - sln - nir - fluorescence 17 74_icg_imaging_sln_nir
75 e3 - degradation - ligase - protacs - protac 17 75_e3_degradation_ligase_protacs
76 pik3ca - pi3ka - alpelisib - pi3k - mutant 17 76_pik3ca_pi3ka_alpelisib_pi3k
77 oncokb - variants - variant - oncotagger - somatic 17 77_oncokb_variants_variant_oncotagger
78 ffpe - rna - samples - seq - fixed 17 78_ffpe_rna_samples_seq
79 copd - risk - proteins - igfbp7 - mortality 17 79_copd_risk_proteins_igfbp7
80 hpv - opscc - pwh - infection - hiv 16 80_hpv_opscc_pwh_infection
81 dietary - intake - food - risk - plant 16 81_dietary_intake_food_risk
82 er - mcf - endocrine - estrogen - e2 16 82_er_mcf_endocrine_estrogen
83 pkmyt1 - wee1 - ccne1 - lunresertib - cdk1 16 83_pkmyt1_wee1_ccne1_lunresertib
84 lcs - screening - lung - sdm - risk 15 84_lcs_screening_lung_sdm
85 cdh17 - cadherin - 054 - lbl - gi 15 85_cdh17_cadherin_054_lbl
86 rms - fp - foxo1 - p3f - pax3 14 86_rms_fp_foxo1_p3f
87 myc - mycg4 - g4 - nucleolin - ddx5 14 87_myc_mycg4_g4_nucleolin
88 eac - ec - esophageal - pro - rkp 14 88_eac_ec_esophageal_pro
89 hdac3 - hdac - gem144 - hdac8 - hdaci 14 89_hdac3_hdac_gem144_hdac8
90 culture - organoids - immune - co - tios 14 90_culture_organoids_immune_co
91 ttfields - fields - dox - concomitant - electric 13 91_ttfields_fields_dox_concomitant
92 cdk2 - ccne1 - cdk4 - cyclin - amplified 13 92_cdk2_ccne1_cdk4_cyclin
93 cd73 - adenosine - a2ar - cd68 - immune 13 93_cd73_adenosine_a2ar_cd68
94 ilc - cdh1 - tfap2b - breast - lobular 13 94_ilc_cdh1_tfap2b_breast
95 btk - nx - 5948 - lymphoma - c481s 13 95_btk_nx_5948_lymphoma
96 hrd - hrr - biallelic - recombination - homologous 13 96_hrd_hrr_biallelic_recombination
97 runx3 - paint - pkp3 - snord67 - 3q 13 97_runx3_paint_pkp3_snord67
98 kat6a - kat6 - er - kat6b - breast 12 98_kat6a_kat6_er_kat6b
99 lncrnas - coding - ner - uterine - lncrna 12 99_lncrnas_coding_ner_uterine
100 bca - numb - bladder - rock - muscle 12 100_bca_numb_bladder_rock
101 vaccination - hpv - vaccine - hesitancy - covid 12 101_vaccination_hpv_vaccine_hesitancy
102 blca - bladder - fgfr3 - mibc - nmibc 12 102_blca_bladder_fgfr3_mibc
103 lnp - lnps - formulation - dsrna - lipid 12 103_lnp_lnps_formulation_dsrna
104 germline - variants - pathogenic - ddx41 - read 11 104_germline_variants_pathogenic_ddx41
105 age - aged - aging - young - mice 11 105_age_aged_aging_young
106 pdx - hci - models - hbcu - drug 11 106_pdx_hci_models_hbcu
107 obesity - butyrate - diet - fto - obese 11 107_obesity_butyrate_diet_fto
108 hpk1 - hdm2006 - 306 - s109 - ubx 10 108_hpk1_hdm2006_306_s109
109 ldrt - metabolic - cd8 - lactylation - tcredcd39koher2 10 109_ldrt_metabolic_cd8_lactylation
110 hypoxia - hypoxic - hif1a - mhc1pp - ifn 10 110_hypoxia_hypoxic_hif1a_mhc1pp
111 cd47 - sirpa - smagp - avfc - imc 10 111_cd47_sirpa_smagp_avfc
112 nepc - prostate - pik3r1 - ceacam5 - ar 10 112_nepc_prostate_pik3r1_ceacam5
113 eif4e - translation - cap - eif4f - ovarian 9 113_eif4e_translation_cap_eif4f
114 ev - evs - mgm - plasma - biomarkers 9 114_ev_evs_mgm_plasma
115 ipro - prediction - performance - rpslearner - ct 9 115_ipro_prediction_performance_rpslearner
116 tf - xb371 - adce - uparap - coagulation 9 116_tf_xb371_adce_uparap
117 icis - ali - cish - anti - lag 9 117_icis_ali_cish_anti
118 nicotine - cigarette - memantine - bw813u - smoking 9 118_nicotine_cigarette_memantine_bw813u
119 nnmt - dnmt1 - stm9005 - mettl1 - rrm1 9 119_nnmt_dnmt1_stm9005_mettl1
120 eps - states - state - single - sub 9 120_eps_states_state_single
121 gastric - gc - tsrna - eo - cops5 9 121_gastric_gc_tsrna_eo
122 risk - women - bbd - breast - missing 9 122_risk_women_bbd_breast
123 h7 - bispecific - b7 - npx372 - tim 9 123_h7_bispecific_b7_npx372
124 nat - rectal - course - neoadjuvant - ild 9 124_nat_rectal_course_neoadjuvant
125 xpo1 - hsp90 - xpr1 - selinexor - slc34a2 9 125_xpo1_hsp90_xpr1_selinexor
126 p2x4 - pca - sqle - crisp3 - cxcr7 9 126_p2x4_pca_sqle_crisp3
127 ripk1 - lig1 - ctps2 - cisplatin - lig1het 8 127_ripk1_lig1_ctps2_cisplatin
128 age - dnam - risk - cpg - mage 8 128_age_dnam_risk_cpg
129 women - breast - lrig1 - duffy - bpe 8 129_women_breast_lrig1_duffy
130 nectin - ev - uc - glr1059 - iph4502 8 130_nectin_ev_uc_glr1059
131 ros1 - egfr - tkd - nsclc - zongertinib 8 131_ros1_egfr_tkd_nsclc
132 abd147 - clickable - binder - 225ac - capac 8 132_abd147_clickable_binder_225ac
133 34a - mir - endosomal - fm - nigericin 8 133_34a_mir_endosomal_fm
134 tcr - tcrs - prame - hla - supercharged 8 134_tcr_tcrs_prame_hla
135 spatial - l2 - immune - geomx - microenvironment 8 135_spatial_l2_immune_geomx
136 sedentary - physical - 93 - able - spent 8 136_sedentary_physical_93_able
137 fulvestrant - pts - bireociclib - endocrine - cdk4 8 137_fulvestrant_pts_bireociclib_endocrine
138 mal - trials - dose - oncology - cost 8 138_mal_trials_dose_oncology
139 adar1 - editing - p150 - rna - ribi 8 139_adar1_editing_p150_rna
140 adulthood - bmi - bri - alcohol - selenium 8 140_adulthood_bmi_bri_alcohol
141 ctdna - ddpcr - mutations - plasma - monitoring 8 141_ctdna_ddpcr_mutations_plasma
142 cadonilimab - bnt116 - safety - resectable - penpulimab 7 142_cadonilimab_bnt116_safety_resectable
143 irf4 - tbxt - persistence - resistant - drug 7 143_irf4_tbxt_persistence_resistant
144 btz - pi - proteasome - mm - ceritinib 7 144_btz_pi_proteasome_mm
145 nps - til - tgfb - brg399 - helios 7 145_nps_til_tgfb_brg399
146 lymphotoxin - hnscc - cd24 - il - ctla2a 7 146_lymphotoxin_hnscc_cd24_il
147 kif18a - cin - mitotic - yf550 - hw221043 7 147_kif18a_cin_mitotic_yf550
148 nad - nampt - nmn - ot - 82 7 148_nad_nampt_nmn_ot
149 arid1a - arid1b - swi - snf - eo3001 7 149_arid1a_arid1b_swi_snf
150 ptpn1 - all - bcp - nhd13 - splicing 7 150_ptpn1_all_bcp_nhd13
151 allo - asct - mm - hct - pem 7 151_allo_asct_mm_hct
152 flc - dnaj - pkac - fibrolamellar - surgery 7 152_flc_dnaj_pkac_fibrolamellar
153 telomerase - clpxp - telomere - g4 - clpx 7 153_telomerase_clpxp_telomere_g4
154 pc53k - tie2 - ku - yb - ovarian 6 154_pc53k_tie2_ku_yb
155 hydrogel - ecm - decm - matrix - kyse30 6 155_hydrogel_ecm_decm_matrix
156 msln - rc88 - zw171 - binding - 08052666 6 156_msln_rc88_zw171_binding
157 cachexia - muscle - edema - sma - adiposity 6 157_cachexia_muscle_edema_sma
158 emb - wx390 - dcr - mcrc - orr 6 158_emb_wx390_dcr_mcrc
159 neoantigens - frameshift - antigens - hla - as10 6 159_neoantigens_frameshift_antigens_hla
160 smip34 - rlip - atovaquone - eoc - cddp 6 160_smip34_rlip_atovaquone_eoc
161 rd3 - sided - colorectal - polyps - left 5 161_rd3_sided_colorectal_polyps
162 onc212 - onc206 - onc201 - atg101 - imipridones 5 162_onc212_onc206_onc201_atg101
163 hcc - gzmk - 37 - ph102 - foxp3high 5 163_hcc_gzmk_37_ph102
164 vitae - nec - nunc - id - sed 5 164_vitae_nec_nunc_id
165 emphysematous - ct - group - ca - recurrence 5 165_emphysematous_ct_group_ca
166 til - stim - reactive - feeder - obx 5 166_til_stim_reactive_feeder
167 fao - atp - kn510713 - cac - acaa1 5 167_fao_atp_kn510713_cac
168 3d - 2d - pathology - specimen - sections 5 168_3d_2d_pathology_specimen
169 radiation - flash - ray - fr - kvp 5 169_radiation_flash_ray_fr
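
The rows above are whitespace-separated, so they can be parsed without BERTopic when only the overview is needed. The sketch below is an illustration under that assumption: `TopicRow`, `parse_topic_row`, and `share_of_corpus` are hypothetical names, and the parser relies on every keyword being a single token (consistent with `n_gram_range: (1, 1)`).

```python
from typing import List, NamedTuple

TRAINING_DOCUMENTS = 7863  # "Number of training documents" from the card


class TopicRow(NamedTuple):
    topic_id: int
    keywords: List[str]
    frequency: int
    label: str


def parse_topic_row(line: str) -> TopicRow:
    """Parse one overview row: ID, hyphen-separated keywords, frequency, label."""
    tokens = line.split()
    # First token is the ID, the last two are frequency and label;
    # everything in between is keywords separated by "-" tokens.
    keywords = [tok for tok in tokens[1:-2] if tok != "-"]
    return TopicRow(int(tokens[0]), keywords, int(tokens[-2]), tokens[-1])


def share_of_corpus(row: TopicRow, total: int = TRAINING_DOCUMENTS) -> float:
    """Fraction of the training documents assigned to this topic."""
    return row.frequency / total


if __name__ == "__main__":
    sample = [
        "-1 and - the - of - in - to 5 -1_and_the_of_in",
        "0 adc - adcs - dxd - payload - her2 3431 0_adc_adcs_dxd_payload",
        "1 kras - ras - g12c - mutant - g12d 196 1_kras_ras_g12c_mutant",
    ]
    for line in sample:
        row = parse_topic_row(line)
        print(row.topic_id, row.keywords, f"{share_of_corpus(row):.1%}")
```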

Training hyperparameters

  • calculate_probabilities: False
  • language: None
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: True
  • zeroshot_min_similarity: 0.7
  • zeroshot_topic_list: None
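
The hyperparameters listed above correspond to keyword arguments of the `BERTopic` constructor. The sketch below shows how a model with the same settings might be configured; it is not the author's training script. `HYPERPARAMETERS` and `build_topic_model` are hypothetical names, and the embedding, UMAP, and HDBSCAN models used for training are not stated in the card, so they are left at BERTopic's defaults here. In BERTopic, `language` is typically recorded as `None` when a custom embedding model was supplied, which may be the case for this model.

```python
from typing import Any, Dict

# Values copied from the "Training hyperparameters" list.
HYPERPARAMETERS: Dict[str, Any] = {
    "calculate_probabilities": False,
    "language": None,
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
    "zeroshot_min_similarity": 0.7,
    "zeroshot_topic_list": None,
}


def build_topic_model(**overrides: Any):
    """Create an untrained BERTopic model with the card's settings.

    Keyword overrides (for example embedding_model=...) replace or extend
    the listed values. The import is lazy so the dictionary above can be
    used without BERTopic installed.
    """
    from bertopic import BERTopic

    return BERTopic(**{**HYPERPARAMETERS, **overrides})


if __name__ == "__main__":
    topic_model = build_topic_model()
    # topics, probs = topic_model.fit_transform(documents)  # documents: list of str
```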

Framework versions

  • Numpy: 1.26.4
  • HDBSCAN: 0.8.40
  • UMAP: 0.5.7
  • Pandas: 2.2.2
  • Scikit-Learn: 1.6.1
  • Sentence-transformers: 3.4.1
  • Transformers: 4.48.2
  • Numba: 0.61.0
  • Plotly: 5.24.1
  • Python: 3.11.11
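
Loading a saved topic model can be sensitive to library versions, so it may help to compare the local environment against the versions above. The following sketch uses only the standard library; `CARD_VERSIONS` and `compare_versions` are hypothetical names, and the distribution names (for example `umap-learn` for UMAP) are the usual PyPI names, assumed rather than taken from the card.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# PyPI distribution name -> version listed under "Framework versions".
CARD_VERSIONS: Dict[str, str] = {
    "numpy": "1.26.4",
    "hdbscan": "0.8.40",
    "umap-learn": "0.5.7",
    "pandas": "2.2.2",
    "scikit-learn": "1.6.1",
    "sentence-transformers": "3.4.1",
    "transformers": "4.48.2",
    "numba": "0.61.0",
    "plotly": "5.24.1",
}


def compare_versions(
    expected: Dict[str, str] = CARD_VERSIONS,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (version on the card, installed version or None)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed locally
        report[name] = (wanted, installed)
    return report


if __name__ == "__main__":
    for name, (wanted, installed) in compare_versions().items():
        status = "ok" if installed == wanted else "differs"
        print(f"{name}: card={wanted} installed={installed} ({status})")
```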