Cleaning FOdb Supplementary Table 5
Isolated the 25 low-MI features used to train ML model
1. ABTbalance
2. ABTdensity
3. ABTvalence
4. DisLength
5. GeneralSeqDec_Best
6. Mean_Hydropathy_FB_HSP
7. Mean_Hydropathy_KD
8. NLLR
9. Number_NLS_Predicted
10. Omega
11. Overall_NCPR
12. PAPAprop
13. PPII_propensity
14. PScoreValue
15. PositiveResidues
16. SeqChargeDec
17. SeqHydDec_Best
18. delta
19. fraction_aromatic
20. fraction_disorder_promoting
21. fraction_negative
22. fraction_polar
23. fraction_positive
24. fraction_proline
25. kappa

Cleaning FOdb Supplementary Table 4
Removed invalid FOs (puncta status = "Other" or "Nucleolar"). Remaining FOs: 178
Total duplicated sequences: 0
Checking for invalid characters...
Found 1 invalid characters
Invalid char - at index 706/706 of sequence MKRAHPEYSSSDSELDETIEVEKESADENGNLSSALGSMSPTTSSQILARKRRRGIIEKRRRDRINNSLSELRRLVPSAFEKQGSAKLEKAEILQMTVDHLKMLHTAGGKAFNNPRPGQLGRLLPNQNLPLDITLQSPTGAGPFPPIRNSSPYSVIPQPGMMGNQGMIGNQGNLGNSSTGMIGNSASRPTMPSGEWAPQSSAVRVTCAATTSAMNRPVQGGMIRNPAASIPMRPSSQPGQRQTLQSQVMNIGPSELEMNMGGPQYSQQQAPPNQTAPWPESILPIDQASFASQNRQPFGSSPDDLLCPHPAAESPSDEGALLDQLYLALRNFDGLEEIDRALGIPELVSQSQAVDPEQFSSQDSNIMLEQKAPVFPQQYASQAQMAQGSYSPMQDPNFHTMGQRPSYATLRMQPRPGLRPTGLVQNQPNQLRLQLQHRLQAQQNRQPLMNQISNVSNVNLTLRPGVPTQAPINAQMLAQRQREILNQHLRQRQMHQQQQVQQRTLMMRGQGLNMTPSMVAPSGIPATMSNPRIPQANAQQFPFPPNYGISQQPDPGFTGATTPQSPLMSPRMAHTQSPMMQQSQANPAYQAPSDINGWAQGNMGGNSMFSQQSPPHFGQQANTSMYSNNMNINVSMATNTGGMSSMNQMTGQISMTSVTSVPTSGLSSMGPEQVNDPALRGGNLFPNQLPGMDMIKQEGDTTRKYC-
Changed FO names to Head::Tail format
Checking for the 25 low-MI features...
25 found
Feature ABTbalance has 12 np.nan values in the following datasets:
    Verification_Set: 12
Feature ABTdensity has 12 np.nan values in the following datasets:
    Verification_Set: 12
Feature ABTvalence has 12 np.nan values in the following datasets:
    Verification_Set: 12
Feature PAPAprop has 1 np.nan values in the following datasets:
    Verification_Set: 1
Feature PScoreValue has 1 np.nan values in the following datasets:
    Verification_Set: 1

Puncta localization for 115 FOs where Puncta_Status==YES
    Nucleus: 52 (45.22%)
    Cytoplasm: 43 (37.39%)
    Both: 20 (17.39%)

Dataset breakdown...
    Expressed_Set: 149 (83.71%)
        YES: 96 (64.43%)
            Localizations...
                Nucleus: 48 (50.00%)
                Cytoplasm: 32 (33.33%)
                Both: 16 (16.67%)
        NO: 53 (35.57%)
    Verification_Set: 29 (16.29%)
        YES: 19 (65.52%)
            Localizations...
                Cytoplasm: 11 (57.89%)
                Nucleus: 4 (21.05%)
                Both: 4 (21.05%)
        NO: 10 (34.48%)

Making physicochemical feature vectors. Feature Order:
0. ABTbalance
1. ABTdensity
2. ABTvalence
3. DisLength
4. GeneralSeqDec_Best
5. Mean_Hydropathy_FB_HSP
6. Mean_Hydropathy_KD
7. NLLR
8. Number_NLS_Predicted
9. Omega
10. Overall_NCPR
11. PAPAprop
12. PPII_propensity
13. PScoreValue
14. PositiveResidues
15. SeqChargeDec
16. SeqHydDec_Best
17. delta
18. fraction_aromatic
19. fraction_disorder_promoting
20. fraction_negative
21. fraction_polar
22. fraction_positive
23. fraction_proline
24. kappa

Saved cleaned table S5 to cleaned_dataset_s4.csv
Saved train-test splits with nucleus, cytoplasm, and formation labels to splits.csv
Saved physicochemical embeddings as a dictionary to FOdb_physicochemical_embeddings.pkl
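
The invalid-character check and the count/percentage breakdowns reported in this log could be reproduced with a sketch like the one below. It is not the script that produced the log. The function names (`find_invalid_chars`, `clean_sequence`, `percent_breakdown`), the 20-letter amino-acid alphabet, and the 1-based position reporting are all assumptions made for illustration. The log also does not say how the trailing `-` was handled, so stripping it is shown only as one possible choice.

```python
from collections import Counter

# Assumption: only the 20 standard amino-acid letters count as valid.
VALID_AA = set("ACDEFGHIKLMNPQRSTVWY")


def find_invalid_chars(seq, alphabet=VALID_AA):
    """Return (character, 1-based position) for each character outside the alphabet."""
    return [(ch, pos) for pos, ch in enumerate(seq, start=1) if ch not in alphabet]


def clean_sequence(seq, alphabet=VALID_AA):
    """Drop every character outside the alphabet (one possible way to handle them)."""
    return "".join(ch for ch in seq if ch in alphabet)


def percent_breakdown(labels):
    """Map each label to (count, percentage of total rounded to 2 decimals)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {
        label: (n, round(100 * n / total, 2))
        for label, n in counts.most_common()
    }


if __name__ == "__main__":
    # Toy sequence with a trailing gap character, mimicking the case in the log.
    toy = "MKRAHPEYC-"
    for ch, pos in find_invalid_chars(toy):
        print(f"Invalid char {ch} at index {pos}/{len(toy)}")
    print(clean_sequence(toy))

    # Localization counts taken from the log (115 FOs with Puncta_Status==YES).
    labels = ["Nucleus"] * 52 + ["Cytoplasm"] * 43 + ["Both"] * 20
    for label, (n, pct) in percent_breakdown(labels).items():
        print(f"{label}: {n} ({pct:.2f}%)")
```

Using the localization counts from the log as input, `percent_breakdown` gives the same percentages the log reports for the 115 puncta-forming FOs.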