ispstats.get_stats: too many values to unpack (expected 2)

#226
by pchiang5 - opened

After successfully running the following steps:

import os
os.environ['TRANSFORMERS_CACHE'] = '/mnt/c/Users/pc/Downloads/cache/'
from geneformer import InSilicoPerturber
from geneformer import InSilicoPerturberStats
isp = InSilicoPerturber(perturb_type="overexpress",
                        perturb_rank_shift=None,
                        genes_to_perturb="all",
                        combos=0,
                        anchor_gene=None,
                        model_type="CellClassifier",
                        num_classes=29,
                        emb_mode="cell",
                        cell_emb_style="mean_pool",
                        cell_states_to_model={'state_key': 'cell_type', 'start_state': 'A', 'goal_state': 'B', 'alt_states': []},
                        max_ncells=52775, 
                        emb_layer=0,
                        forward_batch_size=400,
                        nproc=44)
isp.perturb_data("/mnt/c/Users/pc/Downloads/GF/230816_geneformer_CellClassifier_celltypes_L2048_B8_LR0.00028292192255361916_LSlinear_WU559.5193527108581_E10_Oadamw_F2",
                 "/mnt/c/Users/pc/Downloads/transformer.dataset",
                 "/mnt/c/Users/pc/Downloads/GF2",
                 "full2")
#modify the file InSilicoPerturberStats in geneformer
#GENE_NAME_ID_DICTIONARY_FILE = "/mnt/c/Users/pc/Downloads/Geneformer/Geneformer/gene_name_id_dict.pkl"
ispstats = InSilicoPerturberStats(mode="goal_state_shift",
                                  genes_perturbed="all",
                                  combos=0,
                                  anchor_gene=None,
                                  cell_states_to_model={'state_key': 'cell_type', 'start_state': 'A', 'goal_state': 'B', 'alt_states': []})

Another error came up:
```
In [15]: ispstats.get_stats("/mnt/c/Users/pc/Downloads",
...: None,
...: "/mnt/c/Users/pc/Downloads/GF2",
...: "full")
0%| | 0/13 [00:00<?, ?it/s]
0%| | 0/15183 [00:00<?, ?it/s]

ValueError Traceback (most recent call last)
Cell In[15], line 1
----> 1 ispstats.get_stats("/mnt/c/Users/pc/Downloads",
2 None,
3 "/mnt/c/Users/pc/Downloads/GF2",
4 "full")

File /home/pc/miniconda3/envs/geneformer/lib/python3.10/site-packages/geneformer/in_silico_perturber_stats.py:696, in InSilicoPerturberStats.get_stats(self, input_data_directory, null_dist_data_directory, output_directory, output_prefix)
684 cos_sims_df_initial = pd.DataFrame({"Gene": gene_list,
685 "Gene_name": [self.token_to_gene_name(item)
686 for item in gene_list],
(...)
692 for genes in gene_list]},
693 index=[i for i in range(len(gene_list))])
695 if self.mode == "goal_state_shift":
--> 696 cos_sims_df = isp_stats_to_goal_state(cos_sims_df_initial, dict_list, self.cell_states_to_model, self.genes_perturbed)
698 elif self.mode == "vs_null":
699 null_dict_list = read_dictionaries(null_dist_data_directory, "cell", self.anchor_token)

File /home/pc/miniconda3/envs/geneformer/lib/python3.10/site-packages/geneformer/in_silico_perturber_stats.py:170, in isp_stats_to_goal_state(cos_sims_df, dict_list, cell_states_to_model, genes_perturbed)
167 random_tuples += dict_i.get((token, "cell_emb"),[])
169 if alt_end_state_exists == False:
--> 170 goal_end_random_megalist = [goal_end for start_state,goal_end in random_tuples]
171 elif alt_end_state_exists == True:
172 goal_end_random_megalist = [goal_end for start_state,goal_end,alt_end in random_tuples]

File /home/pc/miniconda3/envs/geneformer/lib/python3.10/site-packages/geneformer/in_silico_perturber_stats.py:170, in <listcomp>(.0)
167 random_tuples += dict_i.get((token, "cell_emb"),[])
169 if alt_end_state_exists == False:
--> 170 goal_end_random_megalist = [goal_end for start_state,goal_end in random_tuples]
171 elif alt_end_state_exists == True:
172 goal_end_random_megalist = [goal_end for start_state,goal_end,alt_end in random_tuples]

ValueError: too many values to unpack (expected 2)
```

Was it caused by an issue with the cell_class assignment? I tried removing 'alt_states', but that did not work. Thank you.
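
For reference, the message itself is a plain Python unpacking failure: the comprehension on line 170 of the traceback expects every entry in random_tuples to hold exactly two values, so a single entry with three values (the shape the alt_states branch on line 172 expects) is enough to raise it. A minimal sketch with made-up values; `goal_end_values` and `tuple_lengths` are hypothetical helpers written for illustration and are not part of geneformer:

```python
# Made-up stand-ins for the entries collected into random_tuples.
two_value_entries = [(0.10, 0.20), (0.30, 0.40)]
three_value_entries = [(0.10, 0.20, 0.05)]


def goal_end_values(random_tuples):
    """Unpack each entry the same way line 170 of the traceback does."""
    return [goal_end for start_state, goal_end in random_tuples]


def tuple_lengths(random_tuples):
    """Return the distinct entry lengths, to spot mixed 2- and 3-value data."""
    return sorted({len(entry) for entry in random_tuples})


# Entries with exactly two values unpack cleanly.
print(goal_end_values(two_value_entries))  # [0.2, 0.4]

# One three-value entry reproduces the error from the traceback.
try:
    goal_end_values(two_value_entries + three_value_entries)
except ValueError as err:
    print(err)  # too many values to unpack (expected 2)

# Checking the lengths first shows that the data is mixed.
print(tuple_lengths(two_value_entries + three_value_entries))  # [2, 3]
```

A check like `tuple_lengths` on the loaded dictionaries could show whether the input directory mixes results from runs with and without alt_states.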

I found the cause: the 3 _raw.pickle files had been written to /mnt/c/Users/pc/Downloads/ rather than to the output directory I passed to isp.perturb_data ('/mnt/c/Users/pc/Downloads/GF2'). The script started running after I moved the 3 files into the correct folder and trimmed the first 3 characters, 'GF2', from their file names.
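
The stray 'GF2' prefix would be consistent with the output path being built by plain string concatenation, so that a directory passed without a trailing slash becomes part of the file name inside the parent folder. That is an assumption about the installed version, not something confirmed in this thread. The sketch below uses hypothetical helpers (`concat_path`, `joined_path`) and a simplified, made-up file name to show the difference:

```python
import posixpath


def concat_path(output_directory, output_prefix):
    """Assumed behaviour: directory and prefix are glued together as strings."""
    return f"{output_directory}{output_prefix}_raw.pickle"


def joined_path(output_directory, output_prefix):
    """Separator-aware alternative that does not depend on a trailing slash."""
    return posixpath.join(output_directory, f"{output_prefix}_raw.pickle")


out_dir = "/mnt/c/Users/pc/Downloads/GF2"

# Without a trailing slash, 'GF2' ends up as a file-name prefix in Downloads.
print(concat_path(out_dir, "full2"))
# /mnt/c/Users/pc/Downloads/GF2full2_raw.pickle

# With a trailing slash, the file lands inside the GF2 folder.
print(concat_path(out_dir + "/", "full2"))
# /mnt/c/Users/pc/Downloads/GF2/full2_raw.pickle

# A path join gives the same result with or without the trailing slash.
print(joined_path(out_dir, "full2"))
# /mnt/c/Users/pc/Downloads/GF2/full2_raw.pickle
```

If that assumption holds, passing the output directory with a trailing slash ('/mnt/c/Users/pc/Downloads/GF2/') would avoid having to move and rename the files afterwards.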

pchiang5 changed discussion status to closed
