ispstats.get_stats: too many values to unpack (expected 2)
After successfully running the following steps:
```
import os
os.environ['TRANSFORMERS_CACHE'] = '/mnt/c/Users/pc/Downloads/cache/'
from geneformer import InSilicoPerturber
from geneformer import InSilicoPerturberStats

isp = InSilicoPerturber(perturb_type="overexpress",
                        perturb_rank_shift=None,
                        genes_to_perturb="all",
                        combos=0,
                        anchor_gene=None,
                        model_type="CellClassifier",
                        num_classes=29,
                        emb_mode="cell",
                        cell_emb_style="mean_pool",
                        cell_states_to_model={'state_key': 'cell_type', 'start_state': 'A', 'goal_state': 'B', 'alt_states': []},
                        max_ncells=52775,
                        emb_layer=0,
                        forward_batch_size=400,
                        nproc=44)
isp.perturb_data("/mnt/c/Users/pc/Downloads/GF/230816_geneformer_CellClassifier_celltypes_L2048_B8_LR0.00028292192255361916_LSlinear_WU559.5193527108581_E10_Oadamw_F2",
                 "/mnt/c/Users/pc/Downloads/transformer.dataset",
                 "/mnt/c/Users/pc/Downloads/GF2",
                 "full2")

# I modified this line in the geneformer InSilicoPerturberStats file:
# GENE_NAME_ID_DICTIONARY_FILE = "/mnt/c/Users/pc/Downloads/Geneformer/Geneformer/gene_name_id_dict.pkl"
ispstats = InSilicoPerturberStats(mode="goal_state_shift",
                                  genes_perturbed="all",
                                  combos=0,
                                  anchor_gene=None,
                                  cell_states_to_model={'state_key': 'cell_type', 'start_state': 'A', 'goal_state': 'B', 'alt_states': []})
```
Calling `get_stats` then raised another error:
```
In [15]: ispstats.get_stats("/mnt/c/Users/pc/Downloads",
...: None,
...: "/mnt/c/Users/pc/Downloads/GF2",
...: "full")
0%| | 0/13 [00:00<?, ?it/s]
0%| | 0/15183 [00:00<?, ?it/s]
ValueError Traceback (most recent call last)
Cell In[15], line 1
----> 1 ispstats.get_stats("/mnt/c/Users/pc/Downloads",
2 None,
3 "/mnt/c/Users/pc/Downloads/GF2",
4 "full")
File /home/pc/miniconda3/envs/geneformer/lib/python3.10/site-packages/geneformer/in_silico_perturber_stats.py:696, in InSilicoPerturberStats.get_stats(self, input_data_directory, null_dist_data_directory, output_directory, output_prefix)
684 cos_sims_df_initial = pd.DataFrame({"Gene": gene_list,
685 "Gene_name": [self.token_to_gene_name(item)
686 for item in gene_list],
(...)
692 for genes in gene_list]},
693 index=[i for i in range(len(gene_list))])
695 if self.mode == "goal_state_shift":
--> 696 cos_sims_df = isp_stats_to_goal_state(cos_sims_df_initial, dict_list, self.cell_states_to_model, self.genes_perturbed)
698 elif self.mode == "vs_null":
699 null_dict_list = read_dictionaries(null_dist_data_directory, "cell", self.anchor_token)
File /home/pc/miniconda3/envs/geneformer/lib/python3.10/site-packages/geneformer/in_silico_perturber_stats.py:170, in isp_stats_to_goal_state(cos_sims_df, dict_list, cell_states_to_model, genes_perturbed)
167 random_tuples += dict_i.get((token, "cell_emb"),[])
169 if alt_end_state_exists == False:
--> 170 goal_end_random_megalist = [goal_end for start_state,goal_end in random_tuples]
171 elif alt_end_state_exists == True:
172 goal_end_random_megalist = [goal_end for start_state,goal_end,alt_end in random_tuples]
File /home/pc/miniconda3/envs/geneformer/lib/python3.10/site-packages/geneformer/in_silico_perturber_stats.py:170, in (.0)
167 random_tuples += dict_i.get((token, "cell_emb"),[])
169 if alt_end_state_exists == False:
--> 170 goal_end_random_megalist = [goal_end for start_state,goal_end in random_tuples]
171 elif alt_end_state_exists == True:
172 goal_end_random_megalist = [goal_end for start_state,goal_end,alt_end in random_tuples]
ValueError: too many values to unpack (expected 2)
```
Is this caused by an issue with the cell class assignment? I tried removing `'alt_states'`, but that did not help. Thank you.
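The error itself comes from tuple unpacking. When no alternative end state is configured (`alt_end_state_exists` is `False`, presumably the case here since `alt_states` is empty), `isp_stats_to_goal_state` unpacks every entry of `random_tuples` into exactly two names (`start_state, goal_end`), so any entry that carries more than two values raises this `ValueError`. The sketch mirrors the two branches shown in the traceback; the function name `extract_goal_end` and the sample values are hypothetical, and the 3-tuples are only one example of an entry with too many values.

```python
def extract_goal_end(random_tuples, alt_end_state_exists):
    """Mirror the two branches around line 170 of in_silico_perturber_stats.py."""
    if not alt_end_state_exists:
        # Expects 2-element entries: (start_state, goal_end)
        return [goal_end for start_state, goal_end in random_tuples]
    # Expects 3-element entries: (start_state, goal_end, alt_end)
    return [goal_end for start_state, goal_end, alt_end in random_tuples]


# Made-up entries with three values each (hypothetical data).
three_tuples = [(0.91, 0.88, 0.75), (0.93, 0.90, 0.71)]

try:
    extract_goal_end(three_tuples, alt_end_state_exists=False)
except ValueError as err:
    # On Python 3.10 this is the same message as in the traceback.
    print(f"ValueError: {err}")

# Checking the entry lengths makes the mismatch visible directly.
print({len(t) for t in three_tuples})  # {3}
```

In other words, the entries read from the input directory do not have the shape that the configured `cell_states_to_model` implies, so it is worth checking which files `get_stats` actually loaded.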
I found the cause: when I set the output directory of `isp.perturb_data` to `'/mnt/c/Users/pc/Downloads/GF2'`, the three `_raw.pickle` files were written to `/mnt/c/Users/pc/Downloads/` instead, and their file names started with `GF2`. The script ran after I moved the three files into the correct folder and removed the leading `GF2` from their names.
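The leading `GF2` in the file names is consistent with the output path being built by plain string concatenation of the directory and the file name, so that a directory given without a trailing slash becomes part of the file name. Whether the installed Geneformer version does exactly this is an assumption; the file name below is hypothetical. The sketch contrasts that behaviour with `os.path.join`, and adds two hypothetical helpers: `list_raw_pickles`, to check what a directory contains before calling `get_stats`, and `relocate_prefixed_pickles`, which automates the manual move-and-rename fix described above.

```python
import glob
import os
import shutil

# Hypothetical values; the real Geneformer file names may differ.
output_directory = "/mnt/c/Users/pc/Downloads/GF2"  # note: no trailing slash
file_name = "in_silico_overexpress_full2_raw.pickle"

# Plain string concatenation (the assumed behaviour): "GF2" fuses with the
# file name, so the file ends up in the parent folder as "GF2in_silico_...".
concatenated = f"{output_directory}{file_name}"

# os.path.join inserts the separator, so the file ends up inside GF2/.
joined = os.path.join(output_directory, file_name)

# Expected output shown for POSIX paths (Linux/WSL).
print(os.path.dirname(concatenated))   # /mnt/c/Users/pc/Downloads
print(os.path.basename(concatenated))  # GF2in_silico_overexpress_full2_raw.pickle
print(os.path.dirname(joined))         # /mnt/c/Users/pc/Downloads/GF2


def list_raw_pickles(directory):
    """Return the *_raw.pickle files directly inside `directory`."""
    pattern = os.path.join(glob.escape(directory), "*_raw.pickle")
    return sorted(glob.glob(pattern))


def relocate_prefixed_pickles(parent_dir, prefix, target_dir):
    """Move parent_dir/<prefix>*_raw.pickle into target_dir, dropping <prefix>."""
    pattern = os.path.join(glob.escape(parent_dir),
                           glob.escape(prefix) + "*_raw.pickle")
    moved = []
    for path in glob.glob(pattern):
        # Strip the stray prefix from the file name, e.g. "GF2in_silico..." -> "in_silico..."
        new_name = os.path.basename(path)[len(prefix):]
        new_path = os.path.join(target_dir, new_name)
        shutil.move(path, new_path)
        moved.append(new_path)
    return sorted(moved)
```

Before calling `get_stats`, `list_raw_pickles("/mnt/c/Users/pc/Downloads/GF2")` should list the three files; an empty result suggests they were written somewhere else. Under the concatenation assumption, passing the output directory with a trailing slash (`"/mnt/c/Users/pc/Downloads/GF2/"`) would presumably also avoid the problem.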