Why are the architectures of the `vitmae_111M` and `vitmae_650M` models different from BrainLM?

#5 opened by abcvav

Dear authors,

I’ve recently been trying to use the pre-trained BrainLM weights, and some questions came up along the way.

For `old_13M/pytorch_model.bin`, the architecture matches the one reported in the BrainLM paper. I checked it with the following code:

import torch
state_dict = torch.load("old_13M/pytorch_model.bin", map_location='cpu')
print(state_dict.keys())

"""
Output:
dict_keys([..., 'vit.embeddings.signal_embedding_projection.weight', ..., 'vit.embeddings.xyz_embedding_projection.weight',  ...])

torch.Size([512, 20])
"""

print(state_dict['vit.embeddings.signal_embedding_projection.weight'])
"""
Output:
torch.Size([512, 20])
"""

The above output is expected: the state_dict contains the key `signal_embedding_projection`, which projects a patch of 20 timepoints into a higher-dimensional (512-d) space, and the key `xyz_embedding_projection`, which projects the 3D ROI coordinates into that same space.
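Just to illustrate my reading of the paper (the exact patching code may well differ), these shapes suggest that a (424, 200) recording is split along time into non-overlapping windows of 20 timepoints, each of which is projected to 512 dimensions:

import torch

# state_dict as loaded above from old_13M/pytorch_model.bin
W = state_dict['vit.embeddings.signal_embedding_projection.weight']  # (512, 20)

recording = torch.randn(424, 200)         # (ROIs, timepoints)
patches = recording.unfold(1, 20, 20)     # (424, 10, 20): 10 windows of 20 timepoints per ROI
embedded = patches.reshape(-1, 20) @ W.T  # (4240, 512); bias omitted for brevity
print(embedded.shape)

"""
Output:
torch.Size([4240, 512])
"""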

However, for `vitmae_111M/pytorch_model.bin`, the situation is different:

state_dict = torch.load("vitmae_111M/pytorch_model.bin", map_location='cpu')
print(state_dict.keys())

"""
Output:
dict_keys([...,  'vit.embeddings.patch_embeddings.projection.weight', 'vit.embeddings.patch_embeddings.projection.bias', ...])
"""

First, none of the keys in this state_dict match those of the `old_13M` architecture reported in the paper; for example, neither `xyz_embedding_projection` nor `signal_embedding_projection` is present.
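This is easy to confirm by filtering the keys (assuming, as above, that any BrainLM-style projection layer would have 'embedding_projection' in its name):

print([k for k in state_dict.keys() if 'embedding_projection' in k])

"""
Output:
[]
"""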

I also checked the shape of `vit.embeddings.patch_embeddings.projection.weight`, and it turned out to be torch.Size([768, 3, 16, 16]). This is the weight of a standard image-ViT patch embedding (a Conv2d mapping 3-channel 16×16 patches to a 768-d hidden space), which is unexpected, because a patch in BrainLM should consist of 20 timepoints, not 16×16 pixels with 3 channels.
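In fact, these key names and shapes look exactly like the stock `ViTMAEForPreTraining` from transformers with its default ViT-Base configuration. A minimal sketch of that comparison (my assumption, not something the model card confirms):

from transformers import ViTMAEConfig, ViTMAEForPreTraining

# Default config: hidden_size=768, patch_size=16, num_channels=3 (ViT-Base).
model = ViTMAEForPreTraining(ViTMAEConfig())
w = model.state_dict()['vit.embeddings.patch_embeddings.projection.weight']
print(w.shape)

"""
Output:
torch.Size([768, 3, 16, 16])
"""

# Rough parameter count; by my back-of-the-envelope math this lands around
# 112M, close to the "111M" in the checkpoint name (again, my assumption).
print(sum(p.numel() for p in model.parameters()))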

The input for a subject should have shape (number of ROIs, timepoints), i.e. (424, 200), with no channel dimension.
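One way to test this hypothesis (again assuming the checkpoint was saved from the stock image model) is a non-strict load, which reports any keys that fail to line up:

import torch
from transformers import ViTMAEConfig, ViTMAEForPreTraining

model = ViTMAEForPreTraining(ViTMAEConfig())
result = model.load_state_dict(
    torch.load("vitmae_111M/pytorch_model.bin", map_location="cpu"),
    strict=False,
)
# Empty lists here would mean the checkpoint is exactly the standard image
# ViT-MAE, which cannot consume a (424, 200) ROI-by-timepoint input directly.
print(result.missing_keys, result.unexpected_keys)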

Could you please clarify why the architectures of these models differ in this way? I would greatly appreciate your insight into this matter.

Thank you very much for your time and assistance.
