Converting Phi
Hi,
With OpenNMT-py it's modular: things like parallel_residual=True or Shared_LayerNorm=True are just config flags, so the code base does not change.
So if I hack my current converters to convert Phi to OpenNMT-py weights, running in MoE mode should be straightforward, since I already run Mixtral.
Hi, I'm not familiar with OpenNMT-py but this sounds great. Happy to see the results if you manage to run it!
For some reason the merge renamed the FF layers from fc1/fc2 to w1/w2 to follow the Mixtral naming convention.
I'll look further tomorrow, but IMO it would be better to keep the Phi names.
Oops yeah, I renamed them. Is fc1/fc2 => w1/w2 the only issue with the names? I can change it if it makes it easier for you.
Ideally, if you want to make it work with HF with only slight changes in modeling_phi.py, you may also rename block_sparse_moe => moe and experts => mlp.
Then we just need to add a class MoE(nn.Module) in modeling_phi.py.
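Something along these lines could work (a minimal sketch, not the final implementation: the MLP stand-in below and the config field names num_local_experts / num_experts_per_tok are placeholders, not the actual modeling_phi.py code):

import torch
import torch.nn as nn
import torch.nn.functional as F

class MLP(nn.Module):
    # stand-in for the existing Phi MLP with fc1/fc2 (config field names illustrative)
    def __init__(self, config):
        super().__init__()
        self.fc1 = nn.Linear(config.hidden_size, config.intermediate_size)
        self.fc2 = nn.Linear(config.intermediate_size, config.hidden_size)
        self.act = nn.GELU()

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))

class MoE(nn.Module):
    def __init__(self, config):
        super().__init__()
        # one dense Phi MLP per expert, stored as self.mlp so the tensor
        # names come out as moe.mlp.{i}.fc1/fc2
        self.mlp = nn.ModuleList([MLP(config) for _ in range(config.num_local_experts)])
        self.gate = nn.Linear(config.hidden_size, config.num_local_experts, bias=False)
        self.num_experts_per_tok = config.num_experts_per_tok

    def forward(self, hidden_states):
        orig_shape = hidden_states.shape
        x = hidden_states.view(-1, hidden_states.size(-1))
        # route each token to its top-k experts and mix their outputs
        scores = self.gate(x)
        weights, indices = torch.topk(scores, self.num_experts_per_tok, dim=-1)
        weights = F.softmax(weights, dim=-1, dtype=torch.float).to(x.dtype)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.mlp):
            token_idx, k_idx = torch.where(indices == i)
            if token_idx.numel() > 0:
                out[token_idx] += weights[token_idx, k_idx, None] * expert(x[token_idx])
        return out.view(orig_shape)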
Cool! Can you confirm that the following is correct?
# map the dense Phi MLP tensor names onto per-expert names under moe.mlp.{moe_index}
moe_tensor_name = tensor_name.replace("mlp.fc1.bias", f"moe.mlp.{moe_index}.fc1.bias")
moe_tensor_name = moe_tensor_name.replace("mlp.fc1.weight", f"moe.mlp.{moe_index}.fc1.weight")
moe_tensor_name = moe_tensor_name.replace("mlp.fc2.bias", f"moe.mlp.{moe_index}.fc2.bias")
moe_tensor_name = moe_tensor_name.replace("mlp.fc2.weight", f"moe.mlp.{moe_index}.fc2.weight")
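(For context, these lines sit in a loop that copies the dense MLP weights into every expert; a rough sketch, where the checkpoint path and expert count are placeholders:)

import torch

num_experts = 4  # placeholder
state_dict = torch.load("phi-2.bin")  # placeholder path to the dense Phi-2 weights
new_state_dict = {}
for tensor_name, tensor in state_dict.items():
    if "mlp.fc" in tensor_name:
        for moe_index in range(num_experts):
            # equivalent to the four replace calls above
            moe_tensor_name = tensor_name.replace("mlp.fc", f"moe.mlp.{moe_index}.fc")
            new_state_dict[moe_tensor_name] = tensor.clone()
    else:
        new_state_dict[tensor_name] = tensor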
I think I got it working. I patched modeling_phi.py with the wrong names, so if you fix the tensor names I'll push it with the right names.
Hi @vince62s,
I assume this is not working with the Hugging Face weights of Phi-2. Is it possible to support that?
Not sure what your question is, but I made it work with HF; look at the model card.
So there are two implementations of Phi-2: one by Microsoft, which requires trust_remote_code=True, and another that is actually in the official transformers repo, with the weights available in this repo: susnato/phi-2.
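For reference, the two are loaded differently (a sketch using standard transformers AutoModel calls; dtype/device options omitted):

from transformers import AutoModelForCausalLM

# Microsoft's implementation ships its own modeling code with the checkpoint,
# so it needs remote code enabled
model_ms = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", trust_remote_code=True)

# the implementation that lives in the transformers repo itself,
# with the weights mirrored at susnato/phi-2
model_hf = AutoModelForCausalLM.from_pretrained("susnato/phi-2")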
So I was wondering if it would be possible to support this one as well. I think it's more or less a matter of copying the MoE class and calling it in the right places with the right dimensions.
This would require HF to accept a PR on modeling_phi.py in the official transformers repo, which I don't think is possible at the moment, so the best option is to use this repo for now.