Converting Phi
Hi,
With OpenNMT-py it's modular: things like parallel_residual=True or Shared_LayerNorm=True are just config flags, so the code base does not change.
So if I hack my current converters to convert Phi to OpenNMT-py weights, running in MoE mode should be straightforward, since I already run Mixtral.
Hi, I'm not familiar with OpenNMT-py but this sounds great. Happy to see the results if you manage to run it!
For some reason the merge renamed the FF layers from fc1/fc2 to w1/w2 to follow the Mixtral naming convention.
I'll look further tomorrow, but IMO it would be better to keep the Phi names.
Oops yeah, I renamed them. Is fc1/fc2 => w1/w2 the only issue with the names? I can change it if it makes it easier for you.
Ideally, if you want to make it work with HF with only slight changes in modeling_phi.py, you may also rename block_sparse_moe => moe and experts => mlp.
Then we just need to add a class MoE(nn.Module) in modeling_phi.py.
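Something along these lines could work (a minimal sketch, not the final implementation: the MLP stand-in below and the config field names num_local_experts / num_experts_per_tok are placeholders, not the actual modeling_phi.py code):

import torch
import torch.nn as nn
import torch.nn.functional as F

class MLP(nn.Module):
    # stand-in for the existing Phi MLP with fc1/fc2 (config field names illustrative)
    def __init__(self, config):
        super().__init__()
        self.fc1 = nn.Linear(config.hidden_size, config.intermediate_size)
        self.fc2 = nn.Linear(config.intermediate_size, config.hidden_size)
        self.act = nn.GELU()

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))

class MoE(nn.Module):
    def __init__(self, config):
        super().__init__()
        # one dense Phi MLP per expert, stored as self.mlp so the tensor
        # names come out as moe.mlp.{i}.fc1/fc2
        self.mlp = nn.ModuleList([MLP(config) for _ in range(config.num_local_experts)])
        self.gate = nn.Linear(config.hidden_size, config.num_local_experts, bias=False)
        self.num_experts_per_tok = config.num_experts_per_tok

    def forward(self, hidden_states):
        orig_shape = hidden_states.shape
        x = hidden_states.view(-1, hidden_states.size(-1))
        # route each token to its top-k experts and mix their outputs
        scores = self.gate(x)
        weights, indices = torch.topk(scores, self.num_experts_per_tok, dim=-1)
        weights = F.softmax(weights, dim=-1, dtype=torch.float).to(x.dtype)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.mlp):
            token_idx, k_idx = torch.where(indices == i)
            if token_idx.numel() > 0:
                out[token_idx] += weights[token_idx, k_idx, None] * expert(x[token_idx])
        return out.view(orig_shape)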
Cool! Can you confirm that the following is correct?
# map the dense Phi MLP tensor names onto per-expert names under moe.mlp.{moe_index}
moe_tensor_name = tensor_name.replace("mlp.fc1.bias", f"moe.mlp.{moe_index}.fc1.bias")
moe_tensor_name = moe_tensor_name.replace("mlp.fc1.weight", f"moe.mlp.{moe_index}.fc1.weight")
moe_tensor_name = moe_tensor_name.replace("mlp.fc2.bias", f"moe.mlp.{moe_index}.fc2.bias")
moe_tensor_name = moe_tensor_name.replace("mlp.fc2.weight", f"moe.mlp.{moe_index}.fc2.weight")
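(For context, these lines sit in a loop that copies the dense MLP weights into every expert; a rough sketch, where the checkpoint path and expert count are placeholders:)

import torch

num_experts = 4  # placeholder
state_dict = torch.load("phi-2.bin")  # placeholder path to the dense Phi-2 weights
new_state_dict = {}
for tensor_name, tensor in state_dict.items():
    if "mlp.fc" in tensor_name:
        for moe_index in range(num_experts):
            # equivalent to the four replace calls above
            moe_tensor_name = tensor_name.replace("mlp.fc", f"moe.mlp.{moe_index}.fc")
            new_state_dict[moe_tensor_name] = tensor.clone()
    else:
        new_state_dict[tensor_name] = tensor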
I think I got it working. I patched modeling_phi.py with the wrong names, so if you fix the tensor names I'll push it with the right names.
Hi @vince62s,
I assume this is not working with the Hugging Face weights of Phi-2. Is it possible to support that?
Not sure what your question is, but I made it work with HF; look at the model card.
So there are two implementations of Phi-2: one by Microsoft, which requires trust_remote_code=True, and another that is actually in the official transformers repo, with the weights available in this repo: susnato/phi-2.
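For reference, the two are loaded differently (a sketch using standard transformers AutoModel calls; dtype/device options omitted):

from transformers import AutoModelForCausalLM

# Microsoft's implementation ships its own modeling code with the checkpoint,
# so it needs remote code enabled
model_ms = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", trust_remote_code=True)

# the implementation that lives in the transformers repo itself,
# with the weights mirrored at susnato/phi-2
model_hf = AutoModelForCausalLM.from_pretrained("susnato/phi-2")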
So I was wondering if it would be possible to support this one as well. I think it's more or less a matter of copying the MoE class and calling it in the right places with the right dimensions.
This would require HF to accept a PR on modeling_phi.py in the official transformers repo, which I don't think is possible at the moment, so the best option is to use this repo for now.