LoRA target

#2
by Lingrui - opened

Why is the fine-tuning target the MLP experts in layers 7, 15, and 23 (gate_up_proj, down_proj)? Is there some trick behind this?

Since the model is an MoE, not all of the parameters are modified. This isn't something I experimented with myself; targeting 7.mlp.experts.gate_up_proj, 15.mlp.experts.gate_up_proj, etc. was recommended by OpenAI.
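
For reference, a minimal sketch of what that targeting could look like with PEFT's LoraConfig, assuming a recent PEFT release that supports `target_parameters` (the expert weights are stored as plain parameters rather than linear modules). The rank and alpha values here are illustrative placeholders, not the settings used in this run:

```python
from peft import LoraConfig

# Illustrative LoRA config targeting the MoE expert projections in
# layers 7, 15, and 23, as mentioned above. r / lora_alpha are placeholders.
peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_parameters=[
        "7.mlp.experts.gate_up_proj",
        "7.mlp.experts.down_proj",
        "15.mlp.experts.gate_up_proj",
        "15.mlp.experts.down_proj",
        "23.mlp.experts.gate_up_proj",
        "23.mlp.experts.down_proj",
    ],
)
```

Other linear layers (e.g. attention projections) can additionally be adapted via `target_modules` if desired; the list above only covers the expert projections named in this thread.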

I have another run in progress where I am not targeting these modules, to see what side effects that causes.
