Lora target
#2
by Lingrui - opened
Why is the fine-tuning target the MLP experts in layers 7, 15, and 23 (gate_up_proj, down_proj)? Is there some trick behind this?
Because the model is an MoE, not all of the parameters are modified. This is not something I experimented with myself; targeting 7.mlp.experts.gate_up_proj, 15.mlp.experts.gate_up_proj, etc. was something recommended by OpenAI.
I have another run in the making where I am testing without these targets to see what side effects it causes.
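For reference, a config targeting those expert projections would look roughly like the sketch below. It assumes a recent peft release that supports target_parameters (needed because the expert weights are plain parameters rather than nn.Linear modules); the r / lora_alpha values are placeholders, not necessarily what the original run used.

```python
from peft import LoraConfig

# Sketch of a LoRA config targeting the MoE expert projections in layers 7, 15, and 23.
# target_parameters (recent peft versions) is used because the experts store
# gate_up_proj / down_proj as nn.Parameter tensors, not nn.Linear submodules.
# r / lora_alpha here are placeholder values.
peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules="all-linear",  # optionally also adapt the regular linear layers
    target_parameters=[
        "7.mlp.experts.gate_up_proj",
        "7.mlp.experts.down_proj",
        "15.mlp.experts.gate_up_proj",
        "15.mlp.experts.down_proj",
        "23.mlp.experts.gate_up_proj",
        "23.mlp.experts.down_proj",
    ],
)
```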