LoRA target

#2
by Lingrui - opened

Why is the fine-tuning target the MLP experts in layers 7, 15, and 23 (gate_up_proj, down_proj)? Is there some trick behind this?

Since the model is an MoE, not all of the parameters are modified. This isn't something I experimented with myself; targeting 7.mlp.experts.gate_up_proj, 15.mlp.experts.gate_up_proj, etc. was recommended by OpenAI.
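
For reference, a minimal sketch of what that targeting could look like with PEFT's LoraConfig, assuming a recent PEFT release that supports `target_parameters` (the expert weights are stored as plain parameters rather than linear modules). The rank and alpha values here are illustrative placeholders, not the settings used in this run:

```python
from peft import LoraConfig

# Illustrative LoRA config targeting the MoE expert projections in
# layers 7, 15, and 23, as mentioned above. r / lora_alpha are placeholders.
peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_parameters=[
        "7.mlp.experts.gate_up_proj",
        "7.mlp.experts.down_proj",
        "15.mlp.experts.gate_up_proj",
        "15.mlp.experts.down_proj",
        "23.mlp.experts.gate_up_proj",
        "23.mlp.experts.down_proj",
    ],
)
```

Other linear layers (e.g. attention projections) can additionally be adapted via `target_modules` if desired; the list above only covers the expert projections named in this thread.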

I have another run in progress where I am not targeting these modules, to see what side effects that causes.
