fix-grid-limits

#2
by 3outeille HF Staff - opened
kernels-community org
edited 6 days ago

For people using megablocks for training, and thus running with seqlen=4096: this will yield a Triton Error [CUDA]: invalid argument at _binned_copy[(num_experts, expert_capacity)], because expert_capacity lands in the 2nd grid dim, which must be <= 65535 (as per the CUDA docs). expert_capacity gets that large because tokens_per_expert = top_k * tokens * world_size / num_experts. We can't change top_k or num_experts, as most models have been trained with those specific values. One simple fix is to swap the grid dims of the kernels, since the 1st dim has a hard limit of 2^31-1. num_experts then moves to the 2nd dim, and it rarely gets anywhere near 65535 anyway.
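To make the arithmetic concrete, here is a minimal pure-Python sketch of the grid check. The configuration values (top_k=8, world_size=8, num_experts=64, 16 sequences of 4096 tokens) are hypothetical and chosen only to cross the limit, the helper names are made up for illustration, and expert_capacity is assumed to equal tokens_per_expert (capacity factor of 1).

```python
# CUDA launch-grid limits for the 1st (x) and 2nd (y) grid dimensions.
MAX_GRID_X = 2**31 - 1
MAX_GRID_Y = 65535


def tokens_per_expert(top_k, tokens, world_size, num_experts):
    # Formula quoted in the description above.
    return top_k * tokens * world_size // num_experts


def grid_is_valid(grid):
    # Hypothetical helper: True if a 2D launch grid fits the CUDA limits.
    x, y = grid
    return 1 <= x <= MAX_GRID_X and 1 <= y <= MAX_GRID_Y


# Hypothetical training configuration (not taken from the PR).
top_k, world_size, num_experts = 8, 8, 64
tokens = 16 * 4096  # 16 sequences of seqlen=4096

# Assumption: capacity factor of 1, so capacity == tokens_per_expert.
expert_capacity = tokens_per_expert(top_k, tokens, world_size, num_experts)

old_grid = (num_experts, expert_capacity)  # capacity in the 2nd dim
new_grid = (expert_capacity, num_experts)  # capacity in the 1st dim

print(expert_capacity, grid_is_valid(old_grid), grid_is_valid(new_grid))
```

With these numbers the capacity is 65536, one past the 2nd-dim limit, so only the swapped grid is launchable. Swapping the launch grid also means the kernel has to read its indices from the swapped axes (tl.program_id(0) and tl.program_id(1) in Triton).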

3outeille changed pull request status to open
