why not include Qwen1.5-MoE-A2.7B in the table?

#4
by J22 - opened

IMHO, Qwen1.5-MoE-A2.7B is a SOTA MoE model with ~2B active parameters.

Before comparing, it would be good to know how many tokens that model was trained on and what data was used (including for the original dense model before upcycling). Furthermore, it should be considered concurrent work.
