Regarding the vocabulary used in the paper

#11
by jiaxin-wen - opened

Thanks for your great work!
I have a question regarding the vocabulary. Specifically, the paper mentions that "We use GPT-Neo tokenizer but only keep the top 10K most common tokens". However, the currently uploaded vocabulary consists of 50K tokens. Could you please upload the reduced vocabulary so that your experiments can be reproduced? :)
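
For concreteness, here is a minimal sketch of what I assume "keep the top 10K most common tokens" means: tokenize the training corpus with the GPT-Neo tokenizer, count token frequencies, and keep the 10,000 most frequent IDs. The corpus file name and the frequency-counting procedure are my assumptions, not something stated in the paper:

```python
# Sketch of one plausible way to derive the 10K-token vocabulary:
# count GPT-Neo token frequencies over the training corpus and keep
# the 10,000 most common IDs. "train.txt" is a hypothetical file name.
from collections import Counter

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125M")

counts = Counter()
with open("train.txt", encoding="utf-8") as f:  # hypothetical corpus file
    for line in f:
        counts.update(tokenizer.encode(line))

# IDs of the 10K most frequent tokens; everything else would presumably
# be dropped or remapped when building the reduced vocabulary.
top_10k_ids = [token_id for token_id, _ in counts.most_common(10_000)]
print(len(top_10k_ids), top_10k_ids[:10])
```

If the filtering was done differently (e.g., over a different corpus, or with remapped token IDs), it would be great if you could clarify.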
