Submitted by Jingfeng Yao
Towards Scalable Pre-training of Visual Tokenizers for Generation
MiniMax