Perception Encoder Collection OpenCLIP (PE Core image + text) and timm PE Core, Spatial, Lang (ViT only) weights. NOTE: These weights do not work with original modeling code. • 19 items • Updated Aug 1 • 5
view article Article nanoVLM: The simplest repository to train your VLM in pure PyTorch By ariG23498 and 6 others • May 21 • 208
view article Article A Dive into Pretraining Strategies for Vision-Language Models By adirik and 1 other • Feb 3, 2023 • 74