Article π0 and π0-FAST: Vision-Language-Action Models for General Robot Control Feb 4 • 114
Article LAVE: Zero-shot VQA Evaluation on Docmatix with LLMs - Do We Still Need Fine-Tuning? Jul 25, 2024 • 18
Searching for Better ViT Baselines Collection Exploring ViT hparams and model shapes for the GPU poor (between tiny and base). • 28 items • Updated Feb 14 • 17
MobileNetV4 pretrained weights Collection Weights for MobileNet-V4 pretrained in timm • 17 items • Updated Sep 22, 2024 • 18
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens Paper • 2406.11271 • Published Jun 17, 2024 • 21
What If We Recaption Billions of Web Images with LLaMA-3? Paper • 2406.08478 • Published Jun 12, 2024 • 40
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations Paper • 2405.18392 • Published May 28, 2024 • 12
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models Paper • 2405.15738 • Published May 24, 2024 • 46
PaliGemma Release Collection Pretrained and mix checkpoints for PaliGemma • 16 items • Updated 6 days ago • 145