The Ultra-Scale Playbook 🌌 The ultimate guide to training LLM on large GPU Clusters • Running • 2.29k
Scaling Properties of Diffusion Models for Perceptual Tasks Paper • 2411.08034 • Published Nov 12, 2024 • 13
Cosmos Tokenizer Collection A suite of image and video tokenizers • 13 items • Updated 19 minutes ago • 39
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second Paper • 2410.02073 • Published Oct 2, 2024 • 41
timm Attention Visualization 👁 Visualize attention maps for images using selected models • Running • 15
Welcome FalconMamba: The first strong attention-free 7B model Article • Aug 12, 2024 • 109
MobileNetV4 pretrained weights Collection Weights for MobileNet-V4 pretrained in timm • 17 items • Updated Sep 22, 2024 • 18
DiTFastAttn: Attention Compression for Diffusion Transformer Models Paper • 2406.08552 • Published Jun 12, 2024 • 25
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations Paper • 2405.18392 • Published May 28, 2024 • 12
The Unreasonable Ineffectiveness of the Deeper Layers Paper • 2403.17887 • Published Mar 26, 2024 • 79