- One-step Diffusion with Distribution Matching Distillation
  Paper • 2311.18828 • Published • 3
- The Unreasonable Ineffectiveness of the Deeper Layers
  Paper • 2403.17887 • Published • 79
- Condition-Aware Neural Network for Controlled Image Generation
  Paper • 2404.01143 • Published • 13
- Locating and Editing Factual Associations in GPT
  Paper • 2202.05262 • Published • 1
Collections
Collections including paper arxiv:2404.14394

- Prompt me a Dataset: An investigation of text-image prompting for historical image dataset creation using foundation models
  Paper • 2309.01674 • Published • 2
- Segment Anything
  Paper • 2304.02643 • Published • 4
- EgoLifter: Open-world 3D Segmentation for Egocentric Perception
  Paper • 2403.18118 • Published • 12
- A Multimodal Automated Interpretability Agent
  Paper • 2404.14394 • Published • 21

- Demystifying CLIP Data
  Paper • 2309.16671 • Published • 20
- Model Stock: All we need is just a few fine-tuned models
  Paper • 2403.19522 • Published • 12
- Bigger is not Always Better: Scaling Properties of Latent Diffusion Models
  Paper • 2404.01367 • Published • 22
- On the Scalability of Diffusion-based Text-to-Image Generation
  Paper • 2404.02883 • Published • 19

- Self-Supervised Vision Transformers Learn Visual Concepts in Histopathology
  Paper • 2203.00585 • Published • 2
- Emerging Properties in Self-Supervised Vision Transformers
  Paper • 2104.14294 • Published • 3
- DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting
  Paper • 2404.06903 • Published • 19
- Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
  Paper • 2404.07973 • Published • 32

- Wide Residual Networks
  Paper • 1605.07146 • Published • 2
- Characterizing signal propagation to close the performance gap in unnormalized ResNets
  Paper • 2101.08692 • Published • 2
- Pareto-Optimal Quantized ResNet Is Mostly 4-bit
  Paper • 2105.03536 • Published • 2
- When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations
  Paper • 2106.01548 • Published • 2

- LLM Augmented LLMs: Expanding Capabilities through Composition
  Paper • 2401.02412 • Published • 37
- LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration
  Paper • 2402.11550 • Published • 18
- A Multimodal Automated Interpretability Agent
  Paper • 2404.14394 • Published • 21

- DocGraphLM: Documental Graph Language Model for Information Extraction
  Paper • 2401.02823 • Published • 36
- Understanding LLMs: A Comprehensive Overview from Training to Inference
  Paper • 2401.02038 • Published • 64
- DocLLM: A layout-aware generative language model for multimodal document understanding
  Paper • 2401.00908 • Published • 180
- Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration
  Paper • 2309.01131 • Published • 1