Collections including paper arXiv:2404.07143

- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
  Paper • 2403.09611 • Published • 127 upvotes
- Evolutionary Optimization of Model Merging Recipes
  Paper • 2403.13187 • Published • 53 upvotes
- MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
  Paper • 2402.03766 • Published • 14 upvotes
- LLM Agent Operating System
  Paper • 2403.16971 • Published • 68 upvotes

- Sequence Parallelism: Long Sequence Training from System Perspective
  Paper • 2105.13120 • Published • 5 upvotes
- Ring Attention with Blockwise Transformers for Near-Infinite Context
  Paper • 2310.01889 • Published • 11 upvotes
- Striped Attention: Faster Ring Attention for Causal Transformers
  Paper • 2311.09431 • Published • 4 upvotes
- DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
  Paper • 2309.14509 • Published • 18 upvotes

- Re3: Generating Longer Stories With Recursive Reprompting and Revision
  Paper • 2210.06774 • Published • 2 upvotes
- Constitutional AI: Harmlessness from AI Feedback
  Paper • 2212.08073 • Published • 2 upvotes
- AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls
  Paper • 2402.04253 • Published
- Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate
  Paper • 2305.19118 • Published

- Training-Free Long-Context Scaling of Large Language Models
  Paper • 2402.17463 • Published • 23 upvotes
- Evaluating Very Long-Term Conversational Memory of LLM Agents
  Paper • 2402.17753 • Published • 20 upvotes
- Resonance RoPE: Improving Context Length Generalization of Large Language Models
  Paper • 2403.00071 • Published • 24 upvotes
- BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences
  Paper • 2403.09347 • Published • 22 upvotes

- Beyond Language Models: Byte Models are Digital World Simulators
  Paper • 2402.19155 • Published • 53 upvotes
- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 56 upvotes
- VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
  Paper • 2403.00522 • Published • 46 upvotes
- Resonance RoPE: Improving Context Length Generalization of Large Language Models
  Paper • 2403.00071 • Published • 24 upvotes

- Nemotron-4 15B Technical Report
  Paper • 2402.16819 • Published • 46 upvotes
- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 56 upvotes
- RWKV: Reinventing RNNs for the Transformer Era
  Paper • 2305.13048 • Published • 18 upvotes
- Reformer: The Efficient Transformer
  Paper • 2001.04451 • Published

- In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss
  Paper • 2402.10790 • Published • 42 upvotes
- LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration
  Paper • 2402.11550 • Published • 18 upvotes
- A Neural Conversational Model
  Paper • 1506.05869 • Published • 2 upvotes
- Data Engineering for Scaling Language Models to 128K Context
  Paper • 2402.10171 • Published • 25 upvotes

- Linear Transformers with Learnable Kernel Functions are Better In-Context Models
  Paper • 2402.10644 • Published • 81 upvotes
- GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
  Paper • 2305.13245 • Published • 5 upvotes
- ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
  Paper • 2402.15220 • Published • 21 upvotes
- Sequence Parallelism: Long Sequence Training from System Perspective
  Paper • 2105.13120 • Published • 5 upvotes

- Neural Network Diffusion
  Paper • 2402.13144 • Published • 96 upvotes
- Genie: Generative Interactive Environments
  Paper • 2402.15391 • Published • 71 upvotes
- Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
  Paper • 2402.17177 • Published • 88 upvotes
- VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
  Paper • 2403.00522 • Published • 46 upvotes