- Measuring the Effects of Data Parallelism on Neural Network Training
  Paper • 1811.03600 • Published • 2
- Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
  Paper • 1804.04235 • Published • 2
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
  Paper • 1905.11946 • Published • 3
- Yi: Open Foundation Models by 01.AI
  Paper • 2403.04652 • Published • 63
Collections including paper arxiv:2403.16971

- Evaluating Very Long-Term Conversational Memory of LLM Agents
  Paper • 2402.17753 • Published • 20
- StructLM: Towards Building Generalist Models for Structured Knowledge Grounding
  Paper • 2402.16671 • Published • 29
- Do Large Language Models Latently Perform Multi-Hop Reasoning?
  Paper • 2402.16837 • Published • 27
- Divide-or-Conquer? Which Part Should You Distill Your LLM?
  Paper • 2402.15000 • Published • 23

- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 55
- Beyond Language Models: Byte Models are Digital World Simulators
  Paper • 2402.19155 • Published • 51
- StarCoder 2 and The Stack v2: The Next Generation
  Paper • 2402.19173 • Published • 138
- Simple linear attention language models balance the recall-throughput tradeoff
  Paper • 2402.18668 • Published • 20

- Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis
  Paper • 2402.14797 • Published • 20
- Subobject-level Image Tokenization
  Paper • 2402.14327 • Published • 18
- MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
  Paper • 2402.14905 • Published • 128
- GPTVQ: The Blessing of Dimensionality for LLM Quantization
  Paper • 2402.15319 • Published • 21

- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 83
- OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
  Paper • 2402.01739 • Published • 27
- LLM Agent Operating System
  Paper • 2403.16971 • Published • 66
- Poro 34B and the Blessing of Multilinguality
  Paper • 2404.01856 • Published • 15

- WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models
  Paper • 2401.13919 • Published • 30
- Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
  Paper • 2401.14405 • Published • 13
- Design2Code: How Far Are We From Automating Front-End Engineering?
  Paper • 2403.03163 • Published • 95
- LLM Agent Operating System
  Paper • 2403.16971 • Published • 66

- BiTA: Bi-Directional Tuning for Lossless Acceleration in Large Language Models
  Paper • 2401.12522 • Published • 12
- Hydragen: High-Throughput LLM Inference with Shared Prefixes
  Paper • 2402.05099 • Published • 20
- BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
  Paper • 2402.04291 • Published • 49
- Shortened LLaMA: A Simple Depth Pruning for Large Language Models
  Paper • 2402.02834 • Published • 16

- TinyLlama: An Open-Source Small Language Model
  Paper • 2401.02385 • Published • 93
- MM-LLMs: Recent Advances in MultiModal Large Language Models
  Paper • 2401.13601 • Published • 47
- SliceGPT: Compress Large Language Models by Deleting Rows and Columns
  Paper • 2401.15024 • Published • 72
- Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling
  Paper • 2401.16380 • Published • 49

- DocLLM: A layout-aware generative language model for multimodal document understanding
  Paper • 2401.00908 • Published • 180
- Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
  Paper • 2401.04658 • Published • 27
- Weaver: Foundation Models for Creative Writing
  Paper • 2401.17268 • Published • 44
- Efficient Tool Use with Chain-of-Abstraction Reasoning
  Paper • 2401.17464 • Published • 20