- Attention Is All You Need
  Paper • 1706.03762 • Published • 55
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 17
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
  Paper • 1907.11692 • Published • 7
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  Paper • 1910.01108 • Published • 14
Collections
Collections including paper arxiv:2401.17268
- No More Adam: Learning Rate Scaling at Initialization is All You Need
  Paper • 2412.11768 • Published • 41
- TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
  Paper • 2412.14161 • Published • 51
- HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models in Resource-Constrained Environments
  Paper • 2408.10945 • Published • 11
- PDFTriage: Question Answering over Long, Structured Documents
  Paper • 2309.08872 • Published • 54
- Just How Flexible are Neural Networks in Practice?
  Paper • 2406.11463 • Published • 7
- Not All Language Model Features Are Linear
  Paper • 2405.14860 • Published • 41
- KAN: Kolmogorov-Arnold Networks
  Paper • 2404.19756 • Published • 111
- An Interactive Agent Foundation Model
  Paper • 2402.05929 • Published • 28
- Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
  Paper • 2401.11708 • Published • 30
- Weaver: Foundation Models for Creative Writing
  Paper • 2401.17268 • Published • 44
- PokéLLMon: A Human-Parity Agent for Pokémon Battles with Large Language Models
  Paper • 2402.01118 • Published • 31
- Training-Free Consistent Text-to-Image Generation
  Paper • 2402.03286 • Published • 67
- AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning
  Paper • 2402.00769 • Published • 22
- LCM-LoRA: A Universal Stable-Diffusion Acceleration Module
  Paper • 2311.05556 • Published • 85
- LongAlign: A Recipe for Long Context Alignment of Large Language Models
  Paper • 2401.18058 • Published • 21
- Efficient Tool Use with Chain-of-Abstraction Reasoning
  Paper • 2401.17464 • Published • 20