Collections
Collections including paper arxiv:2404.02258

- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
  Paper • 2404.02258 • Published • 104
- Textbooks Are All You Need
  Paper • 2306.11644 • Published • 142
- Jamba: A Hybrid Transformer-Mamba Language Model
  Paper • 2403.19887 • Published • 108
- Large Language Models Struggle to Learn Long-Tail Knowledge
  Paper • 2211.08411 • Published • 3

- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
  Paper • 2404.02258 • Published • 104
- Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs
  Paper • 2403.20041 • Published • 35
- ViTAR: Vision Transformer with Any Resolution
  Paper • 2403.18361 • Published • 54

- Communicative Agents for Software Development
  Paper • 2307.07924 • Published • 5
- Self-Refine: Iterative Refinement with Self-Feedback
  Paper • 2303.17651 • Published • 2
- ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent
  Paper • 2312.10003 • Published • 40
- ReAct: Synergizing Reasoning and Acting in Language Models
  Paper • 2210.03629 • Published • 24

- The Unreasonable Ineffectiveness of the Deeper Layers
  Paper • 2403.17887 • Published • 79
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
  Paper • 2404.02258 • Published • 104
- ReFT: Representation Finetuning for Language Models
  Paper • 2404.03592 • Published • 94
- Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
  Paper • 2404.03715 • Published • 61