DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models • arXiv:2309.14509 • Published Sep 25, 2023
LLM Augmented LLMs: Expanding Capabilities through Composition • arXiv:2401.02412 • Published Jan 4, 2024
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models • arXiv:2401.06066 • Published Jan 11, 2024
Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding • arXiv:2401.12954 • Published Jan 23, 2024
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval • arXiv:2401.18059 • Published Jan 31, 2024
Specialized Language Models with Cheap Inference from Limited Domain Data • arXiv:2402.01093 • Published Feb 2, 2024
Repeat After Me: Transformers are Better than State Space Models at Copying • arXiv:2402.01032 • Published Feb 1, 2024
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models • arXiv:2402.01739 • Published Jan 29, 2024
Scaling Laws for Downstream Task Performance of Large Language Models • arXiv:2402.04177 • Published Feb 6, 2024
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs • arXiv:2402.04291 • Published Feb 6, 2024
Mixtures of Experts Unlock Parameter Scaling for Deep RL • arXiv:2402.08609 • Published Feb 13, 2024
Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model • arXiv:2402.07827 • Published Feb 12, 2024
Learning to Learn Faster from Human Feedback with Language Model Predictive Control • arXiv:2402.11450 • Published Feb 18, 2024
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows • arXiv:2402.10379 • Published Feb 16, 2024
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens • arXiv:2402.13753 • Published Feb 21, 2024
User-LLM: Efficient LLM Contextualization with User Embeddings • arXiv:2402.13598 • Published Feb 21, 2024
Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping • arXiv:2402.14083 • Published Feb 21, 2024
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs • arXiv:2402.15627 • Published Feb 23, 2024
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits • arXiv:2402.17764 • Published Feb 27, 2024
Stop Regressing: Training Value Functions via Classification for Scalable Deep RL • arXiv:2403.03950 • Published Mar 6, 2024
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection • arXiv:2403.03507 • Published Mar 6, 2024
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM • arXiv:2403.07816 • Published Mar 12, 2024
Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU • arXiv:2403.06504 • Published Mar 11, 2024
PERL: Parameter Efficient Reinforcement Learning from Human Feedback • arXiv:2403.10704 • Published Mar 15, 2024
TnT-LLM: Text Mining at Scale with Large Language Models • arXiv:2403.12173 • Published Mar 18, 2024
SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization • arXiv:2405.11582 • Published May 19, 2024
Towards Modular LLMs by Building and Reusing a Library of LoRAs • arXiv:2405.11157 • Published May 18, 2024
Block Transformer: Global-to-Local Language Modeling for Fast Inference • arXiv:2406.02657 • Published Jun 4, 2024
Is It Really Long Context if All You Need Is Retrieval? Towards Genuinely Difficult Long Context NLP • arXiv:2407.00402 • Published Jun 29, 2024
Agentless: Demystifying LLM-based Software Engineering Agents • arXiv:2407.01489 • Published Jul 1, 2024
SpreadsheetLLM: Encoding Spreadsheets for Large Language Models • arXiv:2407.09025 • Published Jul 12, 2024
E5-V: Universal Embeddings with Multimodal Large Language Models • arXiv:2407.12580 • Published Jul 17, 2024
Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning • arXiv:2408.00690 • Published Aug 1, 2024
CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases • arXiv:2408.03910 • Published Aug 7, 2024
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers • arXiv:2408.06195 • Published Aug 12, 2024
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts • arXiv:2409.16040 • Published Sep 24, 2024
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization • arXiv:2410.08815 • Published Oct 11, 2024
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free • arXiv:2410.10814 • Published Oct 14, 2024
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems • arXiv:2411.02959 • Published Nov 5, 2024
Star Attention: Efficient LLM Inference over Long Sequences • arXiv:2411.17116 • Published Nov 26, 2024
Personalized Graph-Based Retrieval for Large Language Models • arXiv:2501.02157 • Published Jan 4, 2025
A Comprehensive Survey on Long Context Language Modeling • arXiv:2503.17407 • Published Mar 2025