Efficient-DLM: From Autoregressive to Diffusion Language Models, and Beyond in Speed Paper • 2512.14067 • Published 18 days ago • 13
Budget-Aware Tool-Use Enables Effective Agent Scaling Paper • 2511.17006 • Published Nov 21, 2025 • 29
Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models Paper • 2511.08577 • Published Nov 11, 2025 • 105
Adaptive Multi-Agent Response Refinement in Conversational Systems Paper • 2511.08319 • Published Nov 11, 2025 • 41
LUT-LLM: Efficient Large Language Model Inference with Memory-based Computations on FPGAs Paper • 2511.06174 • Published Nov 9, 2025 • 6
When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs Paper • 2510.07499 • Published Oct 8, 2025 • 48
Cache-to-Cache: Direct Semantic Communication Between Large Language Models Paper • 2510.03215 • Published Oct 3, 2025 • 97
EpiCache: Episodic KV Cache Management for Long Conversational Question Answering Paper • 2509.17396 • Published Sep 22, 2025 • 19
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning Paper • 2504.17192 • Published Apr 24, 2025 • 120
Autellix: An Efficient Serving Engine for LLM Agents as General Programs Paper • 2502.13965 • Published Feb 19, 2025 • 19
Lossless Acceleration of Large Language Models with Hierarchical Drafting based on Temporal Locality in Speculative Decoding Paper • 2502.05609 • Published Feb 8, 2025 • 18
Typos that Broke the RAG's Back: Genetic Attack on RAG Pipeline by Simulating Documents in the Wild via Low-level Perturbations Paper • 2404.13948 • Published Apr 22, 2024 • 2
VideoRAG: Retrieval-Augmented Generation over Video Corpus Paper • 2501.05874 • Published Jan 10, 2025 • 75
Revisiting In-Context Learning with Long Context Language Models Paper • 2412.16926 • Published Dec 22, 2024 • 32
Knowledge-Augmented Large Language Models for Personalized Contextual Query Suggestion Paper • 2311.06318 • Published Nov 10, 2023 • 3