- Hydragen: High-Throughput LLM Inference with Shared Prefixes
  Paper • 2402.05099 • Published • 20
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting
  Paper • 2402.13720 • Published • 7
- Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
  Paper • 2405.12981 • Published • 29
- Your Transformer is Secretly Linear
  Paper • 2405.12250 • Published • 151
Collections including paper arxiv:2405.14860
- PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models
  Paper • 2402.08714 • Published • 12
- Data Engineering for Scaling Language Models to 128K Context
  Paper • 2402.10171 • Published • 24
- RLVF: Learning from Verbal Feedback without Overgeneralization
  Paper • 2402.10893 • Published • 11
- Coercing LLMs to do and reveal (almost) anything
  Paper • 2402.14020 • Published • 13
- Partially Rewriting a Transformer in Natural Language
  Paper • 2501.18838 • Published • 1
- AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders
  Paper • 2501.17148 • Published • 1
- Sparse Autoencoders Trained on the Same Data Learn Different Features
  Paper • 2501.16615 • Published • 1
- Open Problems in Mechanistic Interpretability
  Paper • 2501.16496 • Published • 16