QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models Paper • 2309.14717 • Published Sep 26, 2023 • 44
Reward-Guided Speculative Decoding for Efficient LLM Reasoning Paper • 2501.19324 • Published 12 days ago • 34
Article Mastering Long Contexts in LLMs with KVPress By nvidia and 1 other • 20 days ago • 62
MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code Paper • 2410.08196 • Published Oct 10, 2024 • 46
MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs Paper • 2410.04698 • Published Oct 7, 2024 • 13
PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search Paper • 1907.05737 • Published Jul 12, 2019
Trained Rank Pruning for Efficient Deep Neural Networks Paper • 1812.02402 • Published Dec 6, 2018 • 1