Value Residual Learning For Alleviating Attention Concentration In Transformers • Paper • arXiv:2410.17897 • Published Oct 23, 2024
Flex Attention: A Programming Model for Generating Optimized Attention Kernels • Paper • arXiv:2412.05496 • Published Dec 7, 2024
Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents • Paper • arXiv:2504.00906 • Published 1 day ago
ElaLoRA: Elastic & Learnable Low-Rank Adaptation for Efficient Model Fine-Tuning • Paper • arXiv:2504.00254 • Published 2 days ago
Representation & Optimization • Collection • Understanding representation sheds light on optimization • 7 items • Updated 12 minutes ago
Approximate Nullspace Augmented Finetuning for Robust Vision Transformers • Paper • arXiv:2403.10476 • Published Mar 15, 2024
Layer by Layer: Uncovering Hidden Representations in Language Models • Paper • arXiv:2502.02013 • Published Feb 4
CaKE: Circuit-aware Editing Enables Generalizable Knowledge Learners • Paper • arXiv:2503.16356 • Published 14 days ago
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders • Paper • arXiv:2503.18878 • Published 10 days ago
Token-Efficient Long Video Understanding for Multimodal LLMs • Paper • arXiv:2503.04130 • Published 28 days ago
Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation • Paper • arXiv:2503.16430 • Published 14 days ago
Image / Video Gen • Collection • Image Generation Using Diffusion-Based Methods: Tips and Techniques for Stable Diffusion • 36 items • Updated Mar 1
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution • Paper • arXiv:2502.18449 • Published Feb 25
Slamming: Training a Speech Language Model on One GPU in a Day • Paper • arXiv:2502.15814 • Published Feb 19
You Do Not Fully Utilize Transformer's Representation Capacity • Paper • arXiv:2502.09245 • Published Feb 13
Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity • Paper • arXiv:2502.13063 • Published Feb 18