Article: KV Caching Explained: Optimizing Transformer Inference Efficiency. By not-lain, published 1 day ago.

Papers:
- TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models (arXiv:2501.16937, published 3 days ago)
- OpenCharacter: Training Customizable Role-Playing LLMs with Large-Scale Synthetic Personas (arXiv:2501.15427, published 5 days ago)
- DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation (arXiv:2501.16764, published 3 days ago)
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training (arXiv:2501.17161, published 2 days ago)
- Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation (arXiv:2501.15907, published 4 days ago)