Optimizing Large Language Model Training Using FP4 Quantization • arXiv:2501.17116
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training • arXiv:2501.17161
ARWKV: Pretrain is not what we need, an RNN-Attention-Based Language Model Born from Transformer • arXiv:2501.15570