TinySwallow Collection: Compact Japanese models trained with "TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models" • 5 items • Updated 1 day ago • 9
TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models Paper • 2501.16937 • Published 3 days ago • 4
Post: It's not just a flood of model releases; papers are dropping just as fast. Here are the 10 most upvoted papers from the Chinese community: zh-ai-community/2025-january-papers-679933cbf0f3ced11f5a168a
ARWKV: Pretrain is not what we need, an RNN-Attention-Based Language Model Born from Transformer Paper • 2501.15570 • Published 5 days ago • 17