L^2M: Mutual Information Scaling Law for Long-Context Language Modeling Paper • 2503.04725 • Published 6 days ago • 19
On the Acquisition of Shared Grammatical Representations in Bilingual Language Models Paper • 2503.03962 • Published 6 days ago • 3