refactor(muon): change argument adam_wd to weight_decay and handle params' type 02ac540 iamwyldecat committed on Jun 23
fix(muon): delete intermediate tensors immediately to lower peak mem usage bdd2678 iamwyldecat committed on Jun 15