fix(optimizer): resolve bug where weight decay was multiplied by wrong lr value (#5) 671b033 verified dongseokmotif committed on 6 days ago
refactor(muon): change argument adam_wd to weight_decay and handle params' type 02ac540 iamwyldecat committed on Jun 23
fix(muon): delete intermediate tensors immediately to lower peak mem usage bdd2678 iamwyldecat committed on Jun 15