Hugging Face
wujing
wjmcat
0 followers · 4 following
http://wj-Mcat.github.io
wj_Mcat
wj-Mcat
AI & ML interests
Student interested in exploring NLP models
Recent Activity
New activity 6 days ago in deepseek-ai/DeepSeek-V3.1-Base: the pass@1 of deepseek-v3.1-base in lcb benchmark
Reacted to yushun0410's post about 1 year ago:
Hi Huggingfacers! Thrilled to introduce Adam-mini, an optimizer that achieves on-par or better performance than AdamW with a 45% to 50% smaller memory footprint. Adam-mini can also achieve 49.5% higher throughput than AdamW on Llama2-7B pre-training. The design of Adam-mini is inspired by certain Hessian structures we observed on Transformers. Feel free to try it out! Try switching to Adam-mini with the same hyperparameters as AdamW; it should work with only half the memory. Hope Adam-mini can help save time, cost, and energy in your tasks! Paper: "Adam-mini: Use Fewer Learning Rates To Gain More" https://arxiv.org/abs/2406.16793 Code: https://github.com/zyushun/Adam-mini
Reacted to yushun0410's post with 🔥 about 1 year ago
Organizations
Spaces (1)
Text Similarity (status: Runtime error)
Models (3)
wjmcat/opt-350m-paddle, updated Sep 1, 2022
wjmcat/opt-1.3b-paddle, updated May 15, 2022
wjmcat/opt-125m-paddle, updated May 15, 2022
Datasets (0)
None public yet