kenshinn committed on
Commit e8c6ebd · verified · 1 Parent(s): 32a51df

Update README.md

Files changed (1)
  1. README.md +58 -3
README.md CHANGED
@@ -1,3 +1,58 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ ---
+
+ <h2 align="center"> <a href="https://arxiv.org/abs/2405.14297">Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models</a></h2>
+ <h5 align="center"> If our project helps you, please give us a star ⭐ on <a href="https://github.com/LINs-lab/DynMoE">GitHub</a> and cite our paper!</h5>
+
+ ## 📰 News
+
+ - **[2024.05.25]** 🔥 Our **checkpoints** are available now!
+ - **[2024.05.23]** 🔥 Our [paper](https://arxiv.org/abs/2405.14297) is released!
+
+ ## 😎 What's Interesting?
+
+ **Dynamic Mixture of Experts (DynMoE)** incorporates (1) a novel gating method that enables each token to automatically determine the number of experts to activate, and (2) an adaptive process that automatically adjusts the number of experts during training.
+
+ ### Top-Any Gating
+
+ <video controls src="https://i.imgur.com/bLgNaoH.mp4" title="Top-Any Gating"></video>
+
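+ Below is a minimal PyTorch-style sketch of the idea behind top-any gating (not the official implementation): each token is scored against every expert and activates all experts whose gate value clears a trainable per-expert threshold, so the effective top-k varies from token to token. The expert-embedding scoring and sigmoid thresholding here are simplifications; see the [GitHub repo](https://github.com/LINs-lab/DynMoE) for the exact formulation.
+
+ ```python
+ import torch
+ import torch.nn as nn
+
+ class TopAnyGate(nn.Module):
+     """Simplified sketch of DynMoE-style top-any gating (illustrative only)."""
+
+     def __init__(self, hidden_dim: int, num_experts: int):
+         super().__init__()
+         # Expert embeddings used to score token-expert affinity (assumption).
+         self.expert_emb = nn.Parameter(torch.randn(num_experts, hidden_dim))
+         # Trainable per-expert activation thresholds (assumption).
+         self.threshold = nn.Parameter(torch.zeros(num_experts))
+
+     def forward(self, x: torch.Tensor):
+         # x: (num_tokens, hidden_dim)
+         scores = torch.sigmoid(x @ self.expert_emb.t())   # (tokens, experts)
+         mask = scores > torch.sigmoid(self.threshold)     # per-token routing decisions
+         weights = scores * mask                           # unselected experts get weight 0
+         return weights, mask
+
+ gate = TopAnyGate(hidden_dim=16, num_experts=4)
+ weights, mask = gate(torch.randn(8, 16))
+ print("experts activated per token:", mask.sum(dim=-1).tolist())
+ ```
+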
+ ### Adaptive Training Process
+
+ ![](https://cdn.jsdelivr.net/gh/QAQdev/Pics@master/uPic/adaptive.png)
+
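+ Roughly speaking, the adaptive process monitors routing statistics during training: when tokens end up activating no expert, the layer gains a new expert, and experts that stay unused are removed. The snippet below is illustrative pseudocode of that loop only; `add_expert`, `remove_expert`, and the counters are hypothetical names, not the repo's API.
+
+ ```python
+ # Illustrative pseudocode; the real triggers, schedule, and expert
+ # initialization are defined in the paper and the DynMoE codebase.
+ def adaptive_expert_update(moe_layer, routing_stats):
+     # Tokens that activated zero experts signal missing capacity.
+     if routing_stats["tokens_with_zero_experts"] > 0:
+         moe_layer.add_expert()                  # hypothetical helper
+     # Experts no token activated in the last window are redundant.
+     for expert_id in routing_stats["unused_experts"]:
+         moe_layer.remove_expert(expert_id)      # hypothetical helper
+ ```
+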
+ ## 💡 Model Details
+
+ - 🤔 DynMoE-Phi-2 is a MoE model with **dynamic top-k gating**, fine-tuned from [LanguageBind/MoE-LLaVA-Phi2-Stage2](https://huggingface.co/LanguageBind/MoE-LLaVA-Phi2-Stage2).
+ - 🚀 Our DynMoE-Phi-2-2.7B has 5.3B parameters in total, but **only 3.4B are activated!** (average top-k = 1.68; see the back-of-the-envelope sketch after this list)
+ - ⌛ Including the DynMoE tuning stage, we complete training on 8 A100 GPUs **within 2 days.**
+
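+ As a rough sanity check on the activated-parameter figure, the relation is: total ≈ shared params + (experts per layer × MoE layers × params per expert), while the average activated count replaces the expert count with the average top-k. Every number in the sketch below (shared size, layer count, expert size, expert count) is an illustrative assumption, not taken from the released config.
+
+ ```python
+ # Back-of-the-envelope only; all structural numbers are illustrative assumptions.
+ def moe_param_counts(shared, num_moe_layers, expert_params, num_experts, avg_k):
+     total = shared + num_moe_layers * num_experts * expert_params
+     activated = shared + num_moe_layers * avg_k * expert_params
+     return total, activated
+
+ # Assumed Phi-2-like breakdown: ~1.9B shared params, 16 MoE layers,
+ # ~52M params per expert, 4 experts per layer, average top-k of 1.68.
+ total, activated = moe_param_counts(1.9e9, 16, 52e6, 4, 1.68)
+ print(f"total ~ {total / 1e9:.1f}B, activated ~ {activated / 1e9:.1f}B")
+ # Prints roughly 5.2B total and 3.3B activated, in the ballpark of the figures above.
+ ```
+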
+ ## 👍 Acknowledgement
+
+ We are grateful for the following awesome projects:
+
+ - [tutel](https://github.com/microsoft/tutel)
+ - [DeepSpeed](https://github.com/microsoft/DeepSpeed)
+ - [GMoE](https://github.com/Luodian/Generalizable-Mixture-of-Experts)
+ - [EMoE](https://github.com/qiuzh20/EMoE)
+ - [MoE-LLaVA](https://github.com/PKU-YuanGroup/MoE-LLaVA)
+ - [GLUE-X](https://github.com/YangLinyi/GLUE-X)
+
+ ## 🔒 License
+
+ This project is released under the MIT license as found in the [LICENSE](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md) file.
+
+ ## ✏️ Citation
+
+ ```tex
+ @misc{guo2024dynamic,
+       title={Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models},
+       author={Yongxin Guo and Zhenglin Cheng and Xiaoying Tang and Tao Lin},
+       year={2024},
+       eprint={2405.14297},
+       archivePrefix={arXiv},
+       primaryClass={cs.LG}
+ }
+ ```