Update README.md
README.md CHANGED
```diff
@@ -4,7 +4,7 @@ license: apache-2.0
 # MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts
 
 **Paper or resources for more information:**
-[[Paper]()] [[Code](https://github.com/SkyworkAI/
+[[Paper]()] [[Code](https://github.com/SkyworkAI/MoH)]
 
 ## ⚡ Overview
 We introduce three types of zero-computation experts: the zero expert, copy expert, and constant expert, which correspond to discard, skip, and replace operations, respectively. Moreover, we leverage gating residuals, enabling each token to consider the pathway taken in the previous layer when selecting the appropriate experts.
@@ -76,7 +76,7 @@ Coming soon...
 
 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch \
     --main_process_port 2004 -m lm_eval --model hf \
-    --model_args pretrained= \
+    --model_args pretrained=MoE-Plus-Plus-7B \
     --tasks winogrande \
     --batch_size 1 \
     --output_path Results/winogrande
```
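The overview paragraph in this diff describes routing tokens to zero, copy, and constant experts alongside ordinary FFN experts, with a gating residual carried between layers. Below is a minimal, hypothetical PyTorch sketch of that idea, based only on the README's description — the class name, expert ordering, top-k renormalization, and the exact form of the gating residual are all assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class ZeroComputationMoE(nn.Module):
    """Hypothetical sketch of an MoE++-style layer (not the official code).

    Alongside ordinary FFN experts, three expert types need no matmul:
      - zero expert:     discards the token (outputs zeros)
      - copy expert:     skips computation (outputs the input unchanged)
      - constant expert: replaces the token with a learned vector
    A gating residual (assumed here to be a simple addition of the previous
    layer's router logits) lets each token's earlier pathway inform routing.
    """

    def __init__(self, dim, n_ffn_experts=4, top_k=2):
        super().__init__()
        self.ffn_experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_ffn_experts)
        )
        self.constant = nn.Parameter(torch.zeros(dim))  # constant expert's output
        # expert order (an assumption): [ffn_0..ffn_{n-1}, zero, copy, constant]
        self.n_experts = n_ffn_experts + 3
        self.router = nn.Linear(dim, self.n_experts, bias=False)
        self.top_k = top_k

    def forward(self, x, prev_logits=None):
        # x: (tokens, dim); prev_logits: (tokens, n_experts) from the prior layer
        logits = self.router(x)
        if prev_logits is not None:
            logits = logits + prev_logits  # gating residual (assumed additive)
        weights = logits.softmax(dim=-1)
        topv, topi = weights.topk(self.top_k, dim=-1)
        topv = topv / topv.sum(dim=-1, keepdim=True)  # renormalize selected weights

        out = torch.zeros_like(x)
        n_ffn = len(self.ffn_experts)
        for e in range(self.n_experts):
            mask = topi == e
            if not mask.any():
                continue
            rows, slots = mask.nonzero(as_tuple=True)  # tokens routed to expert e
            w = topv[rows, slots].unsqueeze(-1)
            if e < n_ffn:                 # ordinary FFN expert
                out[rows] += w * self.ffn_experts[e](x[rows])
            elif e == n_ffn:              # zero expert: discard (contributes zeros)
                pass
            elif e == n_ffn + 1:          # copy expert: identity shortcut
                out[rows] += w * x[rows]
            else:                         # constant expert: learned replacement
                out[rows] += w * self.constant
        return out, logits
```

The zero-computation experts add only a few router columns and one parameter vector, which is the source of the claimed acceleration: tokens routed to them bypass the FFN matmuls entirely.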