# MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts
**Paper or resources for more information:**
[[Paper](https://huggingface.co/papers/2410.07348)] [[Code](https://github.com/SkyworkAI/MoE-plus-plus)]
## ⚡ Overview
We introduce three types of zero-computation experts: the zero expert, copy expert, and constant expert, which correspond to discard, skip, and replace operations, respectively. Moreover, we leverage gating residuals, enabling each token to consider the pathway taken in the previous layer when selecting the appropriate experts.
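To make the three zero-computation experts and the gating residual concrete, here is a minimal PyTorch sketch. The class names and interfaces below are illustrative assumptions for exposition, not the repository's actual implementation; see the [Code] link above for the official release.

```python
import torch
import torch.nn as nn

class ZeroExpert(nn.Module):
    """Discard: the token contributes nothing (all-zero output)."""
    def forward(self, x):
        return torch.zeros_like(x)

class CopyExpert(nn.Module):
    """Skip: the token passes through unchanged."""
    def forward(self, x):
        return x

class ConstantExpert(nn.Module):
    """Replace: the token is substituted with a learned constant vector."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.const = nn.Parameter(torch.zeros(hidden_dim))

    def forward(self, x):
        # Broadcast the constant vector to the shape of the input tokens.
        return self.const.expand_as(x)

class GatingWithResidual(nn.Module):
    """Router whose logits add a residual from the previous layer's routing
    scores, so each token's expert choice can depend on the pathway it took
    in the previous layer."""
    def __init__(self, hidden_dim, num_experts):
        super().__init__()
        self.gate = nn.Linear(hidden_dim, num_experts, bias=False)

    def forward(self, x, prev_logits=None):
        logits = self.gate(x)              # (batch, seq, num_experts)
        if prev_logits is not None:
            logits = logits + prev_logits  # gating residual from previous layer
        return logits
```

Because the zero, copy, and constant experts involve little or no computation, routing a share of tokens to them reduces the average per-token cost relative to a standard MoE layer in which every selected expert is a full FFN.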