Chat-UniVi committed
Commit b303770 · verified · 1 Parent(s): 51246bb

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -4,7 +4,7 @@ license: apache-2.0
 # MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts
 
 **Paper or resources for more information:**
-[[Paper]()] [[Code](https://github.com/SkyworkAI/MoE-plus-plus)]
+[[Paper]()] [[Code](https://github.com/SkyworkAI/MoH)]
 
 ## ⚡ Overview
 We introduce three types of zero-computation experts: the zero expert, copy expert, and constant expert, which correspond to discard, skip, and replace operations, respectively. Moreover, we leverage gating residuals, enabling each token to consider the pathway taken in the previous layer when selecting the appropriate experts.
@@ -76,7 +76,7 @@ Coming soon...
 
 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch \
 --main_process_port 2004 -m lm_eval --model hf \
---model_args pretrained=winogrande \
+--model_args pretrained=MoE-Plus-Plus-7B \
 --tasks winogrande \
 --batch_size 1 \
 --output_path Results/winogrande
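
For context on the overview paragraph in the diff, here is a minimal sketch of what zero-computation experts and gating residuals could look like. This is hypothetical PyTorch code written for illustration, not the SkyworkAI implementation; the class names, the top-1 routing, and the FFN expert shape are all assumptions.

```python
# Hypothetical sketch of MoE++ zero-computation experts; not the official code.
import torch
import torch.nn as nn


class ZeroExpert(nn.Module):
    """Discard: this expert contributes nothing for the token."""
    def forward(self, x):
        return torch.zeros_like(x)


class CopyExpert(nn.Module):
    """Skip: pass the token through unchanged (identity)."""
    def forward(self, x):
        return x


class ConstantExpert(nn.Module):
    """Replace: substitute a learned constant vector for the token."""
    def __init__(self, dim):
        super().__init__()
        self.const = nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        return self.const.expand_as(x)


class MoEPlusPlusLayer(nn.Module):
    """Top-1 routing over FFN experts plus the three zero-computation experts.

    The gating residual adds the previous layer's router logits to the current
    ones, so each token's routing can take its earlier pathway into account.
    """
    def __init__(self, dim, hidden, n_ffn_experts):
        super().__init__()
        def ffn():
            return nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
        self.experts = nn.ModuleList(
            [ffn() for _ in range(n_ffn_experts)]
            + [ZeroExpert(), CopyExpert(), ConstantExpert(dim)]
        )
        self.router = nn.Linear(dim, len(self.experts))

    def forward(self, x, prev_logits=None):
        logits = self.router(x)                  # (tokens, n_experts)
        if prev_logits is not None:              # gating residual
            logits = logits + prev_logits
        probs = logits.softmax(dim=-1)
        idx = probs.argmax(dim=-1)               # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e
            if mask.any():
                out[mask] = probs[mask, e:e + 1] * expert(x[mask])
        return out, logits                       # pass logits to the next layer
```

Under these assumptions, tokens routed to the zero, copy, or constant experts bypass the FFN entirely, which is where the claimed acceleration comes from.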