Update README.md
README.md CHANGED
```diff
@@ -4,7 +4,7 @@ license: apache-2.0
 # MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts
 
 **Paper or resources for more information:**
-[[Paper]()] [[Code](https://github.com/SkyworkAI/
+[[Paper]()] [[Code](https://github.com/SkyworkAI/MoH)]
 
 ## ⚡ Overview
 We introduce three types of zero-computation experts: the zero expert, copy expert, and constant expert, which correspond to discard, skip, and replace operations, respectively. Moreover, we leverage gating residuals, enabling each token to consider the pathway taken in the previous layer when selecting the appropriate experts.
@@ -76,7 +76,7 @@ Coming soon...
 
 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch \
     --main_process_port 2004 -m lm_eval --model hf \
-    --model_args pretrained= \
+    --model_args pretrained=MoE-Plus-Plus-7B \
     --tasks winogrande \
     --batch_size 1 \
     --output_path Results/winogrande
```
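The overview paragraph in this diff describes routing tokens to zero, copy, and constant experts alongside ordinary FFN experts, with a gating residual carried between layers. Below is a minimal, hypothetical PyTorch sketch of that idea, based only on the README's description — the class name, expert ordering, top-k renormalization, and the exact form of the gating residual are all assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class ZeroComputationMoE(nn.Module):
    """Hypothetical sketch of an MoE++-style layer (not the official code).

    Alongside ordinary FFN experts, three expert types need no matmul:
      - zero expert:     discards the token (outputs zeros)
      - copy expert:     skips computation (outputs the input unchanged)
      - constant expert: replaces the token with a learned vector
    A gating residual (assumed here to be a simple addition of the previous
    layer's router logits) lets each token's earlier pathway inform routing.
    """

    def __init__(self, dim, n_ffn_experts=4, top_k=2):
        super().__init__()
        self.ffn_experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_ffn_experts)
        )
        self.constant = nn.Parameter(torch.zeros(dim))  # constant expert's output
        # expert order (an assumption): [ffn_0..ffn_{n-1}, zero, copy, constant]
        self.n_experts = n_ffn_experts + 3
        self.router = nn.Linear(dim, self.n_experts, bias=False)
        self.top_k = top_k

    def forward(self, x, prev_logits=None):
        # x: (tokens, dim); prev_logits: (tokens, n_experts) from the prior layer
        logits = self.router(x)
        if prev_logits is not None:
            logits = logits + prev_logits  # gating residual (assumed additive)
        weights = logits.softmax(dim=-1)
        topv, topi = weights.topk(self.top_k, dim=-1)
        topv = topv / topv.sum(dim=-1, keepdim=True)  # renormalize selected weights

        out = torch.zeros_like(x)
        n_ffn = len(self.ffn_experts)
        for e in range(self.n_experts):
            mask = topi == e
            if not mask.any():
                continue
            rows, slots = mask.nonzero(as_tuple=True)  # tokens routed to expert e
            w = topv[rows, slots].unsqueeze(-1)
            if e < n_ffn:                 # ordinary FFN expert
                out[rows] += w * self.ffn_experts[e](x[rows])
            elif e == n_ffn:              # zero expert: discard (contributes zeros)
                pass
            elif e == n_ffn + 1:          # copy expert: identity shortcut
                out[rows] += w * x[rows]
            else:                         # constant expert: learned replacement
                out[rows] += w * self.constant
        return out, logits
```

The zero-computation experts add only a few router columns and one parameter vector, which is the source of the claimed acceleration: tokens routed to them bypass the FFN matmuls entirely.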