## GLUE results

We also evaluate the language understanding performance of Uni-Perceiver on the GLUE benchmark. The results are listed below.
Dataset | MNLI | QNLI | QQP | RTE | SST-2 | MRPC | CoLA
---|---|---|---|---|---|---|---
Metric | Acc | Acc | F1 | Acc | Acc | F1 | Acc
Uni-Perceiver-BASE | 79.7 | 87.3 | 86.7 | 71.1 | 89.3 | 86.0 | 43.1
Uni-Perceiver-MoE-BASE | 81.5 | 88.2 | 87.8 | 75.8 | 90.9 | 87.1 | 52.2
Uni-Perceiver-LARGE | 82.5 | 89.2 | 87.7 | 73.7 | 91.2 | 90.2 | 52.0
Uni-Perceiver-MoE-LARGE | 85.7 | 91.9 | 89.5 | 78.4 | 93.4 | 91.2 | 57.4
---

* All fine-tuning experiments are performed on 1 GPU.
* We use the hyper-parameters for GLUE tasks from [fairseq](https://github.com/facebookresearch/fairseq/blob/main/examples/bart/README.glue.md); an example command is sketched after the table.

Model | MNLI | QNLI | QQP | RTE | SST-2 | MRPC | CoLA | STS-B
---|---|---|---|---|---|---|---|---
`--num-classes` | 3 | 2 | 2 | 2 | 2 | 2 | 2 | 1
`--lr` | 5e-6 | 1e-5 | 1e-5 | 1e-5 | 5e-6 | 2e-5 | 2e-5 | 2e-5
`bsz` | 128 | 32 | 32 | 32 | 128 | 64 | 64 | 32
`--total-num-update` | 30968 | 33112 | 113272 | 1018 | 5233 | 1148 | 1334 | 1799
`--warmup-updates` | 1858 | 1986 | 6796 | 61 | 314 | 68 | 80 | 107

* Following RoBERTa, we fine-tune RTE, STS-B, and MRPC starting from the MNLI single-task model, rather than from the baseline pretrained model.
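The snippet below is a minimal sketch of how these hyper-parameters map onto a `fairseq-train` invocation, following the fairseq BART GLUE recipe linked above, using the RTE column as an example. The data directory (`RTE-bin/`), checkpoint path, and `--arch bart_large` are illustrative placeholders from that recipe, not the exact Uni-Perceiver configuration.

```bash
# Hypothetical RTE fine-tuning command, filled in from the RTE column above.
# MODEL_PATH, RTE-bin/, and --arch are placeholders; adapt them to your setup.
TOTAL_NUM_UPDATES=1018   # --total-num-update
WARMUP_UPDATES=61        # --warmup-updates
LR=1e-05                 # --lr
NUM_CLASSES=2            # --num-classes
BSZ=32                   # bsz
MODEL_PATH=/path/to/pretrained/model.pt

CUDA_VISIBLE_DEVICES=0 fairseq-train RTE-bin/ \
    --restore-file $MODEL_PATH \
    --batch-size $BSZ \
    --task sentence_prediction \
    --reset-optimizer --reset-dataloader --reset-meters \
    --init-token 0 \
    --arch bart_large \
    --criterion sentence_prediction \
    --num-classes $NUM_CLASSES \
    --dropout 0.1 --attention-dropout 0.1 \
    --optimizer adam --adam-betas "(0.9, 0.98)" --adam-eps 1e-08 \
    --weight-decay 0.01 --clip-norm 0.0 \
    --lr-scheduler polynomial_decay --lr $LR \
    --total-num-update $TOTAL_NUM_UPDATES --warmup-updates $WARMUP_UPDATES \
    --best-checkpoint-metric accuracy --maximize-best-checkpoint-metric
```

For STS-B, a regression task (`--num-classes 1`), the linked recipe additionally passes `--regression-target` and selects checkpoints by validation loss rather than accuracy.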