[**中文**](./README.md) | [**English**](./README_en.md)
# UniMC
Source code for the EMNLP 2022 paper "[Zero-Shot Learners for Natural Language Understanding via a Unified Multiple Choice Perspective](https://arxiv.org/abs/2210.08590)".

## Update
- [2022-10-18] Release preprint in arXiv.
- [2022-10-14] Release code in GitHub.
## Requirements
Install the fengshen framework:
```shell
git clone https://github.com/IDEA-CCNL/Fengshenbang-LM.git
cd Fengshenbang-LM
pip install --editable .
```
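To confirm the editable install worked, a quick sanity check is to import the pipeline class used in the Quick Start below (a minimal sketch, nothing project-specific):
```python
# Sanity check: this import should succeed once fengshen is installed.
from fengshen.pipelines.multiplechoice import UniMCPipelines

print(UniMCPipelines.__name__)
```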
## Quick Start
You can refer to our [example.py](./example.py) script; simply feed the processed train, dev, and test data into the model.
```python
import argparse
from fengshen.pipelines.multiplechoice import UniMCPipelines

# Parse the pipeline's command-line arguments
total_parser = argparse.ArgumentParser("TASK NAME")
total_parser = UniMCPipelines.piplines_args(total_parser)
args = total_parser.parse_args()

# Pretrained UniMC checkpoint on the HuggingFace Hub
pretrained_model_path = 'IDEA-CCNL/Erlangshen-UniMC-RoBERTa-110M-Chinese'
args.learning_rate = 2e-5
args.max_length = 512
args.max_epochs = 3
args.batchsize = 8
args.default_root_dir = './'
model = UniMCPipelines(args, model_path=pretrained_model_path)

train_data = []
dev_data = []
test_data = [{
    "texta": "就是废物,充电不进害得老子把主板烧了,客服不耐烦",
    "textb": "",
    "question": "",
    "choice": ["这是一条差评", "这是一条好评"],
    "answer": "这是一条差评",
    "label": 0,
    "id": 31
}]

if args.train:
    model.train(train_data, dev_data)
result = model.predict(test_data)
```
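The exact return format of `predict` is defined by the pipeline implementation; the snippet below is only a sketch of how predictions might be inspected, assuming each returned sample is a dict carrying its `choice` list and a predicted `label` index (check the `UniMCPipelines` source for the actual field names):
```python
# Assumption: predict returns one dict per input sample with a predicted
# "label" index into "choice"; verify against the pipeline implementation.
for sample in result:
    predicted_choice = sample["choice"][sample["label"]]
    print(sample["texta"], "->", predicted_choice)
```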
## Pretrained Model
For the English model, we pretrained on 14 multiple-choice datasets. For the Chinese models, we collected 48 datasets for pretraining. The pretrained models have been open-sourced to the HuggingFace community.
| Model | Link |
|:---------:|:--------------:|
| Erlangshen-UniMC-Albert-235M-English | [https://huggingface.co/IDEA-CCNL/Erlangshen-UniMC-Albert-235M-English](https://huggingface.co/IDEA-CCNL/Erlangshen-UniMC-Albert-235M-English) |
| Erlangshen-UniMC-RoBERTa-110M-Chinese | [https://huggingface.co/IDEA-CCNL/Erlangshen-UniMC-RoBERTa-110M-Chinese](https://huggingface.co/IDEA-CCNL/Erlangshen-UniMC-RoBERTa-110M-Chinese) |
| Erlangshen-UniMC-RoBERTa-330M-Chinese | [https://huggingface.co/IDEA-CCNL/Erlangshen-UniMC-RoBERTa-330M-Chinese](https://huggingface.co/IDEA-CCNL/Erlangshen-UniMC-RoBERTa-330M-Chinese) |
| Erlangshen-UniMC-MegatronBERT-1.3B-Chinese | [https://huggingface.co/IDEA-CCNL/Erlangshen-UniMC-MegatronBERT-1.3B-Chinese](https://huggingface.co/IDEA-CCNL/Erlangshen-UniMC-MegatronBERT-1.3B-Chinese) |
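Any of the Chinese checkpoints above can be swapped into the Quick Start pipeline by changing the model path; a minimal sketch using the largest checkpoint:
```python
# Reuse the Quick Start setup, pointing at the 1.3B Chinese checkpoint instead.
pretrained_model_path = 'IDEA-CCNL/Erlangshen-UniMC-MegatronBERT-1.3B-Chinese'
model = UniMCPipelines(args, model_path=pretrained_model_path)
```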
## Experiments
### English
To evaluate UniMC on English tasks, we pretrain the model on 14 multiple-choice datasets (see the original paper for details) so that it acquires the ability to answer multiple-choice questions.
**Zero-shot**
| Model | T0 11B | GLaM 60B | FLAN 137B | PaLM 540B | UniMC 235M |
|---------|--------|----------|-----------|-----------|------------|
| ANLI R1 | 43.6 | 40.9 | 47.7 | 48.4 | **52.0** |
| ANLI R2 | 38.7 | 38.2 | 43.9 | 44.2 | **44.4** |
| ANLI R3 | 41.3 | 40.9 | 47.0 | 45.7 | **47.8** |
| CB | 70.1 | 33.9 | 64.1 | 51.8 | **75.7** |
### Chinese
To evaluate UniMC in Chinese scenarios, we pretrain the model on 13 supervised datasets, listed below:
| Task type | Task | # of options | Data size |
|---------|--------|----------|-----------|
| Multiple-choice | c3 | 4 | 11.8k |
| Multiple-choice | ClozeT | 2 | 0.7k |
| Multiple-choice | CMRC2019 | n | 11.4k |
| Multiple-choice | GCRC | 4 | 7.8k |
| Classification | DuEE-Fin | 12 | 4.3k |
| Classification | DuEE1.0 | 65 | 10.3k |
| Classification | Fudan | 20 | 19.6k |
| Classification | THUNEWS | 10 | 180k |
| NLI | CMNLI | 3 | 39k |
| NLI | SNLI | 3 | 545.8k |
| Paraphrase | AFQMC | 2 | 34.3k |
| Paraphrase | PAWS-X | 2 | 49k |
| Paraphrase | STS-B | 2 | 80k |
We test UniMC on benchmarks commonly used in the Chinese community, namely the 9 tasks of FewCLUE, and evaluate the model on the test_public split.
**Few-shot**
| Model | eprstmt | csldcp | tnews | iflytek | ocnli | bustm | chid | csl | wsc | Avg |
|------------|------------|----------|-----------|----------|-----------|-----------|-----------|----------|-----------|-----------|
| Finetuning | 65.4 | 35.5 | 49 | 32.8 | 33 | 60.7 | 14.9 | 50 | 55.6 | 44.1 |
| PET | 86.7 | 51.7 | 54.5 | 46 | 44 | 56 | 61.2 | 59.4 | 57.5 | 57.44 |
| LM-BFF | 85.6 | 54.4 | 53 | 47.1 | 41.6 | 57.6 | 61.2 | 51.7 | 54.7 | 56.32 |
| P-tuning | 88.3 | 56 | 54.2 | **57.6** | 41.9 | 60.9 | 59.3 | **62.9** | 58.1 | 59.91 |
| EFL | 84.9 | 45 | 52.1 | 42.7 | 66.2 | 71.8 | 30.9 | 56.6 | 53 | 55.91 |
| [UniMC-RoBERTa-110M](https://huggingface.co/IDEA-CCNL/Erlangshen-UniMC-RoBERTa-110M-Chinese) | 88.64 | 54.08 | 54.32 | 48.6 | 66.55 | 73.76 | 67.71 | 52.54 | 59.92 | 62.86 |
| [UniMC-RoBERTa-330M](https://huggingface.co/IDEA-CCNL/Erlangshen-UniMC-RoBERTa-330M-Chinese) | 89.53 | 57.3 | 54.25 | 50 | 70.59 | 77.49 | 78.09 | 55.73 | 65.16 | 66.46 |
| [UniMC-MegatronBERT-1.3B](https://huggingface.co/IDEA-CCNL/Erlangshen-UniMC-MegatronBERT-1.3B-Chinese) | **89.278** | **60.9** | **57.46** | 52.89 | **76.33** | **80.37** | **90.33** | 61.73 | **79.15** | **72.05** |
**Zero-shot**
| Model | eprstmt | csldcp | tnews | iflytek | ocnli | bustm | chid | csl | wsc | Avg |
|---------------|-----------|-----------|-----------|-----------|-----------|----------|----------|----------|-----------|-----------|
| GPT-zero | 57.5 | 26.2 | 37 | 19 | 34.4 | 50 | 65.6 | 50.1 | 50.3 | 43.4 |
| PET-zero | 85.2 | 12.6 | 26.1 | 26.6 | 40.3 | 50.6 | 57.6 | 52.2 | 54.7 | 45.1 |
| NSP-BERT | 86.9 | 47.6 | 51 | 41.6 | 37.4 | 63.4 | 52 | **64.4** | 59.4 | 55.96 |
| ZeroPrompt | - | - | - | 16.14 | 46.16 | - | - | - | 47.98 | - |
| Yuan1.0-13B | 88.13 | 38.99 | 57.47 | 38.82 | 48.13 | 59.38 | 86.14 | 50 | 38.99 | 56.22 |
| ERNIE3.0-240B | 88.75 | **50.97** | **57.83** | **40.42** | 53.57 | 64.38 | 87.13 | 56.25 | 53.46 | 61.41 |
| [UniMC-RoBERTa-110M](https://huggingface.co/IDEA-CCNL/Erlangshen-UniMC-RoBERTa-110M-Chinese) | 86.16 | 31.26 | 46.61 | 26.54 | 66.91 | 73.34 | 66.68 | 50.09 | 53.66 | 55.7 |
| [UniMC-RoBERTa-330M](https://huggingface.co/IDEA-CCNL/Erlangshen-UniMC-RoBERTa-330M-Chinese) | 87.5 | 30.4 | 47.6 | 31.5 | 69.9 | 75.9 | 78.17 | 49.5 | 60.55 | 59.01 |
| [UniMC-MegatronBERT-1.3B](https://huggingface.co/IDEA-CCNL/Erlangshen-UniMC-MegatronBERT-1.3B-Chinese) | **88.79** | 42.06 | 55.21 | 33.93 | **75.57** | **79.5** | **89.4** | 50.25 | **66.67** | **64.53** |
## Dataset
We have defined the data format required by UniMC; you only need to convert your data into the following format:
### Text Classification
```json
{
    "texta": "街头偶遇2018款长安CS35,颜值美炸!或售6万起,还买宝骏510?",
    "textb": "",
    "question": "下面新闻属于哪一个类别?",
    "choice": [
        "房产",
        "汽车",
        "教育",
        "军事"
    ],
    "answer": "汽车",
    "label": 1,
    "id": 7759
}
```
### Sentiment Analysis
```json
{
    "texta": "就是废物,充电不进害得老子把主板烧了,客服不耐烦",
    "textb": "",
    "question": "",
    "choice": ["这是一条差评", "这是一条好评"],
    "answer": "这是一条差评",
    "label": 0,
    "id": 31
}
```
### Semantic Matching
```json
{
    "texta": "不要借了我是试试看能否操作的",
    "textb": "",
    "question": "",
    "choice": ["不能理解为:借款审核期间能否取消借款", "可以理解为:借款审核期间能否取消借款"],
    "answer": "不能理解为:借款审核期间能否取消借款",
    "label": 0,
    "id": 0
}
```
### Natural Language Inference
```json
{
    "texta": "身上裹一件工厂发的棉大衣,手插在袖筒里",
    "textb": "",
    "question": "",
    "choice": ["不能推断出:身上至少一件衣服", "很难推断出:身上至少一件衣服", "可以推断出:身上至少一件衣服"],
    "answer": "可以推断出:身上至少一件衣服",
    "label": 2,
    "id": 0
}
```
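The pipeline from the Quick Start consumes plain Python lists of such dicts. Below is a minimal loading sketch, assuming your samples are stored one JSON object per line in hypothetical files such as `train.json`:
```python
import json

def load_data(path):
    """Read one JSON object per line into the list-of-dicts format shown above."""
    samples = []
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if line:
                samples.append(json.loads(line))
    return samples

# Hypothetical file names; adapt them to wherever your data lives.
train_data = load_data('train.json')
dev_data = load_data('dev.json')
test_data = load_data('test.json')
```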
## Citation
If you find this repository helpful, you can cite our work as follows:
```text
@article{unimc,
  author  = {Ping Yang and
             Junjie Wang and
             Ruyi Gan and
             Xinyu Zhu and
             Lin Zhang and
             Ziwei Wu and
             Xinyu Gao and
             Jiaxing Zhang and
             Tetsuya Sakai},
  title   = {Zero-Shot Learners for Natural Language Understanding via a Unified Multiple Choice Perspective},
  journal = {CoRR},
  volume  = {abs/2210.08590},
  year    = {2022}
}
```
## License
[Apache License 2.0](https://github.com/IDEA-CCNL/Fengshenbang-LM/blob/main/LICENSE)