# Adapting GPT-2 using LoRA
This folder contains the implementation of LoRA for GPT-2 using the Python package `lora`, along with steps to replicate the results in our paper:
**LoRA: Low-Rank Adaptation of Large Language Models** <br>
*Edward J. Hu\*, Yelong Shen\*, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen* <br>
Paper: https://arxiv.org/abs/2106.09685 <br>
<p>
<img src="figures/LoRA_GPT2.PNG" width="800" >
</p>
This repo reproduces our experiments on GPT-2.
## Repository Overview
Our implementation is based on the fine-tuning code for GPT-2 in [Hugging Face](https://huggingface.co/).
There are several directories in this repo:
* [src/](src) contains the source code used for data processing, training, and decoding.
* [eval/](eval) contains the task-specific evaluation scripts.
* [data/](data) contains the raw data we used in our experiments.
* [vocab/](vocab) contains the GPT-2 vocabulary files.
## Getting Started
1. You can start with the following Docker image on a GPU-capable machine: `nvcr.io/nvidia/pytorch:20.03-py3`, but any generic PyTorch image should work (`--gpus all` exposes the host GPUs to the container):
```
docker run -it --gpus all nvcr.io/nvidia/pytorch:20.03-py3
```
2. Clone the repo and install dependencies in a virtual environment (omit `sudo` if running inside a Docker container):
```
sudo apt-get update
sudo apt-get -y install git jq virtualenv
git clone https://github.com/microsoft/LoRA.git; cd LoRA
virtualenv -p `which python3` ./venv
. ./venv/bin/activate
pip install -r requirement.txt
bash download_pretrained_checkpoints.sh
bash create_datasets.sh
cd ./eval
bash download_evalscript.sh
cd ..
```
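With the environment set up, here is a minimal sketch of the core `lora` pattern the training code relies on, assuming the package is importable as `loralib` (as in the upstream LoRA repo); the layer sizes are illustrative:
```
import torch
import loralib as lora

# Swap a dense layer for its LoRA counterpart: the pretrained weight stays
# frozen while a rank-r update (scaled by lora_alpha / r) is learned.
layer = lora.Linear(768, 768, r=4, lora_alpha=32)
model = torch.nn.Sequential(layer)

# Freeze everything except the LoRA matrices before training.
lora.mark_only_lora_as_trainable(model)

# Checkpoint only the (small) set of LoRA weights...
torch.save(lora.lora_state_dict(model), "lora_only.pt")

# ...and restore them on top of the pretrained weights later
# (strict=False because the file covers only part of the state dict).
model.load_state_dict(torch.load("lora_only.pt"), strict=False)
```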
#### Now we are ready to replicate the results in our paper.
## Replicating Our Result on E2E
1. Train GPT-2 Medium with LoRA (see our paper for the full hyperparameter settings for GPT-2 Medium; the LoRA-specific flags are sketched just after this command):
```
python -m torch.distributed.launch --nproc_per_node=1 src/gpt2_ft.py \
--train_data ./data/e2e/train.jsonl \
--valid_data ./data/e2e/valid.jsonl \
--train_batch_size 8 \
--grad_acc 1 \
--valid_batch_size 4 \
--seq_len 512 \
--model_card gpt2.md \
--init_checkpoint ./pretrained_checkpoints/gpt2-medium-pytorch_model.bin \
--platform local \
--clip 0.0 \
--lr 0.0002 \
--weight_decay 0.01 \
--correct_bias \
--adam_beta2 0.999 \
--scheduler linear \
--warmup_step 500 \
--max_epoch 5 \
--save_interval 1000 \
--lora_dim 4 \
--lora_alpha 32 \
--lora_dropout 0.1 \
--label_smooth 0.1 \
--work_dir ./trained_models/GPT2_M/e2e \
--random_seed 110
```
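Here `--lora_dim` and `--lora_alpha` are the rank r and scaling factor α from the paper; the LoRA update is scaled by α/r, i.e. 32/4 = 8 above. A rough sketch of how these flags could map onto GPT-2's fused QKV projection via `loralib`'s `MergedLinear` (the actual wiring lives in [src/](src); treat this as illustrative):
```
import loralib as lora

n_embd = 1024       # GPT-2 Medium hidden size
r, alpha = 4, 32    # --lora_dim, --lora_alpha; the update is scaled by alpha / r = 8

# Adapt only the query and value slices of the fused QKV projection,
# leaving the key slice frozen (as in the paper's main experiments).
c_attn = lora.MergedLinear(
    n_embd, 3 * n_embd,
    r=r, lora_alpha=alpha, lora_dropout=0.1,  # --lora_dropout
    enable_lora=[True, False, True],          # [Q, K, V]
    fan_in_fan_out=True,                      # GPT-2's Conv1D stores W transposed
    merge_weights=False,
)
```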
2. Generate outputs from the trained model using beam search (the checkpoint name, here `model.26289.pt`, depends on `--save_interval` and the total number of training steps; `--eos_token_id` is explained after the command):
```
python -m torch.distributed.launch --nproc_per_node=1 src/gpt2_beam.py \
--data ./data/e2e/test.jsonl \
--batch_size 1 \
--seq_len 512 \
--eval_len 64 \
--model_card gpt2.md \
--init_checkpoint ./trained_models/GPT2_M/e2e/model.26289.pt \
--platform local \
--lora_dim 4 \
--lora_alpha 32 \
--beam 10 \
--length_penalty 0.8 \
--no_repeat_ngram_size 4 \
--repetition_penalty 1.0 \
--eos_token_id 628 \
--work_dir ./trained_models/GPT2_M/e2e \
--output_file predict.26289.b10p08r4.jsonl
```
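For reference, `--eos_token_id 628` is, to our knowledge, the GPT-2 BPE id for a double newline, which terminates each completion. A quick way to check, assuming the Hugging Face `transformers` package is available (it is not otherwise required by this pipeline):
```
# Sanity check: GPT-2 BPE token 628 should decode to "\n\n".
from transformers import GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
print(repr(tok.decode([628])))  # expected: '\n\n'
```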
3. Decode the outputs from step (2):
```
python src/gpt2_decode.py \
--vocab ./vocab \
--sample_file ./trained_models/GPT2_M/e2e/predict.26289.b10p08r4.jsonl \
--input_file ./data/e2e/test_formatted.jsonl \
--output_ref_file e2e_ref.txt \
--output_pred_file e2e_pred.txt
```
4. Run the evaluation on the E2E test set:
```
python eval/e2e/measure_scores.py e2e_ref.txt e2e_pred.txt -p
```
## Replicating Our Result on WebNLG
1. Follow steps 1 and 2 from the E2E pipeline, replacing the E2E paths with their WebNLG counterparts, e.g. `--train_data ./data/webnlg_challenge_2017/train.jsonl`, `--valid_data ./data/webnlg_challenge_2017/valid.jsonl`, and `--work_dir ./trained_models/GPT2_M/webnlg` (see our paper for the WebNLG hyperparameters).
2. Decode the outputs from beam search (step 2 of the E2E pipeline above):
```
python src/gpt2_decode.py \
--vocab ./vocab \
--sample_file ./trained_models/GPT2_M/webnlg/predict.20000.b10p08.jsonl \
--input_file ./data/webnlg_challenge_2017/test_formatted.jsonl \
--ref_type webnlg \
--ref_num 6 \
--output_ref_file eval/GenerationEval/data/references_webnlg \
--output_pred_file eval/GenerationEval/data/hypothesis_webnlg \
--tokenize --lower
```
3. Run the evaluation on the WebNLG test set:
```
cd ./eval/GenerationEval/
python eval.py \
-R data/references_webnlg/reference \
-H data/hypothesis_webnlg \
-nr 6 \
-m bleu,meteor,ter
cd ../..
```
## Replicating Our Result on DART
1. Follow steps 1 and 2 from the E2E pipeline, replacing the E2E paths with their DART counterparts, e.g. `--train_data ./data/dart/train.jsonl`, `--valid_data ./data/dart/valid.jsonl`, and `--work_dir ./trained_models/GPT2_M/dart` (see our paper for the DART hyperparameters).
2. Decode the outputs from beam search (step 2 of the E2E pipeline above):
```
python src/gpt2_decode.py \
--vocab ./vocab \
--sample_file ./trained_models/GPT2_M/dart/predict.20000.b10p08.jsonl \
--input_file ./data/dart/test_formatted.jsonl \
--ref_type dart \
--ref_num 6 \
--output_ref_file eval/GenerationEval/data/references_dart \
--output_pred_file eval/GenerationEval/data/hypothesis_dart \
--tokenize --lower
```
3. Run the evaluation on the DART test set:
```
cd ./eval/GenerationEval/
python eval.py \
-R data/references_dart/reference \
-H data/hypothesis_dart \
-nr 6 \
-m bleu,meteor,ter
cd ../..
```
## Citation
```
@misc{hu2021lora,
    title={LoRA: Low-Rank Adaptation of Large Language Models},
    author={Edward J. Hu and Yelong Shen and Phillip Wallis and Zeyuan Allen-Zhu and Yuanzhi Li and Shean Wang and Lu Wang and Weizhu Chen},
    year={2021},
    eprint={2106.09685},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
``` |