# How To
## Steps in running the code
The model is trained with the privacyFineTune.ipynb notebook and queried with
the testPrivacy.ipynb notebook. There is no need to run the fine-tuning again,
as the model is already available on Hugging Face, but instructions for doing
so are still given here.
## Fine-tuning
Fine-tuning is straightforward, as most of it has already been set up for
easy use. To train the model the same way we did, run everything as is and
then log in using your personal Hugging Face token.
To use a different model or dataset, make the following changes:
* This notebook is made specifically for fine-tuning Llama 2, so we suggest leaving
the model_name variable untouched. If you do want to change it, set it to the
name of a model available in the Transformers library.
* Change the dataset name to the desired dataset (it must be supported by the
Hugging Face Datasets library).
* Change the new_model name to whatever you want.
* Set the dataset_text_field="" variable to the name of the text column in
your dataset. For example, in the sjsq dataset the text column is called "Text".
* Change the prompt variable to the question you want to ask the model.
Aside from these changes, run the notebook from top to bottom to train the model.
Training should take about 25 minutes in total.
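
The settings listed above could look roughly like the following. This is only a sketch: every value is a placeholder, `dataset_name` and `check_text_field` are hypothetical names that may not match the notebook, and the column list would normally come from the loaded dataset rather than being hard-coded.

```python
# Settings referred to in the list above. All values are placeholders.
model_name = "your-llama-2-checkpoint"        # leave as set in the notebook for Llama 2
dataset_name = "your-username/your-dataset"   # hypothetical variable name for the dataset
new_model = "my-fine-tuned-model"             # name for the fine-tuned model
dataset_text_field = "Text"                   # column that holds the training text
prompt = "What data does this policy collect?"


def check_text_field(column_names, text_field):
    """Return True if text_field is one of the dataset's columns (case-sensitive)."""
    return text_field in column_names


# Column names would normally come from the loaded dataset; hard-coded here.
example_columns = ["Text", "Label"]
if not check_text_field(example_columns, dataset_text_field):
    raise ValueError(f"Column {dataset_text_field!r} not found in {example_columns}")
```

Checking the column name before training starts is cheap and catches the most likely mistake, a mismatch in capitalization such as "text" versus "Text".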
## Prompting
The testPrivacy.ipynb notebook contains the test prompts that were used to test the
model. It is set up to run as is with our model. To add custom
prompts, create a new code cell and use the syntax
```
prompt = "Your question"
```
We also provide two privacy policies for reference. The first is
from [TopHive](https://tophive.ai/privacy-policy) and the second is from
[Starbucks](https://www.starbucks.com/terms/privacy-policy/). The Starbucks policy
does not work: it is too long, so the free Colab tier quickly runs out of GPU
memory, and the text contains more words than Llama handles well.
The TopHive policy does work. To use one of these privacy policies in your prompt, set
the policy variable to the name of the company whose policy you are using:
```
policy = starbucks
```
Then run one of the question boxes or make your own prompt.
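
Since an overly long policy exhausts GPU memory, a rough length check before building the prompt can save a failed run. The helper below is a sketch, not part of the notebook: `build_policy_prompt` and `max_words` are hypothetical names, the budget of 1500 words is an assumed value rather than a measured limit, and word count is only a crude proxy for token count.

```python
def build_policy_prompt(policy_text, question, max_words=1500):
    """Combine a policy and a question, rejecting policies over a word budget.

    max_words is an assumed, rough budget and should be tuned to your GPU.
    """
    n_words = len(policy_text.split())
    if n_words > max_words:
        raise ValueError(f"Policy has {n_words} words, budget is {max_words}")
    # Put the policy first, then the question, separated by a blank line.
    return f"{policy_text}\n\n{question}"


# Example with a short stand-in policy text.
short_policy = "We collect your email address to send receipts."
prompt = build_policy_prompt(short_policy, "What data is collected?")
```

If the check fails, shortening the policy to the relevant sections is one option; the right budget depends on the GPU available.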
To run the text generation, first choose a prompt by running its cell, then run
```
result = pipe(f"<s>[INST] {prompt} [/INST]")
print('\n',result[0]['generated_text'])
resultList.append(result[0]['generated_text'])
```
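
To run several prompts in one go, the same call can be wrapped in a loop. This is a sketch under assumptions: `run_prompts` is a hypothetical helper, and `fake_pipe` is a stand-in that only mimics the output shape used above so the example runs without a model. In the notebook you would pass the real `pipe` and `resultList` instead.

```python
def run_prompts(pipe, prompts, results=None):
    """Send each prompt through pipe and collect the generated text."""
    if results is None:
        results = []
    for prompt in prompts:
        output = pipe(f"<s>[INST] {prompt} [/INST]")
        text = output[0]["generated_text"]
        print("\n", text)
        results.append(text)
    return results


def fake_pipe(text):
    """Stand-in for the real pipeline; returns the same list-of-dicts shape."""
    return [{"generated_text": text + " (model output would follow)"}]


demo_results = run_prompts(fake_pipe, ["First question", "Second question"])
```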
Other than that, run the code from top to bottom until you reach
the prompt section. All generated results are saved in the "resultList" variable.