# How To

## Steps in running the code

The model is trained using the privacyFineTune.ipynb file and is queried with the testPrivacy.ipynb file. There is no need to run the fine-tuning again, as the model is already available on Hugging Face; however, we still give instructions on how to do it.

## Fine-tuning

Fine-tuning is fairly simple, as the majority of it has already been set up for easy use. If you wish to train the model the same way we did, just run everything as is and then log in using your personal Hugging Face token.

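If you are not sure what the login step looks like, it is typically just the following (a minimal sketch assuming the notebook authenticates through the huggingface_hub library; the exact cell in privacyFineTune.ipynb may differ slightly):

```
# Interactive login; paste your personal Hugging Face access token when prompted.
from huggingface_hub import notebook_login

notebook_login()
```
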
If you wish to use a different model or dataset, then do the following (see the sketch after this list):

* As this file is made specifically for fine-tuning Llama 2, we suggest leaving the model_name variable untouched. However, if you want to change it, set it to the name of a model available through the Transformers library.
* Change the dataset name to whatever dataset is desired (it must be loadable with the Hugging Face datasets library).
* Change the new_model name to whatever you want.
* Adjust the dataset_text_field="" variable to the name of the text column in your dataset. For example, in the sjsq dataset the text column is called "Text".
* Change the prompt variable to a question you want to ask the model.

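Put together, the configuration cells might look something like the following. The model, dataset, and names below are placeholders rather than the exact values from our run:

```
# Placeholder values; substitute your own choices.
model_name = "NousResearch/Llama-2-7b-chat-hf"  # base model available through the Transformers library
dataset_name = "your-username/your-dataset"     # any dataset loadable with the Hugging Face datasets library
new_model = "llama-2-7b-privacy-custom"         # name the fine-tuned model is saved under

dataset_text_field = "Text"  # name of the text column in your dataset
prompt = "What personal data does this privacy policy collect?"  # question to ask after training
```
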
Aside from these changes, run the file from top to bottom to train the model. Training should take about 25 minutes in total.

## Prompting

The testPrivacy.ipynb file contains the test prompts that were used to test the model. It should run as is with our model. If you wish to add custom prompts to the file, do so by creating a new code block and using the syntax

```
prompt = "Your question"
```

We also provide two different privacy policies for reference. The first is from [TopHive](https://tophive.ai/privacy-policy) and the second is from [Starbucks](https://www.starbucks.com/terms/privacy-policy/). The Starbucks policy does not work as is: it is so large that you quickly run out of GPU RAM on the free Colab tier, and Llama struggles with a prompt of that length. The TopHive policy does work. To use one of these privacy policies in your prompt, set the policy variable to the variable named after the company whose policy you want to use:

```
policy = starbucks
```

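As a hypothetical illustration of how the policy variable and a custom prompt can fit together (the string and prompt below are placeholders, not the notebook's actual contents):

```
# Hypothetical sketch: the company variable holds the raw policy text,
# and the prompt embeds it alongside the question.
tophive = "TopHive collects the following categories of personal data: ..."  # placeholder text

policy = tophive
prompt = f"Summarize the following privacy policy: {policy}"
```
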
Then run one of the question boxes or make your own prompt.

To run the text generation, choose a prompt first by running that box, then run

```
# Generate a response using the Llama 2 instruction format, print it,
# and save it for later review.
result = pipe(f"<s>[INST] {prompt} [/INST]")
print('\n', result[0]['generated_text'])
resultList.append(result[0]['generated_text'])
```

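The <s>[INST] ... [/INST] wrapper is the instruction format that Llama 2 chat models expect, so keep it around whatever prompt you pass in.
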
Other than that, run the code from top to bottom until you get to the prompt section. All generated results are saved in the "resultList" variable.
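
If you want the outputs to survive the end of the Colab session, one option (our suggestion, not a cell that exists in the notebook) is to dump resultList to a file in a final code block:

```
# Write every saved generation to a text file (assumes resultList from the notebook).
with open("results.txt", "w") as f:
    for text in resultList:
        f.write(text + "\n\n")
```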