# How To

## Steps to run the code
The model is trained using the privacyFineTune.ipynb file and is queried with
the testPrivacy.ipynb file. There is no need to run the fine-tuning again,
as the model is already available on Hugging Face; however, we still give
instructions on how to do it.

## Fine-tuning
Fine-tuning is fairly simple, as most of it has already been set up for
easy use. If you wish to train the model the same way we did, just run
everything as is and then log in using your personal Hugging Face token.

If you wish to use a different model or dataset, do the following:

* As this file is made specifically for fine-tuning Llama 2, we suggest keeping
  the model_name variable untouched. If you do want to change it, set it to the
  name of any model available in the Transformers library.
* Change the dataset name to whatever dataset is desired (it must be supported
  by the Hugging Face datasets library).
* Change the new_model name to whatever you want.
* Adjust the dataset_text_field="" variable to the name of the text column in
  your dataset. For example, in the sjsq dataset the text column is called "Text".
* Change the prompt variable to a question you want to ask the model.
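As a rough sketch, the settings in the list above boil down to a few variable assignments near the top of the notebook. Every value below is a placeholder for illustration, not the notebook's actual default:

```python
# Placeholder values illustrating the variables described above;
# substitute your own choices. The dataset name and text column here
# are hypothetical examples.
model_name = "NousResearch/Llama-2-7b-chat-hf"  # a Llama 2 checkpoint on Hugging Face
dataset_name = "your-username/your-dataset"     # any Hugging Face dataset
new_model = "llama-2-7b-privacy"                # name to save the fine-tuned model under
dataset_text_field = "Text"                     # column that holds the training text
prompt = "What personal data does this policy collect?"
```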

Aside from these changes, run the file from top to bottom to train the model.
Training should take about 25 minutes in total.

## Prompting
The testPrivacy.ipynb file contains the test prompts that were used to test the
model. It should run as is with our model. If you wish to add custom prompts
to the file, do so by creating a new code block and using the syntax
```
prompt = "Your question"
```
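Behind the scenes, the generation cell shown later in this guide wraps the prompt in Llama 2's instruction tags, so a custom prompt ends up formatted like this:

```python
# Llama 2 chat models expect prompts wrapped in [INST] ... [/INST] tags;
# this mirrors the f-string used in the generation cell.
prompt = "Your question"
formatted = f"<s>[INST] {prompt} [/INST]"
print(formatted)  # <s>[INST] Your question [/INST]
```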
We also provide two different privacy policies for reference. The first is
from [TopHive](https://tophive.ai/privacy-policy) and the second is from
[Starbucks](https://www.starbucks.com/terms/privacy-policy/). The Starbucks
policy does not work: it is so long that you quickly run out of GPU RAM on the
free Colab tier, and it has more words than Llama can handle in one prompt.
The TopHive policy does work, however. To use one of these privacy policies in
your prompt, change the policy variable to the name of the company whose
policy you are using:
```
policy = starbucks
```

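Llama 2's context window is 4096 tokens, which is why the much longer Starbucks policy overflows it. As a rough sanity check before prompting with a new policy (the 0.75 words-per-token ratio below is a crude rule of thumb, not a real tokenizer count):

```python
# Crude length check: Llama 2 accepts at most 4096 tokens, and English text
# averages roughly 0.75 words per token. A precise check would run the
# model's tokenizer instead of this heuristic.
def rough_token_count(text):
    return int(len(text.split()) / 0.75)

policy = "word " * 5000  # stand-in for a ~5000-word policy document
if rough_token_count(policy) > 4096:
    print("Policy likely exceeds Llama 2's 4096-token context window")
```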
Then run one of the question cells or write your own prompt.

To run the text generation, choose a prompt first by running its cell, then run
```
result = pipe(f"<s>[INST] {prompt} [/INST]")
print('\n', result[0]['generated_text'])
resultList.append(result[0]['generated_text'])
```
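To trace what that cell does without loading the model, here is a minimal sketch with a stand-in for the text-generation pipeline. The stub `pipe` below is hypothetical; in the notebook, `pipe` is the real Transformers pipeline built during setup:

```python
# Stand-in for the real pipeline so the prompt/result flow can be followed on CPU.
# The real pipe returns a list of dicts with a 'generated_text' key, which the
# stub mimics by echoing the prompt plus a fake answer.
def pipe(text):
    return [{"generated_text": text + " ...model answer..."}]

resultList = []
prompt = "What personal data does TopHive collect?"
result = pipe(f"<s>[INST] {prompt} [/INST]")
print('\n', result[0]['generated_text'])
resultList.append(result[0]['generated_text'])
```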
Other than that, simply run the code from top to bottom until you reach the
prompt section. All generated results are saved in the "resultList" variable.