ChrisPuzzo committed · Commit 9853e24 · verified · 1 Parent(s): f1fbf0f

Upload HOW-TO.md

# How To
## Steps in running the code
The model is trained using the privacyFineTune.ipynb file and queried with
the testPrivacy.ipynb file. There is no need to run the fine-tuning again,
as the model is already available on Hugging Face; however, we still give
instructions on how to do it.

## Fine-tuning
Fine-tuning is pretty simple, as the majority of it has already been set up
for easy use. If you wish to train the model the same way we did, just run
everything as is and then log in using your personal Hugging Face token.

If you wish to use a different model or dataset, do the following:

* As this file is made specifically for fine-tuning Llama 2, we suggest keeping
  the model_name variable untouched. However, if you want to change it, set it
  to the name of any model available through the Transformers library.
* Change the dataset name to whatever dataset is desired (it must be supported
  by the Hugging Face datasets library).
* Change the new_model name to whatever you want.
* Adjust the dataset_text_field="" variable to the name of the text column in
  your dataset. For example, in the sjsq dataset the text column is called
  "Text".
* Change the prompt variable to a question you want to ask the model.

Aside from these changes, run the file from top to bottom to train the model.
Training should take about 25 minutes in total.

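As a sketch, the configuration variables described above might look like the following. All values here are illustrative assumptions, not the notebook's actual defaults; only the variable names come from the description above.

```python
# Example values only -- the names match the variables described above,
# but these specific model/dataset strings are illustrative assumptions.
model_name = "NousResearch/Llama-2-7b-chat-hf"  # suggested: leave the Llama 2 default untouched
dataset_name = "your-username/your-dataset"     # must be loadable by the Hugging Face datasets library
new_model = "llama-2-7b-privacy-finetune"       # any name you like for the fine-tuned model
dataset_text_field = "Text"                     # the text column name, e.g. "Text" in the sjsq dataset
prompt = "What personal data does this privacy policy collect?"
```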
## Prompting
The testPrivacy.ipynb file contains the test prompts that were used to test
the model. It should run as is with our model. If you wish to add custom
prompts to the file, do so by creating a new code block and using the syntax
```
prompt = "Your question"
```
We also provide two different privacy policies for reference. The first is
from [TopHive](https://tophive.ai/privacy-policy) and the second is from
[Starbucks](https://www.starbucks.com/terms/privacy-policy/). The Starbucks
policy does not work: it is too large, so it quickly exhausts GPU RAM on the
free Colab tier and exceeds the number of words Llama can handle. The
TopHive policy does work, however. To use one of these privacy policies in
your prompt, change the policy variable to the name of the company whose
policy you are using:
```
policy = starbucks
```
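As a minimal sketch of how this selection works (the policy strings below are placeholders, not the real policy text, which the notebook defines earlier):

```python
# Placeholder policy strings -- in the notebook these hold the full text
# taken from the linked TopHive and Starbucks pages.
tophive = "TopHive Privacy Policy: we collect account and usage data ..."
starbucks = "Starbucks Privacy Policy: ..."  # too large for free-tier Colab GPU RAM

# Point `policy` at the company whose policy you want to ask about.
policy = tophive
prompt = f"According to the following privacy policy, what data is collected? {policy}"
```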

Then run one of the question boxes or make your own prompt.

To run the text generation, choose a prompt first by running its box, then run
```
result = pipe(f"<s>[INST] {prompt} [/INST]")
print('\n', result[0]['generated_text'])
resultList.append(result[0]['generated_text'])
```
Other than that, basically run the code from start to bottom until you get to
the prompt section. All generated results are saved in the "resultList"
variable.
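The flow above can be sketched end to end with a stubbed pipeline. The real `pipe` is the Transformers text-generation pipeline built earlier in the notebook; the stub below only imitates its return shape so the bookkeeping around it is visible.

```python
# Stub standing in for the Transformers text-generation pipeline, which
# returns a list of dicts, each with a 'generated_text' key.
def pipe(text):
    return [{"generated_text": text + " The policy collects usage data."}]

resultList = []
prompt = "What data does TopHive collect?"

# Llama 2 chat models expect the [INST] instruction wrapper around the prompt.
result = pipe(f"<s>[INST] {prompt} [/INST]")
print('\n', result[0]['generated_text'])
resultList.append(result[0]['generated_text'])
```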