# How To

## Steps to run the code
The model is trained using the privacyFineTune.ipynb file and is queried with
the testPrivacy.ipynb file. There is no need to run the fine-tuning again,
as the model is already available on Hugging Face; however, we still give
instructions on how to do it.

## Fine-tuning
Fine-tuning is fairly simple, as most of it has already been set up for
easy use. If you wish to train the model the same way we did, just run
everything as is and then log in using your personal Hugging Face token.

If you wish to use a different model or dataset, do the following:

* As this file is made specifically for fine-tuning Llama 2, we suggest keeping
  the model_name variable untouched. If you do want to change it, set it to the
  name of any model available in the Transformers library.
* Change the dataset name to whatever dataset is desired (it must be supported
  by the Hugging Face datasets library).
* Change the new_model name to whatever you want.
* Adjust the dataset_text_field="" variable to the name of the text column in
  your dataset. For example, in the sjsq dataset the text column is called "Text".
* Change the prompt variable to a question you want to ask the model.
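As a rough sketch, the settings in the list above boil down to a few variable assignments near the top of the notebook. Every value below is a placeholder for illustration, not the notebook's actual default:

```python
# Placeholder values illustrating the variables described above;
# substitute your own choices. The dataset name and text column here
# are hypothetical examples.
model_name = "NousResearch/Llama-2-7b-chat-hf"  # a Llama 2 checkpoint on Hugging Face
dataset_name = "your-username/your-dataset"     # any Hugging Face dataset
new_model = "llama-2-7b-privacy"                # name to save the fine-tuned model under
dataset_text_field = "Text"                     # column that holds the training text
prompt = "What personal data does this policy collect?"
```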

Aside from these changes, run the file from top to bottom to train the model.
Training should take about 25 minutes in total.

## Prompting
The testPrivacy.ipynb file contains the test prompts that were used to test the
model. It should run as is with our model. If you wish to add custom prompts
to the file, do so by creating a new code block and using the syntax
```
prompt = "Your question"
```
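Behind the scenes, the generation cell shown later in this guide wraps the prompt in Llama 2's instruction tags, so a custom prompt ends up formatted like this:

```python
# Llama 2 chat models expect prompts wrapped in [INST] ... [/INST] tags;
# this mirrors the f-string used in the generation cell.
prompt = "Your question"
formatted = f"<s>[INST] {prompt} [/INST]"
print(formatted)  # <s>[INST] Your question [/INST]
```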
We also provide two different privacy policies for reference. The first is
from [TopHive](https://tophive.ai/privacy-policy) and the second is from
[Starbucks](https://www.starbucks.com/terms/privacy-policy/). The Starbucks
policy does not work: it is so long that you quickly run out of GPU RAM on the
free Colab tier, and it has more words than Llama can handle in one prompt.
The TopHive policy does work, however. To use one of these privacy policies in
your prompt, change the policy variable to the name of the company whose
policy you are using:
```
policy = starbucks
```

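Llama 2's context window is 4096 tokens, which is why the much longer Starbucks policy overflows it. As a rough sanity check before prompting with a new policy (the 0.75 words-per-token ratio below is a crude rule of thumb, not a real tokenizer count):

```python
# Crude length check: Llama 2 accepts at most 4096 tokens, and English text
# averages roughly 0.75 words per token. A precise check would run the
# model's tokenizer instead of this heuristic.
def rough_token_count(text):
    return int(len(text.split()) / 0.75)

policy = "word " * 5000  # stand-in for a ~5000-word policy document
if rough_token_count(policy) > 4096:
    print("Policy likely exceeds Llama 2's 4096-token context window")
```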
Then run one of the question cells or write your own prompt.

To run the text generation, choose a prompt first by running its cell, then run
```
result = pipe(f"<s>[INST] {prompt} [/INST]")
print('\n', result[0]['generated_text'])
resultList.append(result[0]['generated_text'])
```
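To trace what that cell does without loading the model, here is a minimal sketch with a stand-in for the text-generation pipeline. The stub `pipe` below is hypothetical; in the notebook, `pipe` is the real Transformers pipeline built during setup:

```python
# Stand-in for the real pipeline so the prompt/result flow can be followed on CPU.
# The real pipe returns a list of dicts with a 'generated_text' key, which the
# stub mimics by echoing the prompt plus a fake answer.
def pipe(text):
    return [{"generated_text": text + " ...model answer..."}]

resultList = []
prompt = "What personal data does TopHive collect?"
result = pipe(f"<s>[INST] {prompt} [/INST]")
print('\n', result[0]['generated_text'])
resultList.append(result[0]['generated_text'])
```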
Other than that, simply run the code from top to bottom until you reach the
prompt section. All generated results are saved in the "resultList" variable.