TomData committed on
Commit
5798fbc
·
verified ·
1 Parent(s): 29197a8

Update README.md

Files changed (1)
  1. README.md +82 -1
README.md CHANGED
@@ -6,4 +6,85 @@ language:
6
  library_name: pytorch
7
  pipeline_tag: text-generation
8
  base_model: openai-community/gpt2-medium
9
- ---
9
+ ---
10
+
11
+ ---
12
+ datasets:
13
+ - McAuley-Lab/Amazon-Reviews-2023
14
+ language:
15
+ - en
16
+ library_name: pytorch
17
+ pipeline_tag: text-generation
18
+ base_model: openai-community/gpt2-medium
19
+ ---
20
+
21
+
22
+ # GPT-2 Medium - Review
23
+
24
+ ## Model Details
25
+
26
+ **Model Description:** GPT-2 Medium is the **355M parameter** version of GPT-2, a transformer-based language model created and released by OpenAI. This model is GPT-2 Medium further pretrained with a causal language modeling (CLM) objective on English Amazon product reviews from the Fashion category.
27
+
28
+ - **Developed by:** Students at the University of Konstanz
29
+ - **Model Type:** Transformer-based language model
30
+ - **Language(s):** English
31
+ - **Base Model:** [GPT2-medium](https://huggingface.co/openai-community/gpt2-medium)
32
+ - **Resources for more information:**
33
+   - [GitHub Repo](https://github.com/valentin-velev29/DLSS-24-GPT-2-Project)
34
+
35
+
36
+ ## How to Get Started with the Model
37
+
38
+ Use the code below to get started with the model. You can use this model directly with a pipeline for text generation. Because generation involves random sampling, we
39
+ set a seed for reproducibility:
40
+
41
+ ```python
42
+ >>> from transformers import pipeline, set_seed
43
+ >>> generator = pipeline('text-generation', model='TomData/GPT2-review')
44
+ >>> set_seed(42)
45
+ >>> generator("Hello, I'm a language model,", max_length=30, num_return_sequences=5)
46
+ ```
47
+
48
+
49
+ Here is how to load this model and obtain its output logits for a given text in PyTorch:
50
+
51
+ ```python
52
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained("TomData/GPT2-review")
53
+ model = AutoModelForCausalLM.from_pretrained("TomData/GPT2-review")
54
+ text = "Replace me by any text you'd like."
55
+ encoded_input = tokenizer(text, return_tensors='pt')
56
+ output = model(**encoded_input)
57
+ ```
58
+
59
+ and in TensorFlow:
60
+
61
+ ```python
62
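+ from transformers import AutoTokenizer, TFAutoModelForCausalLM
+
+ tokenizer = AutoTokenizer.from_pretrained("TomData/GPT2-review")
63
+ # TensorFlow needs the TF model class; if the repository only ships PyTorch weights,
+ # from_pt=True may also be required (an assumption, not confirmed by the source).
+ model = TFAutoModelForCausalLM.from_pretrained("TomData/GPT2-review")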
64
+ text = "Replace me by any text you'd like."
65
+ encoded_input = tokenizer(text, return_tensors='tf')
66
+ output = model(encoded_input)
67
+ ```
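
For sampling several reviews from a prompt, the lower-level `generate` API can be used instead of the pipeline. The following is a minimal sketch, not the authors' code: the helper name `generate_reviews`, the prompt, and the sampling settings (`top_k`, `top_p`, `max_new_tokens`) are illustrative assumptions. Calling it downloads the model weights.

```python
from typing import List


def generate_reviews(prompt: str, num_reviews: int = 3, seed: int = 42,
                     model_id: str = "TomData/GPT2-review") -> List[str]:
    """Sample `num_reviews` continuations of `prompt` (hypothetical helper)."""
    # Imported lazily so that defining the helper does not require transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed

    set_seed(seed)  # fix the random state so the sampled texts are reproducible
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        do_sample=True,            # sample instead of greedy decoding
        top_k=50,                  # assumed sampling settings, tune as needed
        top_p=0.95,
        max_new_tokens=40,
        num_return_sequences=num_reviews,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
    )
    return [tokenizer.decode(ids, skip_special_tokens=True) for ids in outputs]
```

A call such as `generate_reviews("These shoes are")` would return a list of three sampled strings, each starting with the prompt.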
68
+
69
+ ## Uses
70
+
71
+ This model is further pretrained to generate artificial product reviews. This can be useful for:
72
+ - Market research
73
+ - Product analysis
74
+ - Customer preferences
75
+ - Fashion trends
76
+ - Research
77
+
78
+
79
+ ## Training
80
+
81
+
82
+ The model is further pretrained on the [Amazon Reviews 2023 dataset](https://huggingface.co/datasets/McAuley-Lab/Amazon-Reviews-2023) from McAuley-Lab.
83
+ Only the reviews from the Amazon Fashion category are used for training:
84
+
85
+ ```python
86
+ from datasets import load_dataset
+
+ dataset = load_dataset("McAuley-Lab/Amazon-Reviews-2023", "raw_review_Amazon_Fashion", trust_remote_code=True)
87
+ ```
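
For context, the causal language modeling objective used for further pretraining scores each token given the tokens before it. The sketch below illustrates that loss in plain Python. It is not the project's training code, and the function names are hypothetical. It assumes `logits[t]` holds the model's scores for the token at position `t + 1`.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized score vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def causal_lm_loss(logits: Sequence[Sequence[float]],
                   token_ids: Sequence[int]) -> float:
    """Mean next-token cross-entropy for a single sequence."""
    losses = []
    # Position t predicts token t + 1, so the last position has no target.
    for t in range(len(token_ids) - 1):
        log_probs = log_softmax(logits[t])
        losses.append(-log_probs[token_ids[t + 1]])
    return sum(losses) / len(losses)


# A model that scores all 4 vocabulary entries equally pays log(4) per token.
uniform_logits = [[0.0, 0.0, 0.0, 0.0] for _ in range(3)]
uniform_loss = causal_lm_loss(uniform_logits, [0, 1, 2])
```

Here `uniform_loss` equals `math.log(4)`, the cost of guessing uniformly over four tokens. Further pretraining on the review text lowers this quantity on in-domain data.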
88
+
89
+
90
+