---
datasets:
- McAuley-Lab/Amazon-Reviews-2023
language:
- en
library_name: pytorch
pipeline_tag: text-generation
base_model: openai-community/gpt2-medium
---

# GPT-2 Medium - Review

## Model Details

**Model Description:** This model is a checkpoint of GPT-2 Medium, the **355M-parameter** version of GPT-2, a transformer-based language model created and released by OpenAI. It was further pretrained with a causal language modeling (CLM) objective on English Amazon product reviews from the Fashion category.

- **Developed by:** Students at the University of Konstanz
- **Model Type:** Transformer-based language model
- **Language(s):** English
- **Base Model:** [GPT2-medium](https://huggingface.co/openai-community/gpt2-medium)
- **Resources for more information:** [GitHub Repo](https://github.com/TomSOWI/DLSS-24-Synthetic-Product-Reviews-Generation)


## How to Get Started with the Model 

You can use this model directly with a pipeline for text generation. Since generation relies on sampling randomness, we set a seed for reproducibility:

```python
>>> from transformers import pipeline, set_seed
>>> generator = pipeline('text-generation', model='TomData/GPT2-review')
>>> set_seed(42)
>>> generator("Hello, I'm a language model,", max_length=30, num_return_sequences=5)
```


Here is how to use this model to get the features of a given text in PyTorch:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("TomData/GPT2-review")
model = AutoModelForCausalLM.from_pretrained("TomData/GPT2-review")
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
```


and in TensorFlow:

```python
from transformers import AutoTokenizer, TFAutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("TomData/GPT2-review")
model = TFAutoModelForCausalLM.from_pretrained("TomData/GPT2-review")
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='tf')
output = model(encoded_input)
```

## Uses

This model is further pretrained to generate artificial product reviews (see the generation sketch after this list). This can be useful for:
- Market research
- Product analysis
- Customer preferences
- Fashion trends
- Research

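As an illustration, here is a minimal sketch of generating synthetic fashion reviews with the pipeline API. The prompt text and generation parameters below are illustrative assumptions, not settings taken from the original training or evaluation setup:

```python
from transformers import pipeline, set_seed

# Load the fine-tuned checkpoint as a text-generation pipeline
generator = pipeline('text-generation', model='TomData/GPT2-review')
set_seed(42)  # fix the seed so sampled reviews are reproducible

# Illustrative prompt in the style of a fashion product review (assumption)
prompt = "I bought this dress for a summer wedding and"
reviews = generator(
    prompt,
    max_length=60,           # cap the review length (illustrative value)
    num_return_sequences=3,  # sample several candidate reviews
    do_sample=True,          # enable sampling for varied outputs
)
for r in reviews:
    print(r['generated_text'])
```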

## Training


The model is further pretrained on the [Amazon Reviews 2023 dataset](https://huggingface.co/datasets/McAuley-Lab/Amazon-Reviews-2023) from McAuley-Lab.
For training, only the reviews from the Amazon Fashion category are used. See:

```python
from datasets import load_dataset

dataset = load_dataset("McAuley-Lab/Amazon-Reviews-2023", "raw_review_Amazon_Fashion", trust_remote_code=True)
```
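As a rough, non-authoritative sketch of how these reviews could be prepared for further CLM pretraining: the snippet below assumes the single `full` split exposed by the `raw_review_*` configurations and that the review body is stored in the `text` column; the tokenization settings are assumptions, not the exact ones used for this checkpoint.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Load only the Amazon Fashion reviews from the 2023 release
dataset = load_dataset(
    "McAuley-Lab/Amazon-Reviews-2023",
    "raw_review_Amazon_Fashion",
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2-medium")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

def tokenize(batch):
    # Assumes the review body lives in the "text" column of this dataset
    return tokenizer(batch["text"], truncation=True, max_length=512)

# Assumes the "full" split; tokenize in batches and drop the raw columns
tokenized = dataset["full"].map(
    tokenize,
    batched=True,
    remove_columns=dataset["full"].column_names,
)
```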