---
datasets:
- McAuley-Lab/Amazon-Reviews-2023
language:
- en
library_name: pytorch
pipeline_tag: text-generation
base_model: openai-community/gpt2-medium
---

# GPT-2 Medium - Review

## Model Details

**Model Description:** This model is a checkpoint of GPT-2 Medium, the **355M parameter** version of GPT-2, a transformer-based language model created and released by OpenAI. It has been further pretrained with a causal language modeling (CLM) objective on English Amazon product reviews from the Fashion category.

- **Developed by:** Students at the University of Konstanz
- **Model Type:** Transformer-based language model
- **Language(s):** English
- **Base Model:** [GPT2-medium](https://huggingface.co/openai-community/gpt2-medium)
- **Resources for more information:** [GitHub Repo](https://github.com/TomSOWI/DLSS-24-Synthetic-Product-Reviews-Generation)

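For readers unfamiliar with the CLM objective, the following is a minimal sketch of the next-token cross-entropy loss in plain Python. It is illustrative only: the function name `causal_lm_loss` and the toy logits are assumptions made for this example, and real training would rely on a framework implementation rather than this loop.

```python
import math


def causal_lm_loss(logits, token_ids):
    """Average next-token cross-entropy for one sequence.

    logits[t] holds the unnormalized scores over the vocabulary predicted
    at position t; the target for position t is token_ids[t + 1].
    """
    if len(logits) != len(token_ids):
        raise ValueError("need one logit vector per input token")
    if len(token_ids) < 2:
        raise ValueError("need at least two tokens")
    total = 0.0
    # The last position has no next token, so it contributes no loss.
    for t in range(len(token_ids) - 1):
        scores = logits[t]
        target = token_ids[t + 1]
        # Numerically stable log-sum-exp: subtract the max before exponentiating.
        m = max(scores)
        log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_norm - scores[target]
    return total / (len(token_ids) - 1)


# Toy example (hypothetical values): a vocabulary of 3 tokens, a sequence of 3 ids.
toy_ids = [0, 2, 1]
uniform_logits = [[0.0, 0.0, 0.0]] * 3
loss = causal_lm_loss(uniform_logits, toy_ids)
perplexity = math.exp(loss)
```

With uniform scores the model is maximally unsure, so the perplexity equals the vocabulary size; further pretraining on reviews aims to lower this value on review text.
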
## How to Get Started with the Model

Use the code below to get started with the model. You can use this model directly with a pipeline for text generation. Because generation relies on random sampling, we set a seed for reproducibility:

```python
>>> from transformers import pipeline, set_seed
>>> generator = pipeline('text-generation', model='TomData/GPT2-review')
>>> set_seed(42)
>>> generator("Hello, I'm a language model,", max_length=30, num_return_sequences=5)
```
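
To illustrate why a fixed seed makes sampled output repeatable, here is a small sketch of seeded top-k sampling over a toy next-token distribution. It does not reproduce the pipeline's internal sampling code; the names `sample_top_k` and `sample_sequence` and the toy scores are hypothetical and exist only for this example.

```python
import math
import random


def sample_top_k(logits, k, rng, temperature=1.0):
    """Draw one token index from the k highest-scoring entries of logits."""
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the vocabulary size")
    # Keep the indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Unnormalized softmax weights over the kept logits, scaled by temperature.
    m = max(logits[i] for i in top)
    weights = [math.exp((logits[i] - m) / temperature) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


def sample_sequence(logits, length, seed, k=3):
    """Sample `length` tokens from a fixed toy distribution with a seeded generator."""
    rng = random.Random(seed)
    return [sample_top_k(logits, k, rng) for _ in range(length)]


toy_logits = [2.0, 1.5, 0.3, -1.0, -2.5]  # hypothetical scores for a 5-token vocabulary
first = sample_sequence(toy_logits, length=10, seed=42)
second = sample_sequence(toy_logits, length=10, seed=42)
print(first == second)  # the same seed reproduces the same draws
```

Changing the seed will generally change the sampled sequence, which is why `set_seed(42)` is called before generating.
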

Here is how to run the model on a given text in PyTorch; the returned output contains the next-token logits:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("TomData/GPT2-review")
model = AutoModelForCausalLM.from_pretrained("TomData/GPT2-review")
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
```

and in TensorFlow:

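```python
from transformers import AutoTokenizer, TFAutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("TomData/GPT2-review")
# from_pt=True converts PyTorch weights on load. This assumes the repository
# ships no native TensorFlow weights; drop the flag if it does.
model = TFAutoModelForCausalLM.from_pretrained("TomData/GPT2-review", from_pt=True)
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='tf')
output = model(encoded_input)
```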

## Uses

This model is further pretrained to generate artificial product reviews. This can be useful for:

- Market research
- Product analysis
- Customer preferences
- Fashion trends
- Research

## Training

The model is further pretrained on the [Amazon Reviews 2023 dataset](https://huggingface.co/datasets/McAuley-Lab/Amazon-Reviews-2023) from McAuley-Lab. Only reviews from the Amazon Fashion category are used for training. See:

```python
from datasets import load_dataset

# Load only the raw reviews of the Amazon Fashion category.
dataset = load_dataset("McAuley-Lab/Amazon-Reviews-2023", "raw_review_Amazon_Fashion", trust_remote_code=True)
```
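
This card does not describe how the reviews were turned into training examples; the actual preprocessing lives in the linked GitHub repository. As a generic sketch of one common CLM preprocessing step, tokenized reviews can be concatenated and cut into fixed-length blocks. The names `group_into_blocks`, `block_size`, and `eos_id`, as well as the toy token ids, are assumptions made for this example and are not taken from the training code.

```python
def group_into_blocks(tokenized_texts, block_size, eos_id):
    """Concatenate token id lists, separated by eos_id, and split into equal blocks.

    Leftover tokens that do not fill a whole block are dropped.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    stream = []
    for ids in tokenized_texts:
        stream.extend(ids)
        stream.append(eos_id)  # mark the end of each review
    usable = (len(stream) // block_size) * block_size
    return [stream[i:i + block_size] for i in range(0, usable, block_size)]


# Toy token ids standing in for three tokenized reviews (hypothetical values).
reviews = [[11, 12, 13], [21, 22], [31, 32, 33, 34]]
blocks = group_into_blocks(reviews, block_size=4, eos_id=0)
```

In practice the separator would be the tokenizer's end-of-text id (`tokenizer.eos_token_id`), and the block size would be chosen to fit within the model's context length.
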