TomData/GPT2-review

Text Generation

TomData committed on Aug 19, 2024

Commit 5798fbc · verified · 1 Parent(s): 29197a8

Update README.md

Files changed (1)

README.md +82 -1

@@ -6,4 +6,85 @@ language:
 library_name: pytorch
 pipeline_tag: text-generation
 base_model: openai-community/gpt2-medium
----
+---
+---
+datasets:
+- McAuley-Lab/Amazon-Reviews-2023
+language:
+- en
+library_name: pytorch
+pipeline_tag: text-generation
+base_model: openai-community/gpt2-medium
+---
+# GPT-2 Medium - Review
+## Model Details
+**Model Description:** GPT-2 Medium is the **355M parameter** version of GPT-2, a transformer-based language model created and released by OpenAI. This model is GPT-2 Medium further pretrained with a causal language modeling (CLM) objective on English Amazon product reviews from the Fashion category.
+- **Developed by:** Students at University of Konstanz
+- **Model Type:** Transformer-based language model
+- **Language(s):** English
+- **Base Model:** [GPT2-medium](https://huggingface.co/openai-community/gpt2-medium)
+- **Resources for more information:**
+  - [GitHub Repo](https://github.com/valentin-velev29/DLSS-24-GPT-2-Project)
+## How to Get Started with the Model
+Use the code below to get started with the model. You can use this model directly with a pipeline for text generation. Since the generation relies on some randomness, we set a seed for reproducibility:
+```python
+>>> from transformers import pipeline, set_seed
+>>> generator = pipeline('text-generation', model='TomData/GPT2-review')
+>>> set_seed(42)
+>>> generator("Hello, I'm a language model,", max_length=30, num_return_sequences=5)
+```
+Here is how to use this model to get the output logits for a given text in PyTorch:
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+tokenizer = AutoTokenizer.from_pretrained("TomData/GPT2-review")
+model = AutoModelForCausalLM.from_pretrained("TomData/GPT2-review")
+text = "Replace me by any text you'd like."
+encoded_input = tokenizer(text, return_tensors='pt')
+output = model(**encoded_input)  # output.logits holds the next-token scores
+```
+and in TensorFlow:
+```python
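+from transformers import AutoTokenizer, TFAutoModelForCausalLM
+
+tokenizer = AutoTokenizer.from_pretrained("TomData/GPT2-review")
+# from_pt=True converts the PyTorch weights; assumed necessary if the repo ships no TensorFlow checkpoint
+model = TFAutoModelForCausalLM.from_pretrained("TomData/GPT2-review", from_pt=True)
+text = "Replace me by any text you'd like."
+encoded_input = tokenizer(text, return_tensors='tf')
+output = model(encoded_input)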
+```
+## Uses
+This model is further pretrained to generate artificial product reviews. This can be useful for:
+- Market research
+- Product analysis
+- Customer preferences
+- Fashion trends
+- Research
+## Training
+The model is further pretrained on the [Amazon Reviews Dataset](https://huggingface.co/datasets/McAuley-Lab/Amazon-Reviews-2023) from McAuley-Lab.
+For training, only the reviews from the Amazon Fashion category are used. See:
+```python
+from datasets import load_dataset
+dataset = load_dataset("McAuley-Lab/Amazon-Reviews-2023", "raw_review_Amazon_Fashion", trust_remote_code=True)
+```
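The README does not show how the loaded reviews were turned into training examples. One common way to prepare text for a causal language modeling objective is to concatenate the tokenized reviews and cut the result into fixed-length blocks, with the labels being a copy of the inputs. The sketch below illustrates that idea on toy token ids; the helper name `group_into_blocks` and the block size are hypothetical and are not taken from the authors' repository.

```python
from itertools import chain


def group_into_blocks(tokenized_reviews, block_size):
    """Concatenate token-id lists and cut them into equal-length blocks.

    For causal language modeling the labels are a copy of the inputs;
    Hugging Face causal LM heads shift them internally when computing the loss.
    """
    flat = list(chain.from_iterable(tokenized_reviews))
    # Drop the ragged tail so that every block has exactly block_size tokens.
    usable = (len(flat) // block_size) * block_size
    blocks = [flat[i:i + block_size] for i in range(0, usable, block_size)]
    return {"input_ids": blocks, "labels": [block[:] for block in blocks]}


# Toy token ids standing in for tokenizer output on three reviews.
toy_reviews = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
packed = group_into_blocks(toy_reviews, block_size=4)
print(packed["input_ids"])
```

In practice the token ids would come from the GPT-2 tokenizer applied to the review text, and the block size would be chosen to fit the model's context window and the available memory.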