mstftmk/shakespeare-gpt2

mstftmk committed on Jan 8

Commit 5d2d04b · verified · 1 Parent(s): 26de4f4

Create README.md

Files changed (1)

README.md +84 -0

README.md ADDED

	@@ -0,0 +1,84 @@

+---
+license: apache-2.0
+base_model:
+- openai-community/gpt2
+---
+# Shakespeare Fine-Tuned GPT-2 Model
+## Model Description
+This is a fine-tuned version of the GPT-2 language model trained on the [Tiny Shakespeare dataset](https://github.com/karpathy/char-rnn/blob/master/data/tinyshakespeare/input.txt). The model is optimized to generate text in the style of William Shakespeare, capturing the syntax, vocabulary, and poetic structure characteristic of his works.
+## Intended Use
+The model is designed for educational purposes, creative writing, and experimentation with fine-tuned language models. Potential use cases include:
+- Generating Shakespearean-style text for creative projects.
+- Studying language modeling and fine-tuning techniques.
+- Providing inspiration for poetry or prose in Shakespearean English.
+### Usage
+You can use this model via the Hugging Face Transformers library. Below is an example:
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+# Load model and tokenizer (add token=True if the repository requires authentication)
+model_name = "mstftmk/shakespeare-gpt2"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(model_name)
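+# Generate text; do_sample=True is needed for temperature to take effect
+input_text = "O gentle fair maiden,"
+inputs = tokenizer(input_text, return_tensors="pt")
+outputs = model.generate(**inputs, max_length=100, do_sample=True, temperature=0.7, pad_token_id=tokenizer.eos_token_id)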
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```
+---
+## Training Details
+- **Base Model**: [GPT-2 (medium)](https://huggingface.co/gpt2-medium)
+- **Dataset**: Tiny Shakespeare dataset.
+- **Fine-Tuning Framework**: Hugging Face's `Trainer` API.
+- **Training Parameters**:
+  - Learning rate: `2e-5`
+  - Epochs: `3`
+  - Batch size: `2`
+  - Max sequence length: `128`
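To make the training parameters above concrete, here is a minimal, dependency-free sketch of how they translate into training examples and optimizer steps. It is not the author's training script: the names `FineTuneConfig`, `chunk_token_ids`, and `total_optimizer_steps` are hypothetical, and the sketch assumes the tokenized text is cut into non-overlapping 128-token blocks, trained on a single device without gradient accumulation. The card does not state any of these details.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters as listed in the model card (container name is hypothetical)."""
    learning_rate: float = 2e-5
    num_epochs: int = 3
    batch_size: int = 2
    max_seq_length: int = 128


def chunk_token_ids(token_ids, block_size):
    """Split a flat token list into full blocks, dropping the trailing remainder.

    Assumption: non-overlapping blocks; the card does not say how text was chunked.
    """
    n_blocks = len(token_ids) // block_size
    return [token_ids[i * block_size:(i + 1) * block_size] for i in range(n_blocks)]


def total_optimizer_steps(num_examples, cfg):
    """Optimizer steps over all epochs, assuming one device and no gradient accumulation."""
    steps_per_epoch = math.ceil(num_examples / cfg.batch_size)
    return steps_per_epoch * cfg.num_epochs


cfg = FineTuneConfig()
fake_token_ids = list(range(1000))  # stand-in for real tokenizer output
blocks = chunk_token_ids(fake_token_ids, cfg.max_seq_length)
print(len(blocks), total_optimizer_steps(len(blocks), cfg))  # prints: 7 12
```

With a batch size of 2, each optimizer step sees only 256 tokens, so the number of steps grows quickly with corpus size; this is one reason small learning rates such as `2e-5` are a common choice in setups like this.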
+---
+## Evaluation
+- **Validation Split**: 10% of the dataset.
+- **Evaluation Strategy**: Evaluated once per epoch during training.
+- **Metrics**: Loss and perplexity on validation data.
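The evaluation setup above can be sketched in a few lines. The relation between the two metrics is standard: perplexity is the exponential of the mean per-token cross-entropy loss (in nats). The split helper is only an illustration; the card does not say whether the 10% was held out contiguously or at random, so the contiguous tail split below is an assumption, and both function names are hypothetical.

```python
import math


def split_train_validation(text, validation_fraction=0.1):
    """Hold out the final fraction of the text for validation.

    Assumption: a contiguous tail split; the card only states the 10% size.
    """
    holdout = round(len(text) * validation_fraction)
    cut = len(text) - holdout
    return text[:cut], text[cut:]


def perplexity(mean_cross_entropy_loss):
    """Convert mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_cross_entropy_loss)


train_text, val_text = split_train_validation("a" * 1000)
print(len(train_text), len(val_text))  # prints: 900 100
print(perplexity(0.0))                 # prints: 1.0
```

A validation loss of 0 corresponds to a perplexity of 1 (a perfectly confident, correct model); higher loss means the model is, on average, choosing among more equally likely next tokens.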
+---
+## Limitations
+- **Style-Restricted**: The model is tuned to generate text in a Shakespearean style. It is not intended for modern conversational or general-purpose language modeling.
+- **Biases**: The model inherits any biases present in the training dataset.
+- **Dataset Limitations**: The Tiny Shakespeare dataset is limited in size and scope, potentially restricting the richness and variability of the generated text.
+---
+## Ethical Considerations
+- The model should not be used for generating harmful, offensive, or misleading content.
+- Users should ensure proper attribution when using this model for creative projects.
+---
+## Citation
+If you use this model, please cite:
+```bibtex
+@misc{shakespeare-gpt2,
+  author = {Mustafa Tomak},
+  title = {Shakespeare Fine-Tuned GPT-2},
+  year = {2025},
+  url = {https://huggingface.co/mstftmk/shakespeare-gpt2},
+}
+```
+---
+## License
+The model is released under the Apache 2.0 license. Users must comply with its terms.