---
license: apache-2.0
base_model:
- openai-community/gpt2
---
# Shakespeare Fine-Tuned GPT-2 Model

## Model Description
This is a fine-tuned version of the GPT-2 language model trained on the [Tiny Shakespeare dataset](https://github.com/karpathy/char-rnn/blob/master/data/tinyshakespeare/input.txt). The model is optimized to generate text in the style of William Shakespeare, capturing the syntax, vocabulary, and poetic structure characteristic of his works.

## Intended Use
The model is designed for educational purposes, creative writing, and experimentation with fine-tuned language models. Potential use cases include:
- Generating Shakespearean-style text for creative projects.
- Studying language modeling and fine-tuning techniques.
- Providing inspiration for poetry or prose in Shakespearean English.

### Usage
You can use this model with the Hugging Face Transformers library. Below is an example:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the model and tokenizer from the Hugging Face Hub
model_name = "mstftmk/shakespeare-gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Generate text from a Shakespearean prompt
input_text = "O gentle fair maiden,"
inputs = tokenizer.encode(input_text, return_tensors="pt")
outputs = model.generate(
    inputs,
    max_length=100,
    do_sample=True,  # sampling must be enabled for temperature to take effect
    temperature=0.7,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
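
For quick experiments, the same generation can also be run through the Transformers `pipeline` helper, which bundles tokenization, generation, and decoding into a single call. A minimal sketch using the same repo id and sampling settings:

```python
from transformers import pipeline

# "text-generation" builds a causal-LM pipeline around the fine-tuned model
generator = pipeline("text-generation", model="mstftmk/shakespeare-gpt2")
result = generator("O gentle fair maiden,", max_length=100, do_sample=True, temperature=0.7)
print(result[0]["generated_text"])
```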

---

## Training Details
- **Base Model**: [GPT-2](https://huggingface.co/openai-community/gpt2)
- **Dataset**: Tiny Shakespeare dataset.
- **Fine-Tuning Framework**: Hugging Face's `Trainer` API (a reproduction sketch follows this list).
- **Training Parameters**:
  - Learning rate: `2e-5`
  - Epochs: `3`
  - Batch size: `2`
  - Max sequence length: `128`
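
The training script itself is not included in this repository, so the following is only a sketch of how a comparable run could be set up with the parameters listed above; the local `input.txt` path, the empty-line filter, and the output directory are illustrative assumptions:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Assumes the Tiny Shakespeare input.txt has been downloaded locally
dataset = load_dataset("text", data_files={"train": "input.txt"})["train"]
dataset = dataset.filter(lambda x: len(x["text"]) > 0)  # drop blank lines
dataset = dataset.train_test_split(test_size=0.1)  # 10% held out for validation

tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 defines no pad token
model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")

def tokenize(batch):
    # Truncate each example to the stated max sequence length of 128 tokens
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="shakespeare-gpt2",
    learning_rate=2e-5,
    num_train_epochs=3,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    eval_strategy="epoch",  # per-epoch evaluation, as described below
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    # mlm=False selects standard causal (next-token) language modeling
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```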

---

## Evaluation
- **Validation Split**: 10% of the dataset.
- **Evaluation Strategy**: Per-epoch evaluation during training.
- **Metrics**: Loss and perplexity on validation data (see the note below on computing perplexity).
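
`Trainer` reports the mean cross-entropy as `eval_loss`; perplexity is simply its exponential, so it can be derived directly from the evaluation output. Continuing from the training sketch above:

```python
import math

# trainer.evaluate() runs a pass over the validation split and returns
# the mean cross-entropy loss under the key "eval_loss"
metrics = trainer.evaluate()
perplexity = math.exp(metrics["eval_loss"])
print(f"Validation loss: {metrics['eval_loss']:.4f} | Perplexity: {perplexity:.2f}")
```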

---

## Limitations
- **Style-Restricted**: The model generates text exclusively in a Shakespearean style. It is not intended for modern conversational or general-purpose language modeling.
- **Biases**: The model inherits any biases present in the training dataset.
- **Dataset Limitations**: The Tiny Shakespeare dataset is limited in size and scope, potentially restricting the richness and variability of the generated text.

---

## Ethical Considerations
- The model should not be used for generating harmful, offensive, or misleading content.
- Users should ensure proper attribution when using this model for creative projects.

---

## Citation
If you use this model, please cite:

```bibtex
@misc{shakespeare-gpt2,
  author = {Mustafa Tomak},
  title  = {Shakespeare Fine-Tuned GPT-2},
  year   = {2025},
  url    = {https://huggingface.co/mstftmk/shakespeare-gpt2},
}
```

---

## License
The model is released under the Apache-2.0 license. Users must comply with its terms of use.