---
license: mit
datasets:
- HuggingFaceFW/fineweb-edu
language:
- en
library_name: transformers
---
- Trainable parameters: 30,142,848
- Embedding parameters: 19,298,688
- Non-embedding parameters: 10,844,160
- Tokenizer: GPT-2
- Vocabulary size: 50,257
- Compute: single T4 GPU
- Total train time: 2 hours and 40 minutes
- Total train tokens: 136,000,000
- Epochs: 2
- Final train loss: 2.9811
- Final test loss: 2.7963
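
These counts can be reproduced by instantiating the same architecture locally. A quick sanity check (it treats only the token embedding table as "embedding" parameters, matching the figures above):

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Same architecture as the inference script below
config = GPT2Config(vocab_size=50257, n_positions=512, n_embd=384, n_layer=6, n_head=8)
model = GPT2LMHeadModel(config)

total = sum(p.numel() for p in model.parameters() if p.requires_grad)
embedding = model.transformer.wte.weight.numel()  # token embedding table (tied with the LM head)
print(f"Trainable:     {total:,}")              # 30,142,848
print(f"Embedding:     {embedding:,}")          # 19,298,688
print(f"Non-embedding: {total - embedding:,}")  # 10,844,160
```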
_________________________________________
Try the following script for inference. Install the dependencies first:

```bash
pip install huggingface_hub transformers torch
```
```python
from transformers import GPT2Tokenizer, GPT2Config, GPT2LMHeadModel
from huggingface_hub import hf_hub_download
import torch

# Repository name
model_name = 'Mizule/Dense-30M'

# Authenticate
token = input("Enter your Hugging Face token: ")

# Download the checkpoint (a raw PyTorch state dict)
model_file = hf_hub_download(repo_id=model_name, filename="Dense-30M.pth", token=token)
# Tokenizer and custom config (must match the trained architecture)
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
config = GPT2Config(
    vocab_size=tokenizer.vocab_size,
    n_positions=512,
    n_ctx=512,
    n_embd=384,
    n_layer=6,
    n_head=8
)
# Load model
model = GPT2LMHeadModel(config)
model.load_state_dict(torch.load(model_file, map_location=torch.device('cpu')))
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.eval()
# Inference settings
def generate_text(prompt, max_length=128, temperature=0.2, top_k=50, top_p=0.9):
    inputs = tokenizer(prompt, return_tensors="pt")
    inputs = {key: value.to(device) for key, value in inputs.items()}
    outputs = model.generate(
        **inputs,
        max_length=max_length,
        temperature=temperature,
        top_k=top_k,
        top_p=top_p,
        num_return_sequences=1,
        no_repeat_ngram_size=2,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
# Interactive loop (it's an undertrained base model, don't expect it to chat)
while True:
    prompt = input("Prompt: ")
    if prompt.lower() == 'exit':
        break
    output = generate_text(prompt)
    print(f"Generated text: {output}")
```
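
If you prefer the standard `from_pretrained` / `pipeline` workflow, the model and tokenizer loaded above can be re-saved in the usual Transformers layout. A minimal sketch (the local directory name is just an example, not something shipped with this repo):

```python
# Assumes `model` and `tokenizer` from the script above are already in memory
import torch
from transformers import pipeline

save_dir = "dense-30m-hf"  # example local directory, not part of the repo
model.save_pretrained(save_dir)
tokenizer.save_pretrained(save_dir)

# Reload through the high-level pipeline API
generator = pipeline(
    "text-generation",
    model=save_dir,
    device=0 if torch.cuda.is_available() else -1,
)
print(generator("The water cycle begins when", max_length=64, do_sample=True, temperature=0.7)[0]["generated_text"])
```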