Model Name SmolLM2-135M

Model Description

  • SmolLM2-135M is a 135M parameter model based on the Llama 3 architecture.
  • It is trained on the Cosmopedia-2 dataset.
  • The purpose of this model is to train the SmolLM2 Transformer from scratch. I trained it for 10 hours on a g5.2xlarge instance (a single A10 GPU with 24 GB of memory).
  • Trained for 70,000 steps (batch config: batch size 16, context length 1024).
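
The training figures above imply a rough token budget and throughput. The following is a back-of-the-envelope sketch, not a measured result: it assumes no gradient accumulation, that every sequence fills the full 1024-token context, and that the 10 hours cover all 70,000 steps.

```python
# Figures taken from the model description above.
steps = 70_000
batch_size = 16
context_length = 1024
training_hours = 10

# Assumption: one optimizer step consumes exactly batch_size full-length sequences.
tokens_per_step = batch_size * context_length
total_tokens = steps * tokens_per_step

# Assumption: the 10 hours are wall-clock time for the whole run.
tokens_per_second = total_tokens / (training_hours * 3600)

print(f"tokens per step: {tokens_per_step:,}")                 # 16,384
print(f"total tokens:    {total_tokens:,}")                    # 1,146,880,000
print(f"throughput:      {tokens_per_second:,.0f} tokens/s")   # 31,858 tokens/s
```

Under these assumptions the run saw roughly 1.15 billion tokens. The real number is lower if sequences were padded or truncated.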

Base Tokenizer

Cosmo2-tokenizer

Usage Example

import torch
import yaml
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

# LlamaModel is the custom model class; the SmolLm3 module must be importable locally.
from SmolLm3 import LlamaModel

# Download the trained weights from the Hugging Face Hub
model_path = hf_hub_download(
    repo_id="crpatel/SmolLM2-135M-cosmopedia2-70kSteps",
    filename="model.pt",
)

# Build the model from the local YAML config and load the weights on CPU
with open("config_smollm2_135M.yaml", "r") as f:
    config = yaml.safe_load(f)
model = LlamaModel(config["model"])
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()  # disable dropout and other training-only behavior

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/cosmo2-tokenizer")

# Encode the prompt as a tensor of token ids
encoded_text = tokenizer.encode("Once Upon time ", return_tensors="pt").to("cpu")
print(encoded_text)

# Generate up to 100 new tokens with the model's custom generate() method
with torch.no_grad():
    generated_text2 = model.generate(
        idx=encoded_text,
        max_new_tokens=100,
        context_length=50,
        temperature=0.9,
        top_k=2,
        eos_token=tokenizer.eos_token_id,
        device="cpu",
    )

# Drop the batch dimension and decode the token ids back to text
decoded_text2 = tokenizer.decode(generated_text2.squeeze(0))
print(decoded_text2)
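
The generate() call above passes temperature and top_k. Its implementation is not shown here, so the following is only a minimal sketch of how temperature scaling and top-k filtering are commonly combined when picking the next token. The function name sample_top_k and the example logits are hypothetical, and the sketch uses plain Python lists rather than tensors.

```python
import math
import random


def sample_top_k(logits, top_k, temperature, rng=random):
    """Sample one token id after temperature scaling and top-k filtering.

    Hypothetical helper for illustration; not the model's own generate() logic.
    """
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Keep only the ids of the top_k largest logits.
    top_ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]

    # Divide by the temperature: values below 1 sharpen the distribution,
    # values above 1 flatten it.
    scaled = [logits[i] / temperature for i in top_ids]

    # Softmax weights, shifted by the max for numerical stability.
    largest = max(scaled)
    weights = [math.exp(s - largest) for s in scaled]

    # Draw one id in proportion to its weight.
    return rng.choices(top_ids, weights=weights, k=1)[0]


rng = random.Random(0)
logits = [2.0, 0.5, 1.5, -1.0]
draws = [sample_top_k(logits, top_k=2, temperature=0.9, rng=rng) for _ in range(1000)]
print(sorted(set(draws)))  # with top_k=2 only ids 0 and 2 can be drawn
```

With top_k=2, as in the usage example, every step chooses between only the two most likely tokens, which keeps the output close to greedy decoding.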