A decoder-only model trained on the MiniPile dataset from Hugging Face, using a GPT-2-like architecture. The following configuration was used for training, which was done on 3 NVIDIA A100 GPUs.
```yaml
batch_size: 4
block_size: 1024
gradient_accumulation_steps: 21
max_iters: 200000
lr_decay_iters: 180000
warmup_iters: 20000 # 10% of max_iters
weight_decay: 0.1
dropout: 0.1
device: 'cuda'
n_layer: 16
n_head: 16
n_embd: 2048
```
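As a quick sanity check on the settings above, the effective batch size per optimizer step can be worked out from `batch_size`, `gradient_accumulation_steps`, and the GPU count. This sketch assumes accumulation steps are counted per GPU, which the card does not state explicitly:

```python
# Effective batch size implied by the training config above,
# assuming gradient_accumulation_steps is applied per GPU.
batch_size = 4
block_size = 1024
gradient_accumulation_steps = 21
n_gpus = 3  # 3 NVIDIA A100s, per the card

sequences_per_step = batch_size * gradient_accumulation_steps * n_gpus
tokens_per_step = sequences_per_step * block_size
print(sequences_per_step, "sequences /", tokens_per_step, "tokens per step")
```

Under that assumption, each optimizer step sees 252 sequences, i.e. about 258K tokens.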
The file `pytorch_model.bin` contains the final checkpoint from the last iteration. The file `checkpoint_iter60k` contains the intermediate checkpoint from the 60,000-th iteration.
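For a rough sense of scale, the architecture hyperparameters imply a parameter count near 0.9B. The sketch below uses the standard 12·n_layer·n_embd² approximation for transformer blocks and assumes a GPT-2 vocabulary of 50257 tokens, since the card does not state the tokenizer:

```python
# Rough parameter estimate for the GPT-2-style config in this card.
# vocab_size=50257 is an assumption (GPT-2 tokenizer); the card does
# not specify which tokenizer was used.

def estimate_params(n_layer, n_embd, block_size, vocab_size):
    # Each block: attention (4 * d^2) + MLP (8 * d^2) = 12 * d^2 weights,
    # ignoring biases and LayerNorm parameters.
    transformer = 12 * n_layer * n_embd ** 2
    # Token embeddings plus learned position embeddings.
    embeddings = (vocab_size + block_size) * n_embd
    return transformer, embeddings

transformer, embeddings = estimate_params(16, 2048, 1024, 50257)
print(f"transformer ~{transformer/1e6:.0f}M, embeddings ~{embeddings/1e6:.0f}M")
```

That gives roughly 805M transformer weights plus ~105M embedding weights, so on the order of 0.9B parameters total.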