---
license: apache-2.0
datasets:
  - ai4bharat/samanantar
language:
  - en
tags:
  - translation
---

This repository contains the trained model checkpoints for Ch1, "Attention Is All You Need". The chapter builds a transformer from scratch for English-to-Hindi translation. Any of the checkpoints can be used for inference.

Loss graph: image.png

Training specs: trained on an NVIDIA A10 GPU (24 GB) for 12 hours.

    return {
        'batch_size': 85,
        'num_samples': 1000000,
        'num_epochs': 10,
        'lr': 10**-4,
        'seq_len': 128,
        'd_model': 512,
        'datasource': "runs",
        'tgt_language': 'hi',
        'model_folder': 'weights',
        'model_basename': 'tmodel_',
        'preload': None,
        'tokenizer_folder': 'tokenizer',
        'vocab_size': 52000,
    }
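
To give a feel for what these settings imply, here is a rough sketch that derives the number of optimizer steps from the configuration above and looks for checkpoint files on disk. It is not part of the chapter's code. The helper names `training_budget` and `list_checkpoints` are hypothetical, the step count assumes all `num_samples` pairs are used for training with the last partial batch kept, and the checkpoint layout (files under `model_folder` whose names start with `model_basename`) is an assumption to check against the actual files.

```python
import math
from pathlib import Path

# Values copied from the configuration above (only the keys used here).
config = {
    'batch_size': 85,
    'num_samples': 1000000,
    'num_epochs': 10,
    'seq_len': 128,
    'model_folder': 'weights',
    'model_basename': 'tmodel_',
}


def training_budget(cfg):
    """Return (steps_per_epoch, total_steps, max_tokens_per_batch).

    Assumption: every sample is used for training and the final,
    partially filled batch is kept rather than dropped.
    """
    steps_per_epoch = math.ceil(cfg['num_samples'] / cfg['batch_size'])
    total_steps = steps_per_epoch * cfg['num_epochs']
    # Upper bound per side (source or target), padding included.
    max_tokens_per_batch = cfg['batch_size'] * cfg['seq_len']
    return steps_per_epoch, total_steps, max_tokens_per_batch


def list_checkpoints(cfg, root='.'):
    """List files in model_folder whose names start with model_basename.

    Assumption: checkpoints live directly under root/model_folder.
    """
    folder = Path(root) / cfg['model_folder']
    return sorted(folder.glob(cfg['model_basename'] + '*'))


steps_per_epoch, total_steps, max_tokens = training_budget(config)
print(steps_per_epoch, total_steps, max_tokens)  # prints: 11765 117650 10880
```

Under these assumptions, one epoch is 11,765 steps and the full run is 117,650 steps, with at most 10,880 token positions per side in each batch. `list_checkpoints(config)` returns the matching files sorted by name, so it can be used to pick a checkpoint for inference once the weights have been downloaded locally.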