Tokenizer
We trained our tokenizer with SentencePiece's unigram algorithm and then loaded it as an MT5TokenizerFast.
Model
We used the mT5-base model.
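For reference, the architecture can be instantiated from a config without downloading any weights; the tiny dimensions below are illustrative only, while the actual model was mT5-base (loaded with pretrained weights, e.g. via `MT5ForConditionalGeneration.from_pretrained("google/mt5-base")`):

```python
# Sketch: build the mT5 architecture from a config (no weight download).
# These dimensions are deliberately tiny for illustration; mT5-base uses
# vocab_size=250112, d_model=768, num_layers=12, etc.
from transformers import MT5Config, MT5ForConditionalGeneration

config = MT5Config(
    vocab_size=256,
    d_model=32,
    d_ff=64,
    num_layers=2,
    num_heads=2,
)
model = MT5ForConditionalGeneration(config)

n_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {n_params}")
```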
Datasets
We used CodeSearchNet's dataset and some data scraped from the internet to train the model. We maintained a list of datasets, where each dataset contained code in a single language.
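The per-language bookkeeping can be sketched as follows; the record fields and the grouping helper are our own illustration, not the actual data pipeline:

```python
# Sketch: group raw samples into one dataset per language, so that each
# dataset contains code in a single language. Sample records are illustrative.
from collections import defaultdict

raw_samples = [
    {"language": "python", "code": "def add(a, b): return a + b"},
    {"language": "go", "code": "func add(a, b int) int { return a + b }"},
    {"language": "python", "code": "print('hello')"},
]

def group_by_language(samples):
    """Return a list of single-language datasets, one per language."""
    buckets = defaultdict(list)
    for sample in samples:
        buckets[sample["language"]].append(sample["code"])
    return [{"language": lang, "codes": codes} for lang, codes in buckets.items()]

datasets = group_by_language(raw_samples)
for ds in datasets:
    print(ds["language"], len(ds["codes"]))
```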
Plots
Train loss
Evaluation loss
Evaluation accuracy
Learning rate
Fine-tuning
We fine-tuned the model on the CodeXGLUE code-to-code-trans dataset and additional scraped data.
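As an illustration of the fine-tuning setup, translation pairs can be turned into seq2seq examples with a task prefix; the prefix, field names, and sample pair below are assumptions for the sketch, not the exact format used:

```python
# Sketch: build seq2seq training examples from code-to-code translation pairs
# (e.g. CodeXGLUE code-to-code-trans, Java -> C#). The task prefix and record
# fields are illustrative assumptions.
pairs = [
    {"java": "public int add(int a, int b) { return a + b; }",
     "cs": "public int Add(int a, int b) { return a + b; }"},
]

def to_seq2seq(pair, prefix="translate Java to C#: "):
    """Map one translation pair to an (input, target) training example."""
    return {"input": prefix + pair["java"], "target": pair["cs"]}

examples = [to_seq2seq(p) for p in pairs]
print(examples[0]["input"])
```

Examples in this form can then be tokenized and fed to a standard seq2seq training loop.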