Tokenizer
We trained our tokenizer with SentencePiece's unigram algorithm and then loaded it as an MT5TokenizerFast.
Model
We used the mT5-base model.
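For reference, the architecture can be instantiated from a config without downloading any weights; the tiny dimensions below are illustrative only, while the actual model was mT5-base (loaded with pretrained weights, e.g. via `MT5ForConditionalGeneration.from_pretrained("google/mt5-base")`):

```python
# Sketch: build the mT5 architecture from a config (no weight download).
# These dimensions are deliberately tiny for illustration; mT5-base uses
# vocab_size=250112, d_model=768, num_layers=12, etc.
from transformers import MT5Config, MT5ForConditionalGeneration

config = MT5Config(
    vocab_size=256,
    d_model=32,
    d_ff=64,
    num_layers=2,
    num_heads=2,
)
model = MT5ForConditionalGeneration(config)

n_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {n_params}")
```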
Datasets
We used CodeSearchNet's dataset and some data scraped from the internet to train the model. We maintained a list of datasets, where each dataset contained code in a single language.
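The per-language bookkeeping can be sketched as follows; the record fields and the grouping helper are our own illustration, not the actual data pipeline:

```python
# Sketch: group raw samples into one dataset per language, so that each
# dataset contains code in a single language. Sample records are illustrative.
from collections import defaultdict

raw_samples = [
    {"language": "python", "code": "def add(a, b): return a + b"},
    {"language": "go", "code": "func add(a, b int) int { return a + b }"},
    {"language": "python", "code": "print('hello')"},
]

def group_by_language(samples):
    """Return a list of single-language datasets, one per language."""
    buckets = defaultdict(list)
    for sample in samples:
        buckets[sample["language"]].append(sample["code"])
    return [{"language": lang, "codes": codes} for lang, codes in buckets.items()]

datasets = group_by_language(raw_samples)
for ds in datasets:
    print(ds["language"], len(ds["codes"]))
```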
Plots
Train loss
Evaluation loss
Evaluation accuracy
Learning rate
Fine-tuning
We fine-tuned the model on the CodeXGLUE code-to-code-trans dataset and additional scraped data.
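As an illustration of the fine-tuning setup, translation pairs can be turned into seq2seq examples with a task prefix; the prefix, field names, and sample pair below are assumptions for the sketch, not the exact format used:

```python
# Sketch: build seq2seq training examples from code-to-code translation pairs
# (e.g. CodeXGLUE code-to-code-trans, Java -> C#). The task prefix and record
# fields are illustrative assumptions.
pairs = [
    {"java": "public int add(int a, int b) { return a + b; }",
     "cs": "public int Add(int a, int b) { return a + b; }"},
]

def to_seq2seq(pair, prefix="translate Java to C#: "):
    """Map one translation pair to an (input, target) training example."""
    return {"input": prefix + pair["java"], "target": pair["cs"]}

examples = [to_seq2seq(p) for p in pairs]
print(examples[0]["input"])
```

Examples in this form can then be tokenized and fed to a standard seq2seq training loop.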