Training RoBERTa architecture with 4786611 training samples, 24054 test samples, vocab size 20, 3 hidden layers, hidden size 256, 4 attention heads, MLM probability 0.15, 10 processes (num process), max length 512, train/test split 0.005, min sub-sequence length 50, max sub-sequence length 2000, seed 42
055353e
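
The hyperparameters in the commit message above can be collected into one record and sanity-checked with a little arithmetic. This is a minimal sketch, not the repository's actual training code: the class name `RunConfig`, its field names, and both helper functions are hypothetical, and the use of ceiling rounding for the split is an assumption that happens to reproduce the reported test-set size.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    # Values copied from the commit message; field names are hypothetical.
    train_samples: int = 4_786_611
    test_samples: int = 24_054
    vocab_size: int = 20
    num_hidden_layers: int = 3
    hidden_size: int = 256
    num_attention_heads: int = 4
    mlm_probability: float = 0.15
    num_proc: int = 10
    max_length: int = 512
    train_test_split: float = 0.005
    min_sub_seq_length: int = 50
    max_sub_seq_length: int = 2000
    seed: int = 42


def head_dim(cfg: RunConfig) -> int:
    """Per-head width; the hidden size must divide evenly across the heads."""
    if cfg.hidden_size % cfg.num_attention_heads != 0:
        raise ValueError("hidden_size must be divisible by num_attention_heads")
    return cfg.hidden_size // cfg.num_attention_heads


def expected_test_samples(cfg: RunConfig) -> int:
    """Test-set size implied by the split fraction (assumes ceiling rounding)."""
    total = cfg.train_samples + cfg.test_samples
    return math.ceil(total * cfg.train_test_split)


cfg = RunConfig()
print(head_dim(cfg))               # 256 / 4 = 64
print(expected_test_samples(cfg))  # ceil(4810665 * 0.005) = 24054
```

Under these assumptions each attention head is 64 dimensions wide, and a 0.005 split of the 4810665 combined samples gives the 24054 test samples quoted in the message, which suggests that 4786611 is the training count after the split rather than the total.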