datives_removed_seed-42_1e-3

This model was trained from scratch on an unknown dataset. At the final evaluation (step 30080) it achieves the following results on the evaluation set:

Loss: 3.1700
Accuracy: 0.4016
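The card does not say what the loss measures. If it is the mean per-token cross-entropy in nats, which is the usual convention for language-model training with the Hugging Face Trainer but is only an assumption here, it converts to perplexity by exponentiation. The sketch below applies that conversion to a few validation losses taken from the training results table. The `perplexity` helper is hypothetical and not part of any library.

```python
import math

# Assumption: each loss is a mean per-token cross-entropy measured in nats,
# so perplexity is exp(loss). The model card itself does not confirm this.
def perplexity(mean_ce_loss_nats: float) -> float:
    """Convert a mean cross-entropy loss (in nats) to perplexity."""
    return math.exp(mean_ce_loss_nats)

# Validation losses copied from the training results table
# (epoch values rounded: 0.9995 -> 1, 9.9993 -> 10, 19.9963 -> 20).
validation_losses = {1: 4.4077, 10: 3.1998, 20: 3.1700}

for epoch, loss in validation_losses.items():
    print(f"epoch {epoch}: loss={loss:.4f}  perplexity={perplexity(loss):.2f}")
```

Under that assumption, the final validation loss of 3.1700 corresponds to a perplexity of roughly 24.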

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.001
train_batch_size: 32
eval_batch_size: 64
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 256
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 32000
num_epochs: 20.0
mixed_precision_training: Native AMP
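The total_train_batch_size entry follows from the other batch settings: with gradient accumulation, gradients from several micro-batches are accumulated before each optimizer step, so each step reflects per-device batch × accumulation steps × number of devices examples. The sketch below assumes a single training device. That is consistent with 32 × 8 = 256, but the card does not state it. The `effective_batch_size` helper is hypothetical.

```python
def effective_batch_size(per_device_batch: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Examples contributing to each optimizer step (hypothetical helper).

    num_devices defaults to 1, an assumption consistent with the card's numbers.
    """
    return per_device_batch * grad_accum_steps * num_devices

total = effective_batch_size(per_device_batch=32, grad_accum_steps=8)
print(total)  # 256, matching total_train_batch_size

# Approximate upper bound on training examples seen over the run, using the
# last optimizer step (30080) from the training results table.
examples_seen = 30_080 * total
print(examples_seen)
```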

Training results

Validation loss and accuracy were evaluated once per epoch.

Training Loss	Epoch	Step	Validation Loss	Accuracy
6.0149	0.9995	1504	4.4077	0.2931
3.947	1.9993	3008	3.8892	0.3339
3.6966	2.9992	4512	3.6094	0.3572
3.4034	3.9997	6017	3.4486	0.3726
3.2971	4.9995	7521	3.3558	0.3813
3.1805	5.9993	9025	3.2927	0.3872
3.1139	6.9992	10529	3.2557	0.3909
3.0634	7.9997	12034	3.2280	0.3938
3.0111	8.9995	13538	3.2141	0.3953
2.9921	9.9993	15042	3.1998	0.3969
2.9476	10.9992	16546	3.1914	0.3984
2.9475	11.9997	18051	3.1887	0.3987
2.9023	12.9995	19555	3.1828	0.3996
2.9125	13.9993	21059	3.1795	0.4003
2.8721	14.9992	22563	3.1771	0.4005
2.8886	15.9997	24068	3.1767	0.4010
2.8514	16.9995	25572	3.1728	0.4009
2.8731	17.9993	27076	3.1758	0.4015
2.8378	18.9992	28580	3.1706	0.4013
2.8617	19.9963	30080	3.1700	0.4016
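The hyperparameters set lr_scheduler_warmup_steps to 32000, but the last optimizer step in the table is 30080. Under a linear schedule that warms up linearly to the base rate and then decays linearly to zero, training therefore ended before warmup finished, with the learning rate still rising. The sketch below models that schedule. It is intended to follow the usual behavior of `lr_scheduler_type: linear`, but it is an approximation and not the Trainer's own code. The function name is hypothetical.

```python
def linear_warmup_decay_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup to base_lr, then linear decay to 0 (sketch of a 'linear' schedule)."""
    if step < warmup_steps:
        # Warmup phase: learning rate grows proportionally to the step count.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: learning rate shrinks linearly to zero at total_steps.
    remaining = total_steps - step
    decay_span = max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, remaining / decay_span)

BASE_LR = 1e-3      # learning_rate from the card
WARMUP = 32_000     # lr_scheduler_warmup_steps from the card
TOTAL = 30_080      # last optimizer step logged in the results table

# Every logged step is below WARMUP, so all of them fall in the warmup phase.
for step in (1504, 15042, 30080):
    print(step, linear_warmup_decay_lr(step, BASE_LR, WARMUP, TOTAL))
```

Under this model, the learning rate at the final step is about 0.94 × 1e-3, so the configured peak of 1e-3 is never reached and the decay phase never begins.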