Training parameters:

```
# ClassificationArgs is provided by the simpletransformers library
from simpletransformers.classification import ClassificationArgs

model_args = ClassificationArgs()
model_args.max_seq_length = 512
model_args.train_batch_size = 12
model_args.eval_batch_size = 12
model_args.num_train_epochs = 5
model_args.evaluate_during_training = False
model_args.learning_rate = 1e-5
model_args.use_multiprocessing = False
model_args.fp16 = False
model_args.save_steps = -1
model_args.save_eval_checkpoints = False
model_args.no_cache = True
model_args.reprocess_input_data = True
model_args.overwrite_output_dir = True
```

Evaluation on BoolQ Test Set:

|              | Precision | Recall | F1-score |
|:------------:|:---------:|:------:|:--------:|
|      0       |   0.82    |  0.80  |   0.81   |
|      1       |   0.88    |  0.89  |   0.88   |
|   accuracy   |           |        |   0.86   |
|  macro avg   |   0.85    |  0.84  |   0.85   |
| weighted avg |   0.86    |  0.86  |   0.86   |

ROC AUC Score: 0.844
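
As a sanity check on how the rows of the evaluation table relate to each other, the per-class F1-scores can be recomputed from the reported precision and recall. This is only a minimal sketch: the `report` dictionary and the `f1_score` helper are illustrative names, not part of the original evaluation code, and the inputs are the two-decimal values from the table rather than the unrounded metrics.

```python
# Per-class precision and recall copied from the evaluation table
# (already rounded to two decimals, so results are approximate).
report = {
    "0": {"precision": 0.82, "recall": 0.80},
    "1": {"precision": 0.88, "recall": 0.89},
}


def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# F1 for each class, recomputed from the rounded table values.
f1_by_class = {
    label: f1_score(m["precision"], m["recall"]) for label, m in report.items()
}

# Macro average: unweighted mean of the per-class values.
macro_precision = sum(m["precision"] for m in report.values()) / len(report)

for label, f1 in f1_by_class.items():
    print(f"class {label}: F1 = {f1:.2f}")
print(f"macro avg precision = {macro_precision:.2f}")
```

The recomputed per-class F1-scores and the macro-averaged precision match the table to two decimals. Averages recomputed from rounded inputs can differ in the last digit from the reported ones (macro recall, for example), presumably because the reported figures were derived from unrounded values.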