	---
	license: mit
	base_model: sagorsarker/bangla-bert-base
	tags:
	- generated_from_trainer
	model-index:
	- name: sagorbert_nwp_finetuning_test4
	results: []
	---


	# sagorbert_nwp_finetuning_test4

	This model is a fine-tuned version of [sagorsarker/bangla-bert-base](https://huggingface.co/sagorsarker/bangla-bert-base) on an unspecified dataset (no dataset name was recorded when this card was generated).
	It achieves the following results on the evaluation set:
	- Loss: 2.8149
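
A cross-entropy loss is often easier to interpret as a perplexity, `exp(loss)`. The snippet below is a minimal sketch of that conversion. It assumes the reported loss is a mean per-token cross-entropy in nats, which this card does not confirm, and the function name `loss_to_perplexity` is illustrative only.

```python
import math


def loss_to_perplexity(loss: float) -> float:
    """Convert a mean cross-entropy loss (assumed to be in nats) to perplexity."""
    return math.exp(loss)


# Evaluation loss reported in this card.
eval_loss = 2.8149
print(f"perplexity ~ {loss_to_perplexity(eval_loss):.2f}")
```

If the model was trained with a masked-language-modeling objective, as BERT checkpoints usually are, the figure would cover masked positions only and would not be directly comparable to the perplexity of an autoregressive model.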

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 2e-05
	- train_batch_size: 8
	- eval_batch_size: 8
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- num_epochs: 50
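
To make the `linear` scheduler setting concrete, here is a rough sketch of how the learning rate would evolve over the run. It assumes no warmup, since the card lists none, and takes the total step count (27200) from the last row of the training results. The helper `linear_lr` is a hypothetical stand-in written for illustration, not the Trainer's own code.

```python
def linear_lr(step: int, total_steps: int, base_lr: float = 2e-5,
              warmup_steps: int = 0) -> float:
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Final step listed in the training results (50 epochs x 544 steps per epoch).
TOTAL_STEPS = 27200

for step in (0, 13600, 27200):
    print(step, linear_lr(step, TOTAL_STEPS))
```

Under these assumptions the rate starts at 2e-05, is halved by the midpoint of training, and reaches zero at the final step.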

	### Training results

	| Training Loss | Epoch | Step  | Validation Loss |
	|:-------------:|:-----:|:-----:|:---------------:|
	| 4.7767 | 1.0 | 544 | 4.2674 |
	| 4.1049 | 2.0 | 1088 | 3.8654 |
	| 3.7943 | 3.0 | 1632 | 3.6822 |
	| 3.5949 | 4.0 | 2176 | 3.6292 |
	| 3.4489 | 5.0 | 2720 | 3.4425 |
	| 3.2806 | 6.0 | 3264 | 3.4347 |
	| 3.1905 | 7.0 | 3808 | 3.3484 |
	| 3.1216 | 8.0 | 4352 | 3.2960 |
	| 2.9871 | 9.0 | 4896 | 3.2927 |
	| 2.9642 | 10.0 | 5440 | 3.2930 |
	| 2.8607 | 11.0 | 5984 | 3.2112 |
	| 2.7493 | 12.0 | 6528 | 3.1386 |
	| 2.7057 | 13.0 | 7072 | 3.1607 |
	| 2.6244 | 14.0 | 7616 | 3.1132 |
	| 2.6006 | 15.0 | 8160 | 3.0764 |
	| 2.521 | 16.0 | 8704 | 3.1419 |
	| 2.4752 | 17.0 | 9248 | 3.0641 |
	| 2.4493 | 18.0 | 9792 | 2.9287 |
	| 2.4133 | 19.0 | 10336 | 3.0460 |
	| 2.3448 | 20.0 | 10880 | 3.0339 |
	| 2.3252 | 21.0 | 11424 | 2.9302 |
	| 2.2843 | 22.0 | 11968 | 2.9520 |
	| 2.2266 | 23.0 | 12512 | 2.9751 |
	| 2.1527 | 24.0 | 13056 | 2.8732 |
	| 2.1661 | 25.0 | 13600 | 2.9094 |
	| 2.1001 | 26.0 | 14144 | 2.8885 |
	| 2.0863 | 27.0 | 14688 | 2.9079 |
	| 2.079 | 28.0 | 15232 | 2.8848 |
	| 2.0468 | 29.0 | 15776 | 2.7729 |
	| 2.0064 | 30.0 | 16320 | 2.9156 |
	| 2.0025 | 31.0 | 16864 | 2.8439 |
	| 1.9941 | 32.0 | 17408 | 2.8801 |
	| 1.9787 | 33.0 | 17952 | 2.8806 |
	| 1.9317 | 34.0 | 18496 | 2.8564 |
	| 1.8991 | 35.0 | 19040 | 2.8786 |
	| 1.8881 | 36.0 | 19584 | 2.9111 |
	| 1.8497 | 37.0 | 20128 | 2.8445 |
	| 1.846 | 38.0 | 20672 | 2.7834 |
	| 1.8254 | 39.0 | 21216 | 2.8369 |
	| 1.8306 | 40.0 | 21760 | 2.8321 |
	| 1.8062 | 41.0 | 22304 | 2.8028 |
	| 1.7845 | 42.0 | 22848 | 2.8520 |
	| 1.7953 | 43.0 | 23392 | 2.7625 |
	| 1.7628 | 44.0 | 23936 | 2.8242 |
	| 1.7593 | 45.0 | 24480 | 2.8058 |
	| 1.7384 | 46.0 | 25024 | 2.8107 |
	| 1.7426 | 47.0 | 25568 | 2.8554 |
	| 1.7366 | 48.0 | 26112 | 2.7281 |
	| 1.7453 | 49.0 | 26656 | 2.7387 |
	| 1.7375 | 50.0 | 27200 | 2.7897 |
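
Validation loss in the table above does not fall monotonically, so the last epoch is not necessarily the best checkpoint. The sketch below copies the validation-loss column and shows one way to locate the best epoch, plus a simple patience rule for comparison. The helper names are hypothetical, and the patience rule is a plain illustration rather than the behavior of any particular Trainer callback.

```python
# Validation loss per epoch (epochs 1..50), copied from the table above.
VAL_LOSS = [
    4.2674, 3.8654, 3.6822, 3.6292, 3.4425, 3.4347, 3.3484, 3.2960, 3.2927, 3.2930,
    3.2112, 3.1386, 3.1607, 3.1132, 3.0764, 3.1419, 3.0641, 2.9287, 3.0460, 3.0339,
    2.9302, 2.9520, 2.9751, 2.8732, 2.9094, 2.8885, 2.9079, 2.8848, 2.7729, 2.9156,
    2.8439, 2.8801, 2.8806, 2.8564, 2.8786, 2.9111, 2.8445, 2.7834, 2.8369, 2.8321,
    2.8028, 2.8520, 2.7625, 2.8242, 2.8058, 2.8107, 2.8554, 2.7281, 2.7387, 2.7897,
]


def best_epoch(val_losses):
    """Return (epoch, loss) for the lowest validation loss; epochs are 1-indexed."""
    idx = min(range(len(val_losses)), key=val_losses.__getitem__)
    return idx + 1, val_losses[idx]


def early_stop_epoch(val_losses, patience):
    """Epoch at which training would halt after `patience` consecutive epochs
    without a new best validation loss, or None if that never happens."""
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None


epoch, loss = best_epoch(VAL_LOSS)
print(f"best epoch: {epoch} (validation loss {loss})")
print(f"patience=5 would stop at epoch: {early_stop_epoch(VAL_LOSS, patience=5)}")
```

On these values the minimum validation loss is 2.7281 at epoch 48, while a patience of 5 would have stopped the run much earlier, which suggests that any early-stopping threshold for a similar run would need to tolerate several noisy epochs.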


	### Framework versions

	- Transformers 4.31.0
	- PyTorch 2.0.1+cu118
	- Datasets 2.14.4
	- Tokenizers 0.13.3
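
When trying to reproduce the run, it can help to compare the local environment against the versions listed above. The following is a small sketch using only the standard library. The `PINNED` mapping and `check_versions` helper are illustrative names, and the `+cu118` build tag is dropped on the assumption that only the base PyTorch version matters for the comparison.

```python
from importlib import metadata

# Versions listed in this card, keyed by pip distribution name. The "+cu118"
# local build tag on PyTorch is dropped; it only identifies the CUDA build.
PINNED = {
    "transformers": "4.31.0",
    "torch": "2.0.1",
    "datasets": "2.14.4",
    "tokenizers": "0.13.3",
}


def check_versions(pinned=PINNED):
    """Return {package: (expected, installed)} for each package whose installed
    version differs from the pinned one; installed is None if it is missing."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            # Strip any local build tag such as "+cu118" before comparing.
            installed = metadata.version(name).split("+")[0]
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_versions().items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

An empty result means the installed packages match the card; newer versions may well work, but they were not the ones used for this run.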
