mbien/gpt-neo-pl-125m


---
language: pl
tags:
- generated_from_trainer
- text-generation
widget:
- text: "Bolesław Leśmian - polski poeta"
datasets:
- wikipedia
metrics:
- accuracy
model-index:
- name: gpt_neo_pl_125M
  results:
  - task:
      name: Causal Language Modeling
      type: text-generation
    dataset:
      name: wikipedia 20220720.pl
      type: wikipedia
      args: 20220720.pl
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.4312838299951148
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# gpt_neo_pl_125M_v2

This model was trained from scratch on a Polish Wikipedia dump, the `20220720.pl` configuration of the `wikipedia` dataset.
It achieves the following results on the evaluation set:
- Loss: 3.3862
- Accuracy: 0.4313
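The evaluation loss is easier to interpret as perplexity. Assuming the reported loss is the mean per-token cross-entropy in nats (which is what the Transformers causal-LM head computes with PyTorch's cross-entropy loss), perplexity is its exponential. A minimal sketch; the helper name `perplexity` is illustrative and not part of any library:

```python
import math


def perplexity(mean_cross_entropy_nats: float) -> float:
    """Convert a mean per-token cross-entropy (natural log) into perplexity."""
    return math.exp(mean_cross_entropy_nats)


# Final evaluation loss reported in this card (assumed to be in nats).
EVAL_LOSS = 3.3862

print(f"evaluation perplexity: {perplexity(EVAL_LOSS):.2f}")
```

For the reported loss this gives a perplexity of roughly 29.55. Because the loss is averaged per token, perplexities computed this way are only comparable between models that share the same tokenizer.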

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 1
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 1000
- num_epochs: 1.0
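The batch-size and scheduler settings above fit together as follows. With the Hugging Face `Trainer`, `total_train_batch_size` is the per-device batch size times the gradient-accumulation steps times the number of devices, so 1 × 8 = 8 suggests training ran on a single device. `lr_scheduler_type: cosine` corresponds to `transformers.get_cosine_schedule_with_warmup`: a linear warmup to the peak learning rate, then half a cosine down to zero. The pure-Python sketch below mirrors that formula. The card does not state the total number of optimizer steps, so `ASSUMED_TOTAL_STEPS` is a hypothetical value picked to match the roughly 40,000 steps per epoch visible in the training-results table; the helper names are illustrative.

```python
import math

PEAK_LR = 2e-4                # learning_rate
WARMUP_STEPS = 1000           # lr_scheduler_warmup_steps
ASSUMED_TOTAL_STEPS = 40_400  # hypothetical: not stated in the card


def total_train_batch_size(per_device: int, grad_accum: int, num_devices: int) -> int:
    """Training examples consumed per optimizer step."""
    return per_device * grad_accum * num_devices


def cosine_lr_with_warmup(step: int,
                          total_steps: int = ASSUMED_TOTAL_STEPS,
                          peak_lr: float = PEAK_LR,
                          warmup_steps: int = WARMUP_STEPS) -> float:
    """Linear warmup to peak_lr, then a half-cycle cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(total_train_batch_size(per_device=1, grad_accum=8, num_devices=1))  # 8
for step in (0, 500, 1000, 20_700, ASSUMED_TOTAL_STEPS):
    print(f"step {step:>6}: lr = {cosine_lr_with_warmup(step):.2e}")
```

In an actual PyTorch training loop, `get_cosine_schedule_with_warmup(optimizer, num_warmup_steps=1000, num_training_steps=...)` should produce the same curve for the same total step count.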

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| 5.9469 | 0.02 | 1000 | 6.5843 | 0.1435 |
| 4.9953 | 0.05 | 2000 | 5.7709 | 0.1911 |
| 4.3754 | 0.07 | 3000 | 5.2624 | 0.2331 |
| 3.9795 | 0.1 | 4000 | 4.8752 | 0.2731 |
| 3.7099 | 0.12 | 5000 | 4.5927 | 0.3039 |
| 3.4747 | 0.15 | 6000 | 4.3942 | 0.3230 |
| 3.343 | 0.17 | 7000 | 4.2879 | 0.3349 |
| 3.2767 | 0.2 | 8000 | 4.1698 | 0.3459 |
| 3.1852 | 0.22 | 9000 | 4.0925 | 0.3534 |
| 3.0871 | 0.25 | 10000 | 4.0239 | 0.3608 |
| 3.0746 | 0.27 | 11000 | 3.9646 | 0.3664 |
| 2.9473 | 0.3 | 12000 | 3.9245 | 0.3706 |
| 2.9737 | 0.32 | 13000 | 3.8742 | 0.3754 |
| 2.9193 | 0.35 | 14000 | 3.8285 | 0.3796 |
| 2.8833 | 0.37 | 15000 | 3.7952 | 0.3837 |
| 2.8533 | 0.4 | 16000 | 3.7616 | 0.3873 |
| 2.8654 | 0.42 | 17000 | 3.7296 | 0.3907 |
| 2.8196 | 0.44 | 18000 | 3.7049 | 0.3936 |
| 2.7883 | 0.47 | 19000 | 3.6786 | 0.3966 |
| 2.747 | 0.49 | 20000 | 3.6488 | 0.3990 |
| 2.7355 | 0.52 | 21000 | 3.6243 | 0.4021 |
| 2.7355 | 0.54 | 22000 | 3.5982 | 0.4053 |
| 2.6999 | 0.57 | 23000 | 3.5765 | 0.4075 |
| 2.7243 | 0.59 | 24000 | 3.5558 | 0.4101 |
| 2.6526 | 0.62 | 25000 | 3.5371 | 0.4125 |
| 2.641 | 0.64 | 26000 | 3.5150 | 0.4146 |
| 2.6602 | 0.67 | 27000 | 3.4971 | 0.4168 |
| 2.644 | 0.69 | 28000 | 3.4812 | 0.4192 |
| 2.6558 | 0.72 | 29000 | 3.4622 | 0.4215 |
| 2.5664 | 0.74 | 30000 | 3.4504 | 0.4229 |
| 2.5669 | 0.77 | 31000 | 3.4376 | 0.4245 |
| 2.5498 | 0.79 | 32000 | 3.4263 | 0.4263 |
| 2.5874 | 0.82 | 33000 | 3.4169 | 0.4274 |
| 2.5555 | 0.84 | 34000 | 3.4067 | 0.4286 |
| 2.5502 | 0.86 | 35000 | 3.3997 | 0.4298 |
| 2.5232 | 0.89 | 36000 | 3.3946 | 0.4302 |
| 2.5369 | 0.91 | 37000 | 3.3898 | 0.4309 |
| 2.5335 | 0.94 | 38000 | 3.3869 | 0.4313 |
| 2.6032 | 0.96 | 39000 | 3.3853 | 0.4315 |
| 2.5244 | 0.99 | 40000 | 3.3850 | 0.4314 |
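The `Epoch` column in the training-results table is rounded to two decimals, so the number of optimizer steps per epoch can only be bracketed, not read off exactly. Each optimizer step consumes `total_train_batch_size` = 8 training examples; the number of tokens per example (the block size) is not documented in this card, so the sketch below counts examples only. The helper names are illustrative:

```python
# (step, epoch) pairs copied from the training-results table.
ROWS = [(1000, 0.02), (20000, 0.49), (40000, 0.99)]
TOTAL_TRAIN_BATCH_SIZE = 8  # from the hyperparameter list
EPOCH_ROUNDING = 0.005      # epochs are reported to two decimals


def steps_per_epoch_bounds(step: int, epoch: float,
                           half_width: float = EPOCH_ROUNDING) -> tuple[float, float]:
    """Range of steps-per-epoch values consistent with a rounded epoch value."""
    return step / (epoch + half_width), step / (epoch - half_width)


def examples_seen(step: int, batch_size: int = TOTAL_TRAIN_BATCH_SIZE) -> int:
    """Training examples consumed after `step` optimizer steps."""
    return step * batch_size


for step, epoch in ROWS:
    low, high = steps_per_epoch_bounds(step, epoch)
    print(f"step {step:>5}: {low:,.0f} to {high:,.0f} steps/epoch, "
          f"{examples_seen(step):,} examples seen")
```

Later rows give tighter brackets because the rounding error is smaller relative to the epoch value; the last row narrows the estimate to roughly 40,200 to 40,600 optimizer steps, i.e. on the order of 321,600 to 324,900 training examples in the single epoch.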


### Framework versions

- Transformers 4.22.0.dev0
- PyTorch 1.12.0
- Datasets 2.4.0
- Tokenizers 0.12.1