llm3br256

This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct on the centime dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0070
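Since PEFT is listed among the framework versions below, the released weights appear to be a PEFT adapter on top of the base model. The following is a minimal loading and inference sketch, not the original training or serving code; the adapter repository id "sizhkhy/llm3br256" is an assumption inferred from the card title and should be replaced with the actual Hub path if it differs.

```python
# Minimal inference sketch (illustrative only, not shipped with this card).
# "sizhkhy/llm3br256" is an assumed adapter repo id; substitute the real Hub path.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-3.2-3B-Instruct"
adapter_id = "sizhkhy/llm3br256"  # assumption, see note above

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

# Build a chat prompt with the Llama 3.2 chat template and generate a reply.
messages = [{"role": "user", "content": "Hello, what can you help me with?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(base_model.device)

with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=256)

print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```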

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows this list):

  • learning_rate: 0.0001
  • train_batch_size: 48
  • eval_batch_size: 48
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 25.0
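
For reference, here is a minimal sketch of how these settings might be expressed as a Hugging Face TrainingArguments object. It assumes the listed batch sizes are per device and omits everything the card does not specify (LoRA/PEFT configuration, dataset loading, and the trainer itself), so treat it as illustrative rather than the actual training script.

```python
# Sketch of TrainingArguments matching the hyperparameters listed above.
# Assumptions: batch sizes are per device; output_dir taken from the card title.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llm3br256",
    learning_rate=1e-4,
    per_device_train_batch_size=48,
    per_device_eval_batch_size=48,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=25.0,
)
```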

Training results

| Training Loss | Epoch | Step | Validation Loss |
|---------------|-------|------|-----------------|
| 0.1159 | 0.1208 | 25 | 0.1004 |
| 0.0843 | 0.2415 | 50 | 0.0635 |
| 0.0763 | 0.3623 | 75 | 0.0474 |
| 0.0496 | 0.4831 | 100 | 0.0365 |
| 0.046 | 0.6039 | 125 | 0.0316 |
| 0.0368 | 0.7246 | 150 | 0.0266 |
| 0.0283 | 0.8454 | 175 | 0.0232 |
| 0.0237 | 0.9662 | 200 | 0.0212 |
| 0.0234 | 1.0870 | 225 | 0.0194 |
| 0.0232 | 1.2077 | 250 | 0.0176 |
| 0.0307 | 1.3285 | 275 | 0.0178 |
| 0.0228 | 1.4493 | 300 | 0.0147 |
| 0.0167 | 1.5700 | 325 | 0.0155 |
| 0.0238 | 1.6908 | 350 | 0.0125 |
| 0.0191 | 1.8116 | 375 | 0.0138 |
| 0.0273 | 1.9324 | 400 | 0.0120 |
| 0.0194 | 2.0531 | 425 | 0.0125 |
| 0.0125 | 2.1739 | 450 | 0.0128 |
| 0.0132 | 2.2947 | 475 | 0.0117 |
| 0.0142 | 2.4155 | 500 | 0.0099 |
| 0.0119 | 2.5362 | 525 | 0.0105 |
| 0.0131 | 2.6570 | 550 | 0.0118 |
| 0.0089 | 2.7778 | 575 | 0.0100 |
| 0.0158 | 2.8986 | 600 | 0.0096 |
| 0.0119 | 3.0193 | 625 | 0.0096 |
| 0.0097 | 3.1401 | 650 | 0.0099 |
| 0.0089 | 3.2609 | 675 | 0.0092 |
| 0.0087 | 3.3816 | 700 | 0.0088 |
| 0.0083 | 3.5024 | 725 | 0.0088 |
| 0.0088 | 3.6232 | 750 | 0.0080 |
| 0.0058 | 3.7440 | 775 | 0.0069 |
| 0.008 | 3.8647 | 800 | 0.0070 |
| 0.0099 | 3.9855 | 825 | 0.0073 |
| 0.0072 | 4.1063 | 850 | 0.0113 |
| 0.0065 | 4.2271 | 875 | 0.0107 |
| 0.0079 | 4.3478 | 900 | 0.0097 |
| 0.0081 | 4.4686 | 925 | 0.0103 |

Framework versions

  • PEFT 0.12.0
  • Transformers 4.46.1
  • Pytorch 2.4.0+cu121
  • Datasets 3.1.0
  • Tokenizers 0.20.3
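
To check that a local environment matches the versions listed above, a small sketch like the following can be used; the package names are the standard distribution names and the expected versions are copied from this list.

```python
# Compare installed package versions against the versions listed in this card.
# Illustrative helper only; it is not part of the original training code.
import importlib.metadata as md

expected_versions = {
    "peft": "0.12.0",
    "transformers": "4.46.1",
    "torch": "2.4.0+cu121",
    "datasets": "3.1.0",
    "tokenizers": "0.20.3",
}

for package, expected in expected_versions.items():
    try:
        installed = md.version(package)
    except md.PackageNotFoundError:
        installed = "not installed"
    print(f"{package}: installed {installed}, card trained with {expected}")
```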
