temporary0-0name/orator


temporary0-0name committed on Aug 8, 2024

Commit 26cd2a7 · verified · 1 Parent(s): 0a36318

Update README.md

Files changed (1)

README.md +18 -0

@@ -30,6 +30,24 @@ This model, designed and pretrained from scratch, was developed without utilizin
 - **Micro Batch Size**: `128`
 - **Sequence Length**: `256`
+## Model Parameters Details
+### Decayed Parameters
+- **Total Decayed Parameters**: 95,453,184
+Decayed parameters are typically the weight matrices of the model's layers (for example, those in the transformer blocks) and are subject to weight decay during optimization. Weight decay regularizes the model by penalizing large weights, which can reduce overfitting.
+### Non-Decayed Parameters
+- **Total Non-Decayed Parameters**: 81,408
+Non-decayed parameters are generally biases and layer normalization parameters. They are usually excluded from weight decay because decaying them can destabilize training.
+### Total Parameters
+- **Overall Total Parameters**: 95,534,592
+The total counts both decayed and non-decayed parameters: 95,453,184 + 81,408 = 95,534,592, or roughly 95.5 million.
 ## Dataset Description
 ### Overview
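
The README does not say how the decayed/non-decayed split was computed. As a rough sketch, assuming the common convention that tensors with two or more dimensions (weight matrices, embeddings) are decayed and one-dimensional tensors (biases, normalization gains) are not, the counts could be derived as below. The parameter names and shapes are hypothetical placeholders, not the actual tensors of this model; only the three reported totals come from the README.

```python
from math import prod

# Hypothetical parameter shapes for illustration only; these are NOT the
# actual tensors of this model.
named_shapes = {
    "tok_emb.weight": (1000, 64),
    "block0.attn.weight": (64, 192),
    "block0.attn.bias": (192,),
    "block0.ln.weight": (64,),
    "block0.ln.bias": (64,),
}


def split_by_decay(shapes):
    """Assumed rule: tensors with 2+ dimensions are decayed, 1-D ones are not."""
    decayed = {name: s for name, s in shapes.items() if len(s) >= 2}
    non_decayed = {name: s for name, s in shapes.items() if len(s) < 2}
    return decayed, non_decayed


def count(shapes):
    """Total number of scalar parameters across all tensors."""
    return sum(prod(s) for s in shapes.values())


decayed, non_decayed = split_by_decay(named_shapes)
n_decayed = count(decayed)
n_non_decayed = count(non_decayed)

# Totals as reported in the README; check that they add up.
reported_decayed = 95_453_184
reported_non_decayed = 81_408
reported_total = 95_534_592
totals_consistent = reported_decayed + reported_non_decayed == reported_total
```

In a real training script the two groups would typically be passed to the optimizer as separate parameter groups, with a nonzero `weight_decay` for the first and `0.0` for the second.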