temporary0-0name committed · Commit 26cd2a7 · verified · 1 Parent(s): 0a36318

Update README.md

Files changed (1)
  1. README.md +18 -0
README.md CHANGED
@@ -30,6 +30,24 @@ This model, designed and pretrained from scratch, was developed without utilizin
  - **Micro Batch Size**: `128`
  - **Sequence Length**: `256`
 
+ ## Model Parameters Details
+
+ ### Decayed Parameters
+
+ - **Total Decayed Parameters**: 95,453,184
+
+ Decayed parameters are typically the weights of the model's layers (such as those in the transformer blocks), which are subject to weight decay during optimization. Weight decay regularizes the model by penalizing large weights, which can reduce overfitting.
+
+ ### Non-Decayed Parameters
+ - **Total Non-Decayed Parameters**: 81,408
+
+ Non-decayed parameters are generally biases and layer normalization parameters. They are excluded from weight decay because decaying them can destabilize training.
+
+ ### Total Parameters
+ - **Overall Total Parameters**: 95,534,592
+
+ The total covers both decayed and non-decayed parameters: 95,453,184 + 81,408 = 95,534,592, or roughly 95.5 million.
+
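The split described in the added README text can be sketched as follows. This is a minimal illustration, not the author's training code: it assumes the common convention that tensors with two or more dimensions (weight matrices) are decayed and 1-D tensors (biases, layer normalization parameters) are not. The function names and the toy shapes are hypothetical and do not describe this model's actual layers.

```python
from math import prod


def split_decay_groups(param_shapes):
    """Split parameters into decayed and non-decayed sets by dimensionality.

    Assumption: tensors with 2 or more dimensions are decayed, while
    1-D tensors (biases, layer norm parameters) are not.
    """
    decayed = {n: s for n, s in param_shapes.items() if len(s) >= 2}
    non_decayed = {n: s for n, s in param_shapes.items() if len(s) < 2}
    return decayed, non_decayed


def count(shapes):
    """Total number of scalar parameters across all tensors."""
    return sum(prod(s) for s in shapes.values())


# Hypothetical toy shapes, chosen only to illustrate the grouping.
toy_shapes = {
    "block.attn.weight": (64, 192),
    "block.attn.bias": (192,),
    "block.ln.weight": (64,),
    "block.ln.bias": (64,),
    "block.mlp.weight": (64, 256),
}

decayed, non_decayed = split_decay_groups(toy_shapes)
num_decayed = count(decayed)
num_non_decayed = count(non_decayed)
print(num_decayed, num_non_decayed, num_decayed + num_non_decayed)
```

In an optimizer that supports per-group settings, the two sets would typically be passed as separate parameter groups, with a non-zero weight decay on the first and zero on the second.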
  ## Dataset Description
 
  ### Overview