temporary0-0name committed on
Commit ffd4d10 · verified · 1 Parent(s): 26cd2a7

Update README.md

Files changed (1):
  1. README.md +31 -0
README.md CHANGED
 
@@ -16,6 +16,8 @@ widget:
## Model Description
This model, designed and pretrained from scratch, was developed without utilizing the Hugging Face library.

+ ---
+
## Model Parameters
- **Block Size**: `256` (Maximum sequence length)
- **Vocab Size**: `50257` (Includes 50,000 BPE merges, 256 byte-level tokens, and 1 special token)
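As a quick illustration of how these two hyperparameters fit together, here is a minimal config sketch; the `GPTConfig` name and dataclass layout are our own assumptions, not code from this commit:

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    # Hypothetical container for the values documented above.
    block_size: int = 256    # maximum sequence length
    vocab_size: int = 50257  # 50,000 BPE merges + 256 byte-level tokens + 1 special token
```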
 
@@ -48,6 +50,8 @@ Non-decayed parameters generally involve biases and layer normalization paramete

The calculated total number of parameters includes both decayed and non-decayed tensors, summing up to over 95 million parameters.

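To make the decayed/non-decayed split concrete, here is a hedged PyTorch sketch of one common way to form the two groups and tally them; the dimensionality rule (decay only tensors with two or more dimensions, which skips biases and LayerNorm weights) is a standard convention we are assuming, not necessarily this repository's exact code:

```python
import torch

def split_and_count_params(model: torch.nn.Module):
    # Common convention: weight-decay matrices/embeddings (>= 2-D tensors),
    # skip decay for biases and LayerNorm parameters (1-D tensors).
    params = [p for p in model.parameters() if p.requires_grad]
    decay = [p for p in params if p.dim() >= 2]
    no_decay = [p for p in params if p.dim() < 2]
    n_decay = sum(p.numel() for p in decay)
    n_no_decay = sum(p.numel() for p in no_decay)
    print(f"decayed: {n_decay:,} | non-decayed: {n_no_decay:,} "
          f"| total: {n_decay + n_no_decay:,}")
    return decay, no_decay
```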
+ ---
+
## Dataset Description

### Overview
 
@@ -61,7 +65,32 @@ The dataset is hosted and maintained on Hugging Face's dataset repository. More
- **Total Tokens Used for Training**: 3 billion tokens
- **Training Duration**: The model was trained over 3 epochs to ensure sufficient exposure to the data while optimizing the learning trajectory.

+ ---
+
+ ## Model Evaluation on HellaSwag Dataset
+
+ ### Performance Overview
+ Evaluating our model, "orator," on the HellaSwag dataset shows significant progress in context-based prediction. Below, we detail performance through loss and accuracy graphs, followed by specific metrics.

+ ### Graph Analysis
+
+ #### Loss Graph
+ ![Loss Graph](output1.png)
+ - **Blue Line (Train Loss)**: The model's loss on the training set over training steps. It declines sharply at first, indicating rapid learning, then fluctuates before gradually stabilizing.
+ - **Orange Line (Validation Loss)**: The loss on the validation set. This curve is smoother than the training loss, indicating that the model remains stable and effective on unseen data.
+ - **Red Dashed Line**: The validation loss of the baseline OpenAI GPT-2 (124M) model, for comparison. Our model reaches a lower validation loss, indicating improved performance.
+
+ #### Accuracy Graph (HellaSwag Eval)
+ ![Accuracy Graph](output2.png)
+ - **Blue Line**: The accuracy of the "orator" model on the HellaSwag evaluation set. It rises steadily, reflecting the model's improving ability to complete unseen scenarios correctly.
+ - **Red Dashed Line**: The accuracy of the baseline OpenAI GPT-2 (124M) model. Our model consistently surpasses this benchmark after the initial training phases.
+
+ ### Key Metrics
+ - **Minimum Training Loss**: `2.883471`
+ - **Minimum Validation Loss**: `3.1989`
+ - **Maximum HellaSwag Evaluation Accuracy**: `0.3054`
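For context on what the accuracy figure measures, the sketch below shows the standard HellaSwag scoring recipe: append each of the four candidate endings to the context, compute the model's average per-token loss over the ending, and pick the lowest-loss ending. This is an illustrative reconstruction under our own assumptions (a PyTorch `model` whose forward pass returns next-token logits), not this repository's evaluation code:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def score_hellaswag_example(model, tokenizer, context: str, endings: list[str]) -> int:
    """Return the index of the lowest-loss ending (the model's prediction)."""
    losses = []
    for ending in endings:
        ctx_ids = tokenizer.encode(context)
        end_ids = tokenizer.encode(" " + ending)
        ids = torch.tensor([ctx_ids + end_ids])   # shape (1, T)
        logits = model(ids)                       # assumed shape (1, T, vocab)
        # Position t predicts token t+1; score only the ending tokens.
        loss = F.cross_entropy(
            logits[0, :-1, :],                    # (T-1, vocab)
            ids[0, 1:],                           # (T-1,)
            reduction="none",
        )
        losses.append(loss[-len(end_ids):].mean().item())
    return min(range(len(losses)), key=losses.__getitem__)
```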
+
+ ---

### Tokenization
For tokenization, this model uses:
 
@@ -69,6 +98,8 @@ For tokenization, this model uses:
tokenizer = tiktoken.get_encoding("gpt2")
```
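To make the snippet above runnable on its own, here is a short round trip with the same encoding (the `import` line and sample text are ours, not part of the commit):

```python
import tiktoken

tokenizer = tiktoken.get_encoding("gpt2")  # 50257-token GPT-2 BPE vocabulary

ids = tokenizer.encode("Hello, world!")    # text -> token ids
text = tokenizer.decode(ids)               # token ids -> text
assert text == "Hello, world!"
```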

+ ---
+
## How to Use the Model

### Load and Generate Text
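The diff cuts off here, so as a generic illustration only: the sketch below shows how text generation typically works for a decoder-only model with the parameters documented above (block size 256, GPT-2 BPE tokenizer). The `model` argument, its call signature, and the sampling choices are all our assumptions; the repository's actual loading and generation code may differ:

```python
import torch
import tiktoken

@torch.no_grad()
def generate(model, prompt: str, max_new_tokens: int = 50, block_size: int = 256):
    # Assumes `model(ids)` returns logits of shape (1, T, vocab_size).
    tokenizer = tiktoken.get_encoding("gpt2")
    ids = torch.tensor([tokenizer.encode(prompt)])
    for _ in range(max_new_tokens):
        context = ids[:, -block_size:]                   # crop to the block size
        logits = model(context)
        probs = torch.softmax(logits[:, -1, :], dim=-1)  # next-token distribution
        next_id = torch.multinomial(probs, num_samples=1)
        ids = torch.cat([ids, next_id], dim=1)
    return tokenizer.decode(ids[0].tolist())
```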