---
license: apache-2.0
language:
- hi
- en
metrics:
- perplexity
widget:
- text: >-
    It is raining and my family
  example_title: Example 1
- text: >-
    We entered into the forest and
  example_title: Example 2
- text: >-
    I sat for doing my homework
  example_title: Example 3
---

# Custom GPT Model

## Model Description
This model was designed and pretrained from scratch, without using the Hugging Face library. It was trained independently on custom datasets focused on tailored NLP tasks, and the training process combined careful data preprocessing with training strategies aimed at improving its language understanding.

## Model Parameters
- **Block Size**: `256` (maximum sequence length)
- **Vocab Size**: `50257` (50,000 BPE merges, 256 byte-level tokens, and 1 special token)
- **Number of Layers**: `8`
- **Number of Heads**: `8`
- **Embedding Dimension**: `768`
- **Max Learning Rate**: `0.0006`
- **Min Learning Rate**: `0.00006` (10% of the max learning rate)
- **Warmup Steps**: `715`
- **Max Steps**: `52000`
- **Total Batch Size**: `524288` (number of tokens per batch)
- **Micro Batch Size**: `128`
- **Sequence Length**: `256`

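For orientation, here is how these hyperparameters fit together as a training configuration. This is an illustrative sketch only: the `GPTConfig` class, its field names, and the linear-warmup-plus-cosine-decay schedule are assumptions rather than the actual training code (the card gives warmup steps and min/max learning rates but does not state the decay shape).

```python
import math
from dataclasses import dataclass

@dataclass
class GPTConfig:  # hypothetical container; values from the list above
    block_size: int = 256     # maximum sequence length
    vocab_size: int = 50257   # 50,000 BPE merges + 256 byte tokens + 1 special token
    n_layer: int = 8
    n_head: int = 8
    n_embd: int = 768

max_lr = 0.0006
min_lr = 0.00006              # 10% of max_lr
warmup_steps = 715
max_steps = 52000

# 524,288 tokens per batch, with micro-batches of 128 sequences x 256 tokens,
# works out to 524288 / (128 * 256) = 16 gradient-accumulation steps.
grad_accum_steps = 524288 // (128 * 256)

def get_lr(step: int) -> float:
    """Linear warmup, then cosine decay to min_lr (an assumed, common schedule
    for these values; the card does not specify the decay shape)."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    ratio = (step - warmup_steps) / (max_steps - warmup_steps)
    coeff = 0.5 * (1.0 + math.cos(math.pi * min(ratio, 1.0)))
    return min_lr + coeff * (max_lr - min_lr)
```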
### Tokenization
For tokenization, this model uses the GPT-2 byte-pair encoding from `tiktoken`:

```python
import tiktoken

tokenizer = tiktoken.get_encoding("gpt2")
```
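As a quick sanity check (illustrative usage, not from the model card), the encoding round-trips text and its vocabulary size matches the `50257` listed above:

```python
import tiktoken

tokenizer = tiktoken.get_encoding("gpt2")

ids = tokenizer.encode("It is raining and my family")
print(len(ids), ids[:5])      # number of tokens and the first few token ids
print(tokenizer.decode(ids))  # recovers the original string
print(tokenizer.n_vocab)      # 50257, matching the vocab size above
```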