Duino committed · verified
Commit bb38df8 · 1 Parent(s): 667837f

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +49 -0
README.md ADDED
@@ -0,0 +1,49 @@
 
# Multilingual GPT Model (Byte-Level)

This is a multilingual GPT model trained on byte-level encodings of Wikipedia articles in Arabic (ar) and Egyptian Arabic (ary).

**Model Details:**
- Trained using a byte-level vocabulary (size: 32000); a conceptual encoding sketch follows this list.
- Architecture: Transformer-based GPT model.
- Languages: Arabic (ar), Egyptian Arabic (ary).
- Training Data: Streamed Wikipedia dataset (limited to 10,000 articles per language).
- Training Code: [Link to your training script/GitHub repo if available]

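A rough illustration of what "byte-level" means here, assuming plain UTF-8 bytes: each character of the Arabic text maps to one or more integer byte values. The actual tokenizer (vocabulary size 32000, per the details above) may layer merges or special tokens on top of this, so treat the snippet as conceptual only.

```python
# Conceptual byte-level encoding/decoding (not the project's actual tokenizer).
text = "مرحبا بالعالم"  # "Hello, world" in Arabic

# Encode: UTF-8 turns each character into one or more bytes (integers 0-255).
byte_ids = list(text.encode("utf-8"))
print(byte_ids[:6])  # e.g. [217, 133, 216, 177, 216, 173]

# Decode: map the byte values back to text.
decoded = bytes(byte_ids).decode("utf-8")
assert decoded == text
```
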
**Usage:**

[Provide instructions on how to load and use the model. E.g., using `torch.load` and the provided `GPTLanguageModel` class.]

**Example (conceptual; adapt to your actual loading process):**

```python
import torch
from your_model_definition_script import GPTLanguageModel  # assuming the model definition is saved in a separate script

# Initialize the model architecture (must match the definition used for training)
model = GPTLanguageModel()

# Load the weights locally after downloading them from the Hugging Face Hub;
# map_location='cpu' keeps this working on machines without a GPU.
model.load_state_dict(torch.load('model_weights.pth', map_location='cpu'))
model.eval()

# ... (rest of your inference code) ...
```

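The "rest of your inference code" depends on how `GPTLanguageModel` is written. As a minimal sketch continuing from the loading example above, and assuming a nanoGPT-style `generate(idx, max_new_tokens)` method and raw byte ids (both assumptions, not confirmed by this repo), generation from a byte-level prompt could look like this:

```python
# Hypothetical sketch: assumes GPTLanguageModel exposes a nanoGPT-style
# generate(idx, max_new_tokens) method; adapt to the actual class.
prompt = "مرحبا"
prompt_ids = torch.tensor([list(prompt.encode("utf-8"))], dtype=torch.long)

with torch.no_grad():
    out = model.generate(prompt_ids, max_new_tokens=200)  # shape (1, prompt_len + 200)

# Assumes token ids are raw byte values (0-255); if the 32000-entry byte-level
# vocabulary uses merges, decode with that tokenizer instead.
generated = bytes(out[0].tolist()).decode("utf-8", errors="replace")
print(generated)
```
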
**Training Hyperparameters:**
- Batch Size: 32
- Block Size: 256
- Embedding Dimension: 384
- Number of Heads: 6
- Number of Layers: 6
- Dropout: 0.2
- Optimizer: AdamW (a minimal setup sketch follows this list)
- Learning Rate: 0.0006
- Max Iterations: 5000

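For reference, here is a minimal sketch of how these values map onto a standard PyTorch setup; the variable names are illustrative (not taken from the training script), and `model` is the instance from the loading example above.

```python
import torch

# Hyperparameters as listed above (names are illustrative, not the training script's).
batch_size = 32
block_size = 256        # maximum context length, in byte tokens
n_embd = 384            # embedding dimension
n_head = 6
n_layer = 6
dropout = 0.2
learning_rate = 6e-4    # 0.0006
max_iters = 5000

# AdamW optimizer with the listed learning rate (standard torch.optim API).
optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)
```
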
**Loss Curve:**
[You can optionally add a link or embed the training plot image here]

**License:**
[Specify your license, e.g., MIT License]

**Contact:**
[Your name/contact information]