# Multilingual GPT Model (Byte-Level)
This model is a multilingual GPT model trained on byte-level encodings of Wikipedia articles in Arabic (ar) and Egyptian Arabic (ary).
**Model Details:**
- Trained using a byte-level vocabulary (size: 32000).
- Architecture: Transformer-based GPT model.
- Languages: Arabic (ar), Egyptian Arabic (ary).
- Training Data: Streamed Wikipedia dataset (limited to 10,000 articles per language); see the streaming sketch after this list.
- Training Code: [Link to your training script/GitHub repo if available]
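The streamed Wikipedia setup could look roughly like the following. This is a minimal sketch assuming the Hugging Face `datasets` library and the `wikimedia/wikipedia` dumps; the dump date, the helper name `stream_articles`, and the 10,000-article cap are illustrative assumptions, not the exact training pipeline.

```python
from itertools import islice
from datasets import load_dataset

# Stream articles for one language without downloading the full dump.
# The dump date ("20231101") and the article limit are assumptions.
def stream_articles(lang_code, limit=10_000):
    ds = load_dataset("wikimedia/wikipedia", f"20231101.{lang_code}",
                      split="train", streaming=True)
    for article in islice(ds, limit):
        yield article["text"]

# Collect raw text for both language codes used by this model.
corpus = []
for lang in ("ar", "ary"):
    corpus.extend(stream_articles(lang))
```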
**Usage:**
[Provide instructions on how to load and use the model. E.g., using `torch.load` and the provided `GPTLanguageModel` class.]
**Example (Conceptual - Adapt to your actual loading process):**
```python
import torch

# The model definition (GPTLanguageModel) must be available separately,
# e.g. from the training script; the module name below is a placeholder.
from your_model_definition_script import GPTLanguageModel

# Rebuild the architecture, then load the weights downloaded from the Hub.
model = GPTLanguageModel()
state_dict = torch.load('model_weights.pth', map_location='cpu')
model.load_state_dict(state_dict)
model.eval()

# ... (rest of your inference code) ...
```
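Continuing from the loading example above, generation could look roughly like this. It is a sketch under two assumptions: that the model exposes a `generate(idx, max_new_tokens)` method in the style of minimal GPT implementations, and that token IDs correspond directly to raw UTF-8 byte values; adapt both to the actual `GPTLanguageModel` interface.

```python
# Encode an Arabic prompt as raw UTF-8 bytes (one token per byte, assumed).
prompt = "مرحبا"
idx = torch.tensor([list(prompt.encode("utf-8"))], dtype=torch.long)

# Sample a continuation; generate() signature is an assumption.
with torch.no_grad():
    out = model.generate(idx, max_new_tokens=200)

# Decode byte IDs back to text, dropping any IDs outside the byte range
# and replacing invalid UTF-8 sequences.
raw = bytes(b for b in out[0].tolist() if b < 256)
print(raw.decode("utf-8", errors="replace"))
```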
**Training Hyperparameters:**
- Batch Size: 32
- Block Size: 256
- Embedding Dimension: 384
- Number of Heads: 6
- Number of Layers: 6
- Dropout: 0.2
- Optimizer: AdamW
- Learning Rate: 0.0006
- Max Iterations: 5000
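Wired together, these hyperparameters correspond roughly to the setup below. This is a sketch only: the `GPTLanguageModel` constructor keywords, the `(logits, loss)` return convention, and the `get_batch` helper are assumptions for illustration.

```python
import torch

# Hyperparameters listed above; keyword names are assumptions.
config = dict(
    vocab_size=32_000,
    block_size=256,
    n_embd=384,
    n_head=6,
    n_layer=6,
    dropout=0.2,
)

model = GPTLanguageModel(**config)
optimizer = torch.optim.AdamW(model.parameters(), lr=6e-4)

# Skeleton of the 5,000-iteration training loop. get_batch is a hypothetical
# helper returning (inputs, targets) of shape (32, 256) sampled from the
# byte-encoded corpus.
for step in range(5_000):
    xb, yb = get_batch("train")
    logits, loss = model(xb, yb)
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()
```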
**Loss Curve:**
[You can optionally add a link or embed the training plot image here]
**License:**
[Specify your license, e.g., MIT License]
**Contact:**
[Your name/contact information]