|
|
|
# Multilingual GPT Model (Byte-Level)

This is a multilingual GPT model trained on byte-level encodings of Wikipedia articles in Arabic (ar) and Egyptian Arabic (ary).
|
|
|
**Model Details:**

- Trained using a byte-level vocabulary (size: 32,000).
- Architecture: decoder-only Transformer (GPT).
- Languages: Arabic (ar), Egyptian Arabic (ary).
- Training Data: streamed Wikipedia dataset, limited to 10,000 articles per language (see the data-loading sketch below).
- Training Code: [Link to your training script/GitHub repo if available]
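
As a rough sketch of the data pipeline described above, the snippet below streams the two Wikipedia dumps and keeps the first 10,000 articles per language. The `wikimedia/wikipedia` dataset name, the dump date, and the plain UTF-8 byte encoding are assumptions, not the exact training code (the card lists a 32,000-token vocabulary, so the actual tokenizer is likely more than a 256-symbol byte mapping).

```python
from itertools import islice

from datasets import load_dataset

# Assumed dataset and dump date; adjust to whatever was actually used.
CONFIGS = ["20231101.ar", "20231101.ary"]
ARTICLES_PER_LANGUAGE = 10_000

def stream_articles(config, limit=ARTICLES_PER_LANGUAGE):
    """Yield up to `limit` article texts from a streamed Wikipedia config."""
    ds = load_dataset("wikimedia/wikipedia", config, split="train", streaming=True)
    for example in islice(ds, limit):
        yield example["text"]

def byte_encode(text):
    """Naive byte-level encoding: each UTF-8 byte becomes a token id in [0, 255]."""
    return list(text.encode("utf-8"))

# Peek at the first article of each language.
for config in CONFIGS:
    first = next(stream_articles(config, limit=1))
    print(config, byte_encode(first)[:20])
```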
|
|
|
**Usage:** |
|
|
|
To use the model, instantiate the `GPTLanguageModel` class (its definition must be available locally) and load the released weights with `torch.load`, as in the example below.
|
|
|
**Example (conceptual; adapt to your actual loading process):**
|
|
|
```python
import torch

# The model definition must be importable; point this at wherever the
# GPTLanguageModel class is saved.
from your_model_definition_script import GPTLanguageModel

# Re-create the architecture with the same hyperparameters used for training,
# then load the weights (downloaded from the Hub or stored locally).
model = GPTLanguageModel()
state_dict = torch.load('model_weights.pth', map_location='cpu')
model.load_state_dict(state_dict)
model.eval()

# ... (rest of your inference code) ...
```
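
The repository id and inference API are not specified in this card. Assuming the weights are hosted on the Hub and that `GPTLanguageModel` exposes a nanoGPT-style `generate(idx, max_new_tokens)` method (both assumptions to adapt), downloading the checkpoint and sampling might look roughly like this:

```python
import torch
from huggingface_hub import hf_hub_download

from your_model_definition_script import GPTLanguageModel

# Hypothetical repository id; replace with the actual model repo.
weights_path = hf_hub_download(repo_id="<user>/<model-repo>", filename="model_weights.pth")

model = GPTLanguageModel()
model.load_state_dict(torch.load(weights_path, map_location="cpu"))
model.eval()

# Assumed byte-level prompt encoding and nanoGPT-style generation; adapt if the
# actual tokenizer is not a plain 256-symbol byte mapping.
prompt = "مصر"
idx = torch.tensor([list(prompt.encode("utf-8"))], dtype=torch.long)
with torch.no_grad():
    out = model.generate(idx, max_new_tokens=100)

# Decode generated byte ids back to text, replacing any invalid sequences.
print(bytes(out[0].tolist()).decode("utf-8", errors="replace"))
```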
|
|
|
**Training Hyperparameters:**

- Batch Size: 32
- Block Size (context length): 256
- Embedding Dimension: 384
- Number of Heads: 6
- Number of Layers: 6
- Dropout: 0.2
- Optimizer: AdamW (see the configuration sketch below)
- Learning Rate: 0.0006
- Max Iterations: 5000
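
In nanoGPT-style naming (illustrative variable names, not the exact training script), the settings above correspond to roughly the following configuration:

```python
# Hyperparameters mirroring the values listed above; names are assumptions
# following the common nanoGPT convention.
batch_size = 32          # sequences per optimizer step
block_size = 256         # context length in byte tokens
n_embd = 384             # embedding dimension
n_head = 6               # attention heads per layer
n_layer = 6              # transformer blocks
dropout = 0.2
learning_rate = 6e-4     # 0.0006, as listed above
max_iters = 5000
vocab_size = 32000       # as stated in Model Details

# The optimizer named above would be applied to the model's parameters, e.g.:
# optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)
```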
|
|
|
**Loss Curve:** |
|
[You can optionally add a link or embed the training plot image here] |
|
|
|
**License:** |
|
[Specify your license, e.g., MIT License] |
|
|
|
**Contact:** |
|
[Your name/contact information] |
|
|