Duino committed · verified
Commit bb38df8 · 1 Parent(s): 667837f

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +49 -0
README.md ADDED
@@ -0,0 +1,49 @@
 
# Multilingual GPT Model (Byte-Level)

This is a multilingual GPT model trained on byte-level encodings of Wikipedia articles in Arabic (ar) and Egyptian Arabic (ary).

**Model Details:**
- Trained using a byte-level vocabulary (size: 32000); a conceptual encoding sketch follows this list.
- Architecture: Transformer-based GPT model.
- Languages: Arabic (ar), Egyptian Arabic (ary).
- Training Data: Streamed Wikipedia dataset (limited to 10,000 articles per language).
- Training Code: [Link to your training script/GitHub repo if available]

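A rough illustration of what "byte-level" means here, assuming plain UTF-8 bytes: each character of the Arabic text maps to one or more integer byte values. The actual tokenizer (vocabulary size 32000, per the details above) may layer merges or special tokens on top of this, so treat the snippet as conceptual only.

```python
# Conceptual byte-level encoding/decoding (not the project's actual tokenizer).
text = "مرحبا بالعالم"  # "Hello, world" in Arabic

# Encode: UTF-8 turns each character into one or more bytes (integers 0-255).
byte_ids = list(text.encode("utf-8"))
print(byte_ids[:6])  # e.g. [217, 133, 216, 177, 216, 173]

# Decode: map the byte values back to text.
decoded = bytes(byte_ids).decode("utf-8")
assert decoded == text
```
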
**Usage:**

[Provide instructions on how to load and use the model. E.g., using `torch.load` and the provided `GPTLanguageModel` class.]

**Example (conceptual; adapt to your actual loading process):**

```python
import torch
from your_model_definition_script import GPTLanguageModel  # assuming the model definition is saved in a separate script

# Initialize the model architecture (must match the definition used for training)
model = GPTLanguageModel()

# Load the weights locally after downloading them from the Hugging Face Hub;
# map_location='cpu' keeps this working on machines without a GPU.
model.load_state_dict(torch.load('model_weights.pth', map_location='cpu'))
model.eval()

# ... (rest of your inference code) ...
```

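The "rest of your inference code" depends on how `GPTLanguageModel` is written. As a minimal sketch continuing from the loading example above, and assuming a nanoGPT-style `generate(idx, max_new_tokens)` method and raw byte ids (both assumptions, not confirmed by this repo), generation from a byte-level prompt could look like this:

```python
# Hypothetical sketch: assumes GPTLanguageModel exposes a nanoGPT-style
# generate(idx, max_new_tokens) method; adapt to the actual class.
prompt = "مرحبا"
prompt_ids = torch.tensor([list(prompt.encode("utf-8"))], dtype=torch.long)

with torch.no_grad():
    out = model.generate(prompt_ids, max_new_tokens=200)  # shape (1, prompt_len + 200)

# Assumes token ids are raw byte values (0-255); if the 32000-entry byte-level
# vocabulary uses merges, decode with that tokenizer instead.
generated = bytes(out[0].tolist()).decode("utf-8", errors="replace")
print(generated)
```
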
**Training Hyperparameters:**
- Batch Size: 32
- Block Size: 256
- Embedding Dimension: 384
- Number of Heads: 6
- Number of Layers: 6
- Dropout: 0.2
- Optimizer: AdamW (a minimal setup sketch follows this list)
- Learning Rate: 0.0006
- Max Iterations: 5000

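For reference, here is a minimal sketch of how these values map onto a standard PyTorch setup; the variable names are illustrative (not taken from the training script), and `model` is the instance from the loading example above.

```python
import torch

# Hyperparameters as listed above (names are illustrative, not the training script's).
batch_size = 32
block_size = 256        # maximum context length, in byte tokens
n_embd = 384            # embedding dimension
n_head = 6
n_layer = 6
dropout = 0.2
learning_rate = 6e-4    # 0.0006
max_iters = 5000

# AdamW optimizer with the listed learning rate (standard torch.optim API).
optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)
```
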
**Loss Curve:**
[You can optionally add a link or embed the training plot image here]

**License:**
[Specify your license, e.g., MIT License]

**Contact:**
[Your name/contact information]