Multilingual GPT Model (Byte-Level)

This is a multilingual GPT model trained on byte-level encodings of Wikipedia articles in Arabic (ar) and Moroccan Arabic Darija (ary).

Model Details:

  • Trained using a byte-level vocabulary (size: 32000).
  • Architecture: Transformer-based GPT model.
  • Languages: Arabic (ar), Moroccan Arabic Darija (ary).
  • Training Data: Streamed Wikipedia articles, limited to 10,000 articles per language (see the streaming sketch after this list).
  • Training Code: [Link to your training script/GitHub repo if available]
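
The data pipeline itself is not included in this repo. The snippet below is a minimal sketch of how the streamed Wikipedia data described above could be loaded with the Hugging Face datasets library; the dataset name ("wikimedia/wikipedia"), the snapshot date, and the 10,000-article cap are assumptions based on the description, not the repo's actual training code.

from datasets import load_dataset
from itertools import islice

# Assumed dataset repo and snapshot date; adjust to match the actual training script.
LANGS = ["ar", "ary"]
ARTICLES_PER_LANG = 10_000

texts = []
for lang in LANGS:
    # Streaming avoids downloading the full dump up front.
    ds = load_dataset("wikimedia/wikipedia", f"20231101.{lang}", split="train", streaming=True)
    for article in islice(ds, ARTICLES_PER_LANG):
        texts.append(article["text"])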

Usage:

Load the weights with torch.load together with the GPTLanguageModel class from the training script, as shown in the conceptual example below.

Example (conceptual; adapt it to your actual loading process):

import torch
from your_model_definition_script import GPTLanguageModel  # assumes the model definition is saved in a separate script

# Initialize the model architecture (the class must match the one used for training)
model = GPTLanguageModel()
# Load the weights file downloaded from the Hugging Face repo
model.load_state_dict(torch.load('model_weights.pth', map_location='cpu'))
model.eval()

# ... (rest of your inference code) ...
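
For generation, a sketch is shown below. It assumes pure byte-level token IDs and a nanoGPT-style generate(idx, max_new_tokens) method on GPTLanguageModel; both are assumptions about the training code, not a documented API.

import torch

# Encode the prompt as raw UTF-8 bytes (assumes byte IDs are the token IDs;
# if the model actually uses a trained 32k tokenizer, substitute its encode/decode here).
prompt = "مرحبا"
idx = torch.tensor([list(prompt.encode("utf-8"))], dtype=torch.long)

with torch.no_grad():
    # generate(idx, max_new_tokens) follows the common nanoGPT-style interface;
    # this is an assumption about GPTLanguageModel, not a documented method.
    out = model.generate(idx, max_new_tokens=100)

print(bytes(out[0].tolist()).decode("utf-8", errors="replace"))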

Training Hyperparameters (a configuration sketch follows this list):

  • Batch Size: 32
  • Block Size: 256
  • Embedding Dimension: 384
  • Number of Heads: 6
  • Number of Layers: 6
  • Dropout: 0.2
  • Optimizer: AdamW
  • Learning Rate: 0.0006
  • Max Iterations: 5000
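
The training script is not reproduced here. The sketch below shows one plausible way the hyperparameters above could be wired into the model and optimizer; the variable names follow nanoGPT conventions, and the GPTLanguageModel constructor signature is an assumption, not the repo's actual code.

import torch
from your_model_definition_script import GPTLanguageModel  # hypothetical module name, as in the Usage example

# Hyperparameters from the list above (names are assumptions).
vocab_size = 32000
batch_size = 32
block_size = 256        # context length
n_embd = 384            # embedding dimension
n_head = 6
n_layer = 6
dropout = 0.2
learning_rate = 6e-4
max_iters = 5000

# The constructor signature below is an assumption about GPTLanguageModel.
model = GPTLanguageModel(vocab_size=vocab_size, block_size=block_size,
                         n_embd=n_embd, n_head=n_head,
                         n_layer=n_layer, dropout=dropout)
optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)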

Loss Curve: [You can optionally add a link or embed the training plot image here]

License: [Specify your license, e.g., MIT License]

Contact: [Your name/contact information]