Duino committed on
Commit 2b58741 · verified · Parent: bb38df8

Upload folder using huggingface_hub

README.md CHANGED
@@ -1,49 +1,74 @@
- # Multilingual GPT Model (Byte-Level)
-
- This model is a multilingual GPT model trained on byte-level encodings of Wikipedia articles in Arabic (ar) and Egyptian Arabic (ary).
-
- **Model Details:**
- - Trained using a byte-level vocabulary (size: 32000).
- - Architecture: Transformer-based GPT model.
- - Languages: Arabic (ar), Egyptian Arabic (ary).
- - Training Data: Streamed Wikipedia dataset (limited to 10000 articles per language).
- - Training Code: [Link to your training script/GitHub repo if available]
-
- **Usage:**
-
- [Provide instructions on how to load and use the model. E.g., using `torch.load` and the provided `GPTLanguageModel` class.]
-
- **Example (Conceptual - Adapt to your actual loading process):**
 ```python
- import torch
- from your_model_definition_script import GPTLanguageModel  # assuming the model definition is saved separately
-
- # Initialize model architecture (must be defined in a separate script)
- model = GPTLanguageModel()
- model.load_state_dict(torch.load('model_weights.pth'))  # load from local if downloaded from HF
- model.eval()
-
- # ... (rest of your inference code) ...
 ```
-
- **Training Hyperparameters:**
- - Batch Size: 32
- - Block Size: 256
- - Embedding Dimension: 384
- - Number of Heads: 6
- - Number of Layers: 6
- - Dropout: 0.2
- - Optimizer: AdamW
- - Learning Rate: 0.0006
- - Max Iterations: 5000
-
- **Loss Curve:**
- [You can optionally add a link or embed the training plot image here]
-
- **License:**
- [Specify your license, e.g., MIT License]
-
- **Contact:**
- [Your name/contact information]
+ ---
+ language:
+ - ar
+ - ary
+ license: apache-2.0
+ tags:
+ - multilingual
+ - arabic
+ - darija
+ - transformers
+ - text-generation
+ model-index:
+ - name: Darija-LM
+   results: []
+ ---
+
+ # Darija-LM
+
+ This is a multilingual language model trained on the Arabic and Darija (Moroccan Arabic) Wikipedia datasets.
+
+ ## Model Description
+
+ [**TODO: Add a detailed description of your model here.**]
+ For example, you can include:
+ - Model architecture: GPT-like Transformer
+ - Training data: Arabic and Darija Wikipedia (20231101 snapshot)
+ - Tokenizer: SentencePiece (BPE, vocab size: 32000)
+ - Training parameters: [Specify hyperparameters like learning rate, batch size, layers, heads, etc.]
+
+ ## Intended Uses & Limitations
+
+ [**TODO: Describe the intended uses and limitations of this model.**]
+ For example:
+ - Intended use cases: text generation, research in multilingual NLP, exploring low-resource language models.
+ - Potential limitations: may not be suitable for production environments without further evaluation and fine-tuning; potential biases inherited from Wikipedia data.
+
+ ## How to Use
+
+ [**TODO: Add instructions on how to load and use the model.**]
 ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_name = "Duino/Darija-LM"  # or a local path to the saved model
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForCausalLM.from_pretrained(model_name)
+
+ # Example generation code (adapt as needed to your model and tokenizer):
+ # device = "cuda" if torch.cuda.is_available() else "cpu"
+ # model.to(device)
+ # input_text = "مرحبا بالعالم"  # example Arabic/Darija input ("Hello, world")
+ # input_ids = tokenizer.encode(input_text, return_tensors="pt").to(device)
+ # output = model.generate(input_ids, max_length=50, num_beams=5, no_repeat_ngram_size=2, early_stopping=True)
+ # generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
+ # print(generated_text)
  ```
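+
+ Note: `config.yaml` declares a custom `GPTLanguageModel` architecture and a `SentencePieceTokenizer`, not a stock `transformers` class, so the `AutoModelForCausalLM` path above may fail without custom modeling code. A minimal fallback sketch, assuming the `GPTLanguageModel` definition from the training script (not bundled in this repo) and the uploaded `spm_model.model`:
+
+ ```python
+ import torch
+ import sentencepiece as spm
+
+ from your_training_script import GPTLanguageModel  # hypothetical import; supply your own definition
+
+ # Tokenizer: the SentencePiece model uploaded alongside the weights.
+ sp = spm.SentencePieceProcessor(model_file="spm_model.model")
+
+ # Model: instantiate the architecture, then load one of the uploaded checkpoints.
+ model = GPTLanguageModel()  # assumed to default to the config.yaml hyperparameters
+ model.load_state_dict(torch.load("final_model.pth", map_location="cpu"))
+ model.eval()
+
+ print(sp.encode("مرحبا بالعالم"))  # token ids for a sample Arabic/Darija prompt
+ ```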
+
+ ## Training Details
+
+ [**TODO: Provide details about the training process.**]
+ - Training data preprocessing: [Describe tokenization, data splitting, etc.]
+ - Training procedure: [Optimizer, learning rate schedule, number of iterations, etc.]
+ - Hardware: [Specify GPUs or TPUs used]
+
+ ## Evaluation
+
+ [**TODO: Include evaluation metrics if you have them.**]
+ - [Metrics and results on a validation set or benchmark.]
+
+ ## Citation
+
+ [**TODO: Add citation information if applicable.**]
+
+ ## Model Card Contact
+
+ [**TODO: Add your contact information.**]
+ - [Your name/organization]
+ - [Your email/website/Hugging Face profile]
config.yaml ADDED
@@ -0,0 +1,10 @@
+ _name_or_path: Duino/Darija-LM
+ architectures:
+ - GPTLanguageModel
+ block_size: 256
+ dropout: 0.2
+ n_embd: 384
+ n_head: 6
+ n_layer: 6
+ tokenizer_class: SentencePieceTokenizer
+ vocab_size: 32000
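This is not a standard `transformers` config, so `AutoConfig` will not parse it; the fields match the training hyperparameters listed in the previous README. A minimal sketch for wiring them into the (hypothetical) `GPTLanguageModel` constructor, assuming PyYAML:

```python
import yaml

# Read the uploaded config.yaml and extract the architecture hyperparameters.
with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

# Hypothetical wiring: GPTLanguageModel lives in the author's training
# script, and this constructor signature is an assumption, not documented.
# model = GPTLanguageModel(
#     vocab_size=cfg["vocab_size"],  # 32000
#     block_size=cfg["block_size"],  # 256
#     n_embd=cfg["n_embd"],          # 384
#     n_head=cfg["n_head"],          # 6
#     n_layer=cfg["n_layer"],        # 6
#     dropout=cfg["dropout"],        # 0.2
# )
```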
final_model.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2751195ca13cd0111deeda1e3a34c3de467dfb0f44d8eb93e41a24657606ffb3
+ size 150904870
model_iter_1000.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2354ef1b9bdd6bfc716f8f4cbf8f5fe718616f5f7e8605d41f84a48f9874a9e5
+ size 150905726
model_iter_1500.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a4a4bd0ad71a61324cf0e984d516d9ddd5ab5e8c659cdb933b92d9d63d83e312
+ size 150905726
model_iter_2000.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1f66f05df8482b915f9be6bd2ece04ad1f9799adc39be02da1835566707e8d9d
+ size 150905726
model_iter_2500.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1c01947b20d83a85f78333181f6c9456fd77d4207959167de66a03836ee84658
+ size 150905726
model_iter_3000.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cc541316b414e1ea00546d04806b6fda6333faaa80a73af77b9ca666d65033f8
+ size 150905726
model_iter_3500.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a41ce5eecc0c240ffcbf96efe547f3192ba3de3c516916aa46509c0bf626ae08
+ size 150905726
model_iter_4000.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dc9b5424fdf7517b59c10c76fff47ba2c90734ea5a36337a3b13dfd91c87691f
+ size 150905726
model_iter_4500.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:196538c2e040feac1470f3ab78dc2544c9bd548c335e9b5da72c757aa6f807ef
+ size 150905726
model_iter_500.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:af1c9ac435e999760164916c9a1235c88505fa863c40d343e4291e4e2b4d00a9
+ size 150905512
model_iter_5000.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c8c40311a49aa3cf3ba5be802f581a738f7d2f3696b7b16f2388ce3e15703a35
+ size 150905726
model_tensors.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:639206b9cdfc1b02f278ca1457a4e48ef050e506bb994be08a23701143ff80fd
+ size 150947986
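The `model_iter_*.pth` files are checkpoints saved every 500 steps of the 5000-iteration run described in the previous README, with `final_model.pth` as the end state. Their ~151 MB size is roughly consistent with this architecture stored in float32; a back-of-the-envelope count, assuming a standard GPT block (4x MLP expansion) and an untied output head:

```python
# Rough parameter count for the architecture in config.yaml.
vocab_size, block_size, n_embd, n_layer = 32000, 256, 384, 6

tok_emb = vocab_size * n_embd    # token embedding table
pos_emb = block_size * n_embd    # learned positional embeddings
attn = 4 * n_embd * n_embd       # q, k, v, and output projections per layer
mlp = 8 * n_embd * n_embd        # 4x up-projection + down-projection per layer
head = n_embd * vocab_size       # output head (assumed untied)

total = tok_emb + pos_emb + n_layer * (attn + mlp) + head
print(f"~{total / 1e6:.1f}M parameters, ~{total * 4 / 1e6:.0f} MB in float32")
# ~35.3M parameters, ~141 MB in float32 -- the same ballpark as the ~151 MB
# files (biases, layer norms, buffers, and serialization overhead not counted)
```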
spm_model.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b622ee3438740c316c30cace8fbf9a133233e7b4dff65b4bc29ddb26f50f5a6d
+ size 872745
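The uploaded SentencePiece model can be sanity-checked against the `vocab_size: 32000` declared in `config.yaml`; a quick check, assuming the `sentencepiece` package:

```python
import sentencepiece as spm

# Load the uploaded SentencePiece model and confirm its vocabulary size
# matches the vocab_size declared in config.yaml.
sp = spm.SentencePieceProcessor(model_file="spm_model.model")
assert sp.get_piece_size() == 32000

# Inspect how a sample Arabic/Darija string is tokenized.
print(sp.encode("مرحبا بالعالم", out_type=str))
```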
spm_model.vocab ADDED
The diff for this file is too large to render. See raw diff
 
training_plot.png ADDED