Continued pre-training of mistralai/Mistral-Nemo-Instruct-2407 on a Kurdish Wikipedia dataset using Unsloth. The pre-training was aimed at improving Kurdish language understanding, so this model should be fine-tuned further before use on a downstream task. The model is quantized with bitsandbytes to reduce memory usage; see the bitsandbytes documentation for details.

There isn't a standard, or even a good, Kurdish evaluation metric that I could find, so there is no reproducible baseline for this model. Creating a Kurdish evaluation will be my next project.

I will also look into a multi-GPU training setup so I don't have to wait all day for results, and would like to train on both Kurmanji and Sorani.

Use

This model should be fine-tuned further for a specific task. For an instruction fine-tuned version, see nazimali/Mistral-Nemo-Kurdish-Instruct.
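A minimal loading sketch, assuming the transformers and bitsandbytes libraries and a CUDA GPU. The 4-bit settings shown are illustrative assumptions, not the card's published configuration:

```python
def load_kurdish_model(model_id: str = "nazimali/Mistral-Nemo-Kurdish"):
    """Load the model with 4-bit bitsandbytes quantization to save memory."""
    # Heavy imports live inside the function so the sketch can be read
    # and imported without transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,  # matches the BF16 tensor type
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=bnb_config,
        device_map="auto",  # place layers on the available GPU(s)
    )
    return tokenizer, model
```

From here, generation works as with any causal LM: tokenize a prompt, call `model.generate`, and decode.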

Training

Transformers 4.44.2
1 NVIDIA A100 80GB PCIe
Duration 6h 31m 4s

Final training metrics:

{
  "total_flos": 4121524790259794000,
  "train/epoch": 1,
  "train/global_step": 1960,
  "train/grad_norm": 3.1958093643188477,
  "train/learning_rate": 0,
  "train/loss": 1.2108,
  "train_loss": 1.256846008738693,
  "train_runtime": 23227.1752,
  "train_samples_per_second": 2.7,
  "train_steps_per_second": 0.084
}

Pre-training data:

  • nazimali/kurdish-wikipedia-articles
    • Dataset number of rows: 63,076
    • Filtered on the title and text columns
      • Each must have at least 1 character
  • Number of rows used for training: 62,720
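The filtering above can be sketched as a simple predicate. The `datasets` loading call is shown commented out as an assumption; the predicate itself is plain Python:

```python
def keep_row(row: dict) -> bool:
    """Keep only rows whose title and text each have at least 1 character."""
    return len(row.get("title") or "") >= 1 and len(row.get("text") or "") >= 1

# With the Hugging Face datasets library (assumed), the same filter would be:
# from datasets import load_dataset
# ds = load_dataset("nazimali/kurdish-wikipedia-articles", split="train")
# ds = ds.filter(keep_row)  # 63,076 rows -> 62,720 rows

rows = [
    {"title": "Kurdistan", "text": "Gotarek..."},
    {"title": "", "text": "missing title"},
]
kept = [r for r in rows if keep_row(r)]
print(len(kept))  # 1
```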

Training prompt format:

training_prompt = """Gotara Wikipedia
### Sernav: {}

### Gotar:
{}"""
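Filling the template is plain Python string formatting; the title and article text below are hypothetical placeholders, not rows from the dataset:

```python
training_prompt = """Gotara Wikipedia
### Sernav: {}

### Gotar:
{}"""

# Hypothetical example: a title and article body as they would come
# from the wiki dataset's title/text columns.
prompt = training_prompt.format("Kurdistan", "Gotara mînak...")
print(prompt.splitlines()[1])  # -> "### Sernav: Kurdistan"
```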
Model size: 12.2B parameters (Safetensors, BF16)

