---
library_name: peft
license: cc-by-nc-4.0
base_model: facebook/nllb-200-distilled-1.3B
model-index:
  - name: mon_nllb_1.3B
    results:
      - task:
          type: translation
        dataset:
          type: flores-200
          name: FLORES-200
        metrics:
          - name: BLEU
            type: BLEU
            value: 44.06
            verified: false
          - name: chrF
            type: chrF
            value: 44.43
            verified: false
          - name: METEOR
            type: METEOR
            value: 0.537
            verified: false
datasets:
  - Billyyy/mn-en-parallel
language:
  - mn
metrics:
  - bleu
  - chrf
  - meteor
pipeline_tag: translation
---

# mon_nllb_1.3B

This model is a fine-tuned version of [facebook/nllb-200-distilled-1.3B](https://huggingface.co/facebook/nllb-200-distilled-1.3B) on the [Billyyy/mn-en-parallel](https://huggingface.co/datasets/Billyyy/mn-en-parallel) dataset. It achieves the following results on the FLORES-200 evaluation set:

- BLEU: 44.06
- chrF: 44.43
- METEOR: 0.537
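
One way to reproduce these metrics is with the `evaluate` library; the sketch below uses placeholder strings rather than the actual FLORES-200 outputs, and the exact metric implementations used for the reported scores are not documented here.

```python
import evaluate

# Metric loaders from the evaluate library.
bleu = evaluate.load("sacrebleu")
chrf = evaluate.load("chrf")
meteor = evaluate.load("meteor")

# Placeholder predictions/references; replace with model outputs and
# FLORES-200 references to reproduce the scores above.
predictions = ["Hello, how are you?"]
references = ["Hello, how are you?"]

print(bleu.compute(predictions=predictions, references=[[r] for r in references])["score"])
print(chrf.compute(predictions=predictions, references=[[r] for r in references])["score"])
print(meteor.compute(predictions=predictions, references=references)["meteor"])
```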

## Example Usage

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "Billyyy/mon_nllb_1.3B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Mongolian input sentence ("Hello, how are you?")
text = "Сайн байна уу?"
inputs = tokenizer(text, return_tensors="pt")

output_tokens = model.generate(**inputs)
translated_text = tokenizer.decode(output_tokens[0], skip_special_tokens=True)

print(translated_text)
```
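
NLLB models select source and target languages through FLORES-200 language codes; the codes below (`khk_Cyrl` for Halh Mongolian, `eng_Latn` for English) are assumptions based on the base model's language list, not settings documented for this fine-tune:

```python
# Assumed FLORES-200 codes for Mongolian -> English; verify them against the
# base model's language list before relying on this snippet.
tokenizer.src_lang = "khk_Cyrl"
inputs = tokenizer(text, return_tensors="pt")

output_tokens = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("eng_Latn"),
    max_new_tokens=128,
)
print(tokenizer.decode(output_tokens[0], skip_special_tokens=True))
```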

## Model description

This model was fine-tuned with LoRA on a Mongolian→English parallel dataset.
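
As a rough illustration of the setup, LoRA adapters can be attached to the base model with `peft`; the rank, alpha, dropout, and target modules below are placeholders, since the values used for this checkpoint are not documented in the card:

```python
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-1.3B")

# Illustrative LoRA settings only; not the values used for this checkpoint.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```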

## Training and evaluation data

Training data:

- Billyyy/mn-en-parallel

Evaluation data:

- FLORES-200
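
The training set can be pulled from the Hub with `datasets`; the split name below is an assumption, so check the dataset card for the actual schema:

```python
from datasets import load_dataset

# "train" split is assumed; inspect the dataset card for the real splits
# and column names.
train_ds = load_dataset("Billyyy/mn-en-parallel", split="train")
print(train_ds[0])
```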

## Training hyperparameters

The following hyperparameters were used during training (a sketch of equivalent training arguments follows the list):

- learning_rate: 0.0001
- train_batch_size: 40
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 160
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 500
- num_epochs: 2
- mixed_precision_training: FP16
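
A rough reconstruction of these settings as `Seq2SeqTrainingArguments` (the output directory and evaluation cadence are placeholders, not values from the run):

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mon_nllb_1.3B",        # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=40,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=4,     # effective train batch size: 160
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_steps=500,
    optim="adamw_torch",
    fp16=True,
    seed=42,
    eval_strategy="steps",             # assumed; the card logs eval every 1000 steps
    eval_steps=1000,
)
```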

## Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 7.3708        | 0.1522 | 1000  | 7.2420          |
| 7.25          | 0.3044 | 2000  | 7.2126          |
| 7.237         | 0.4567 | 3000  | 7.2120          |
| 7.2344        | 0.6089 | 4000  | 7.2137          |
| 7.2323        | 0.7611 | 5000  | 7.2130          |
| 7.2351        | 0.9133 | 6000  | 7.2121          |
| 7.222         | 1.0656 | 7000  | 7.2131          |
| 7.22          | 1.2178 | 8000  | 7.2122          |
| 7.2077        | 1.3700 | 9000  | 7.2131          |
| 7.2132        | 1.5223 | 10000 | 7.2132          |
| 7.2211        | 1.6745 | 11000 | 7.2128          |
| 7.2269        | 1.8267 | 12000 | 7.2131          |
| 7.2296        | 1.9789 | 13000 | 7.2132          |

## Framework versions

- PEFT 0.14.0
- Transformers 4.49.0
- Pytorch 2.6.0+cu124
- Datasets 3.3.2
- Tokenizers 0.21.0