Tohrumi/MistralAI_iwslt15_en_vi_manual

Generated from Trainer


Tohrumi committed on Apr 12, 2024

Commit 1876682 (verified) · 1 Parent(s): cbfc655

#1 Test trainer save

Files changed (2)

README.md +59 -0
logs.json +46 -0

README.md ADDED

	@@ -0,0 +1,59 @@

+---
+license: apache-2.0
+library_name: peft
+tags:
+- trl
+- sft
+- translation
+- generated_from_trainer
+base_model: mistralai/Mistral-7B-v0.1
+model-index:
+- name: MistralAI_iwslt15_en_vi_manual
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# MistralAI_iwslt15_en_vi_manual
+This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on an unknown dataset.
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 0.002
+- train_batch_size: 8
+- eval_batch_size: 8
+- seed: 42
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: linear
+- lr_scheduler_warmup_steps: 1
+- num_epochs: 1
+- mixed_precision_training: Native AMP
+### Training results
+### Framework versions
+- PEFT 0.10.0
+- Transformers 4.39.3
+- Pytorch 2.2.1
+- Datasets 2.18.0
+- Tokenizers 0.15.2

logs.json ADDED

	@@ -0,0 +1,46 @@

+[
+    {
+        "loss": 1.5437,
+        "grad_norm": 1.134100079536438,
+        "learning_rate": 0.0016129032258064516,
+        "epoch": 0.2,
+        "step": 25
+    },
+    {
+        "loss": 1.5328,
+        "grad_norm": 1.6741456985473633,
+        "learning_rate": 0.0012096774193548388,
+        "epoch": 0.4,
+        "step": 50
+    },
+    {
+        "loss": 1.443,
+        "grad_norm": 0.8672377467155457,
+        "learning_rate": 0.0008064516129032258,
+        "epoch": 0.6,
+        "step": 75
+    },
+    {
+        "loss": 1.46,
+        "grad_norm": 1.14051353931427,
+        "learning_rate": 0.0004032258064516129,
+        "epoch": 0.8,
+        "step": 100
+    },
+    {
+        "loss": 1.3072,
+        "grad_norm": 0.6216816902160645,
+        "learning_rate": 0.0,
+        "epoch": 1.0,
+        "step": 125
+    },
+    {
+        "train_runtime": 1303.9515,
+        "train_samples_per_second": 0.767,
+        "train_steps_per_second": 0.096,
+        "total_flos": 8860503771119616.0,
+        "train_loss": 1.4573484497070313,
+        "epoch": 1.0,
+        "step": 125
+    }
+]
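
The logged values can be cross-checked against the hyperparameters in the model card. The sketch below is not part of the commit. It assumes the schedule has the standard linear warmup-then-decay form (the shape used by `get_linear_schedule_with_warmup` in Transformers), takes the total of 125 steps from `logs.json`, and assumes a single device with no gradient accumulation when estimating the sample count. The names `linear_lr`, `LOGGED_LR`, and `LOGGED_LOSS` are illustrative only.

```python
import math

# Values copied from the model card and logs.json in this commit.
BASE_LR = 0.002        # learning_rate
WARMUP_STEPS = 1       # lr_scheduler_warmup_steps
TOTAL_STEPS = 125      # final "step" in logs.json
BATCH_SIZE = 8         # train_batch_size
TRAIN_RUNTIME = 1303.9515  # seconds

LOGGED_LR = {
    25: 0.0016129032258064516,
    50: 0.0012096774193548388,
    75: 0.0008064516129032258,
    100: 0.0004032258064516129,
    125: 0.0,
}
LOGGED_LOSS = [1.5437, 1.5328, 1.443, 1.46, 1.3072]


def linear_lr(step, base_lr=BASE_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Linear warmup to base_lr, then linear decay to zero at `total`."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


# Compare the assumed schedule with each logged learning rate.
for step, logged in LOGGED_LR.items():
    match = math.isclose(linear_lr(step), logged, rel_tol=1e-9, abs_tol=1e-12)
    print(f"step {step:3d}: computed {linear_lr(step):.10f}, match={match}")

# Throughput figures implied by the step count and runtime.
# ASSUMPTION: one device, no gradient accumulation, so samples = steps * batch.
samples = TOTAL_STEPS * BATCH_SIZE
steps_per_second = round(TOTAL_STEPS / TRAIN_RUNTIME, 3)
samples_per_second = round(samples / TRAIN_RUNTIME, 3)

# The five logged losses each cover a 25-step window, so their mean should
# sit close to the reported train_loss (the logged values are rounded).
mean_logged_loss = sum(LOGGED_LOSS) / len(LOGGED_LOSS)
```

Under these assumptions the schedule reproduces every logged learning rate, and the derived throughput agrees with the reported `train_steps_per_second` and `train_samples_per_second`, which suggests the run covered roughly 1,000 training samples.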