Alepach committed
Commit 57ba373 · verified · 1 Parent(s): 4ede074

Recover model card and add dataset size info

Files changed (1)
  1. README.md +34 -16
README.md CHANGED
@@ -6,30 +6,37 @@ tags:
 - generated_from_trainer
 - trl
 - sft
-licence: license
+license: apache-2.0
+datasets:
+- OpenAssistant/oasst1
+- allenai/c4
 ---
 
-# Model Card for notHumpback-M1
+# notHumpback-M1
 
-This model is a fine-tuned version of [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B).
-It has been trained using [TRL](https://github.com/huggingface/trl).
-
-## Quick start
-
-```python
-from transformers import pipeline
-
-question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
-generator = pipeline("text-generation", model="Alepach/notHumpback-M1", device="cuda")
-output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
-print(output["generated_text"])
-```
-
-## Training procedure
-
-This model was trained with SFT.
+This model follows the Humpback architecture proposed in the paper [Self-Alignment with Instruction Backtranslation](https://arxiv.org/pdf/2308.06259)
+by Li et al.
+
+It is the model resulting from the first iteration of self-curation, trained on a small amount of gold data
+and a set of generated data curated by the ["seed model"](https://huggingface.co/Alepach/notHumpback-M0).
+
+This model can be used for instruction-following.
+It can also be used to score, once more, the instruction-response pairs
+generated by the ["backward model"](https://huggingface.co/Alepach/notHumpback-Myx) for a second iteration of self-curation.
+
+Humpback uses instruction backtranslation on a web corpus to generate input-output pairs (self-augmentation),
+creating a richer dataset for fine-tuning models without the need for additional manual annotation.
+The model then iteratively curates the generated dataset, scoring the pairs by quality, and is fine-tuned on the subset
+of pairs with the highest possible score (self-curation).
+
+Deviating from the original paper, this model is a fine-tuned version of [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B).
+It has been trained using [TRL](https://github.com/huggingface/trl).
+
+The dataset used to train this model is a combination of data sampled from the [oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1)
+dataset and the synthetic dataset mentioned above. The latter was created by applying self-augmentation and self-curation
+to 502k entries from the English subset ("en") of the [c4](https://huggingface.co/datasets/allenai/c4) dataset.
+
+For comparison with other methods, the training dataset was limited to 16,000 instruction-response pairs.
 
 ### Framework versions
 
@@ -41,7 +48,18 @@ This model was trained with SFT.
 
 ## Citations
 
+Original paper:
+
+```bibtex
+@misc{li2023selfalignment,
+  title={Self-Alignment with Instruction Backtranslation},
+  author={Xian Li and Ping Yu and Chunting Zhou and Timo Schick and Luke Zettlemoyer and Omer Levy and Jason Weston and Mike Lewis},
+  year={2023},
+  eprint={2308.06259},
+  archivePrefix={arXiv},
+  primaryClass={cs.CL}
+}
+```
+
 Cite TRL as:
 
@@ -54,4 +72,4 @@ Cite TRL as:
     publisher = {GitHub},
     howpublished = {\url{https://github.com/huggingface/trl}}
 }
-```
+```
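For readers of the updated card: a minimal instruction-following sketch, mirroring the quick-start snippet that the earlier card revision carried (same pipeline call and model id; the question is the example prompt from that snippet).

```python
# Minimal instruction-following sketch, mirroring the quick-start snippet from
# the earlier card revision. Assumes `transformers` is installed and a CUDA
# device is available; use device="cpu" otherwise.
from transformers import pipeline

question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
generator = pipeline("text-generation", model="Alepach/notHumpback-M1", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```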
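The self-curation step the card describes (the current model scores generated instruction-response pairs and only the top-scoring ones are kept for the next fine-tuning run) could look roughly like the sketch below. The scoring prompt, the 1-to-5 scale, and every helper name here are illustrative assumptions, not code from this repository or the paper's released implementation.

```python
# Rough sketch of one self-curation round: score each backtranslated
# (instruction, response) pair with the current model and keep only the
# top-scoring pairs. Prompt wording, scale, and names are assumptions.
from transformers import pipeline

scorer = pipeline("text-generation", model="Alepach/notHumpback-M1", device="cuda")

PROMPT = (
    "Rate the following instruction-response pair on a scale from 1 to 5, "
    "where 5 means the response is an excellent answer to the instruction. "
    "Reply with a single digit.\n\n"
    "Instruction: {instruction}\n\nResponse: {response}"
)

def score_pair(instruction: str, response: str) -> int:
    """Ask the model for a 1-5 quality score; fall back to 1 if unparsable."""
    out = scorer(
        [{"role": "user", "content": PROMPT.format(instruction=instruction, response=response)}],
        max_new_tokens=4,
        return_full_text=False,
    )[0]["generated_text"]
    digits = [c for c in str(out) if c.isdigit()]
    return int(digits[0]) if digits else 1

def curate(pairs, threshold=5):
    """Keep only the pairs whose score reaches the threshold (the top bucket)."""
    return [p for p in pairs if score_pair(p["instruction"], p["response"]) >= threshold]

# Example: filter a handful of backtranslated candidates before fine-tuning.
candidates = [{"instruction": "Summarize the water cycle.",
               "response": "Water evaporates, condenses into clouds, and falls as precipitation."}]
print(curate(candidates))
```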