Alepach committed
Commit 57ba373 · verified · 1 Parent(s): 4ede074

Recover model card and add dataset size info

Files changed (1)
  1. README.md +34 -16
README.md CHANGED
@@ -6,30 +6,37 @@ tags:
 - generated_from_trainer
 - trl
 - sft
-licence: license
+license: apache-2.0
+datasets:
+- OpenAssistant/oasst1
+- allenai/c4
 ---
 
-# Model Card for notHumpback-M1
+# notHumpback-M1
 
-This model is a fine-tuned version of [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B).
-It has been trained using [TRL](https://github.com/huggingface/trl).
-
-## Quick start
-
-```python
-from transformers import pipeline
-
-question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
-generator = pipeline("text-generation", model="Alepach/notHumpback-M1", device="cuda")
-output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
-print(output["generated_text"])
-```
-
-## Training procedure
-
-This model was trained with SFT.
+This model follows the Humpback architecture proposed in the paper [Self-Alignment with Instruction Backtranslation](https://arxiv.org/pdf/2308.06259)
+by Li et al.
+
+It is the model resulting from the first iteration of self-curation, trained on a small amount of gold data
+and a set of generated data curated by the ["seed model"](https://huggingface.co/Alepach/notHumpback-M0).
+
+This model can be used for instruction-following.
+It can also be used to score, once more, the instruction-response pairs
+generated by the ["backward model"](https://huggingface.co/Alepach/notHumpback-Myx) for a second iteration of self-curation.
+
+Humpback uses instruction backtranslation on a web corpus to generate input-output pairs (self-augmentation),
+creating a richer dataset for fine-tuning models without the need for additional manual annotation.
+The model then iteratively curates the generated dataset, scoring the pairs by quality, and is fine-tuned on the subset
+of pairs with the highest possible score (self-curation).
+
+Deviating from the original paper, this model is a fine-tuned version of [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B).
+It has been trained using [TRL](https://github.com/huggingface/trl).
+
+The dataset used to train this model is a combination of data sampled from the [oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1)
+dataset and the synthetic dataset mentioned above. The latter was created by applying self-augmentation and self-curation
+to 502k entries from the English subset ("en") of the [c4](https://huggingface.co/datasets/allenai/c4) dataset.
+
+For comparison with other methods, the training dataset was limited to 16,000 instruction-response pairs.
 
 ### Framework versions
 
@@ -41,7 +48,18 @@ This model was trained with SFT.
 
 ## Citations
 
+Original paper:
+
+```bibtex
+@misc{li2023selfalignment,
+  title={Self-Alignment with Instruction Backtranslation},
+  author={Xian Li and Ping Yu and Chunting Zhou and Timo Schick and Luke Zettlemoyer and Omer Levy and Jason Weston and Mike Lewis},
+  year={2023},
+  eprint={2308.06259},
+  archivePrefix={arXiv},
+  primaryClass={cs.CL}
+}
+```
+
 Cite TRL as:
 
@@ -54,4 +72,4 @@ Cite TRL as:
     publisher = {GitHub},
     howpublished = {\url{https://github.com/huggingface/trl}}
 }
-```
+```
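For readers of the updated card: a minimal instruction-following sketch, mirroring the quick-start snippet that the earlier card revision carried (same pipeline call and model id; the question is the example prompt from that snippet).

```python
# Minimal instruction-following sketch, mirroring the quick-start snippet from
# the earlier card revision. Assumes `transformers` is installed and a CUDA
# device is available; use device="cpu" otherwise.
from transformers import pipeline

question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
generator = pipeline("text-generation", model="Alepach/notHumpback-M1", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```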
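The self-curation step the card describes (the current model scores generated instruction-response pairs and only the top-scoring ones are kept for the next fine-tuning run) could look roughly like the sketch below. The scoring prompt, the 1-to-5 scale, and every helper name here are illustrative assumptions, not code from this repository or the paper's released implementation.

```python
# Rough sketch of one self-curation round: score each backtranslated
# (instruction, response) pair with the current model and keep only the
# top-scoring pairs. Prompt wording, scale, and names are assumptions.
from transformers import pipeline

scorer = pipeline("text-generation", model="Alepach/notHumpback-M1", device="cuda")

PROMPT = (
    "Rate the following instruction-response pair on a scale from 1 to 5, "
    "where 5 means the response is an excellent answer to the instruction. "
    "Reply with a single digit.\n\n"
    "Instruction: {instruction}\n\nResponse: {response}"
)

def score_pair(instruction: str, response: str) -> int:
    """Ask the model for a 1-5 quality score; fall back to 1 if unparsable."""
    out = scorer(
        [{"role": "user", "content": PROMPT.format(instruction=instruction, response=response)}],
        max_new_tokens=4,
        return_full_text=False,
    )[0]["generated_text"]
    digits = [c for c in str(out) if c.isdigit()]
    return int(digits[0]) if digits else 1

def curate(pairs, threshold=5):
    """Keep only the pairs whose score reaches the threshold (the top bucket)."""
    return [p for p in pairs if score_pair(p["instruction"], p["response"]) >= threshold]

# Example: filter a handful of backtranslated candidates before fine-tuning.
candidates = [{"instruction": "Summarize the water cycle.",
               "response": "Water evaporates, condenses into clouds, and falls as precipitation."}]
print(curate(candidates))
```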