Alepach committed (verified)
Commit 4ede074 · Parent: c222878

Model save

README.md CHANGED
@@ -6,35 +6,30 @@ tags:
 - generated_from_trainer
 - trl
 - sft
-license: apache-2.0
-datasets:
-- OpenAssistant/oasst1
-- allenai/c4
+licence: license
 ---
 
-# notHumpback-M1
+# Model Card for notHumpback-M1
 
-This model follows the Humpback architecture, proposed in the paper [Self-Alignment with Instruction Backtranslation](https://arxiv.org/pdf/2308.06259)
-by Li et al.
+This model is a fine-tuned version of [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B).
+It has been trained using [TRL](https://github.com/huggingface/trl).
 
-It represents the resulting model after the first iteration of self-curation, which is trained on a small amount of gold data
-and a set of generated data curated by the ["seed model"](https://huggingface.co/Alepach/notHumpback-M0).
+## Quick start
 
-This model can be used for instruction-following.
-It may also be used to, again, score the instruction-response pairs
-generated by the ["backward model"](https://huggingface.co/Alepach/notHumpback-Myx) for a second iteration of self-curation.
+```python
+from transformers import pipeline
 
-Humpback uses instruction backtranslation on a web corpus to generate input-output pairs (self-augmentation),
-creating a richer dataset for fine-tuning models without the need for additional manual annotation.
-The model then iteratively curates the created dataset, scoring the pairs by quality, and is then finetuned on the resulting subset
-of all pairs with the highest possible score (self-curation).
+question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
+generator = pipeline("text-generation", model="Alepach/notHumpback-M1", device="cuda")
+output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
+print(output["generated_text"])
+```
 
-Varying from the original paper, this model is a fine-tuned version of [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B).
-It has been trained using [TRL](https://github.com/huggingface/trl).
+## Training procedure
 
-The dataset used to train this model is a combination of data sampled from the [oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1)
-dataset and the synthetic dataset which was mentioned above. The latter has been created by applying self-augmentation and self-curation
-on 502k entries from the english subset ("en") of the [c4](https://huggingface.co/datasets/allenai/c4) dataset.
+This model was trained with SFT.
 
 ### Framework versions
 
@@ -46,18 +41,7 @@ on 502k entries from the english subset ("en") of the [c4](https://huggingface.c
 
 ## Citations
 
-Original paper:
-
-```bibtex
-@misc{li2023selfalignment,
-      title={Self-Alignment with Instruction Backtranslation},
-      author={Xian Li and Ping Yu and Chunting Zhou and Timo Schick and Luke Zettlemoyer and Omer Levy and Jason Weston and Mike Lewis},
-      year={2023},
-      eprint={2308.06259},
-      archivePrefix={arXiv},
-      primaryClass={cs.CL}
-}
-```
-
 Cite TRL as:
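The README text removed above is where this commit's context lives: Humpback's self-curation step has the model itself score generated instruction-response pairs and keep only the top-rated subset for fine-tuning. As a rough illustration of that loop, here is a minimal sketch; the `generate(prompt) -> str` helper, the rubric wording, and the keep-only-top-scores threshold are assumptions for illustration, not code from this repository or the paper's release:

```python
import re

# Hypothetical rating prompt; the paper uses a more detailed 5-point rubric.
RATING_PROMPT = """Below is an instruction and a candidate response.
Rate how well the response answers the instruction on a scale from 1 to 5,
where 5 means a perfect, self-contained answer. Reply with the number only.

Instruction: {instruction}
Response: {response}
Rating:"""

def curate(pairs, generate, threshold=5):
    """Keep only (instruction, response) pairs the model rates at or above `threshold`."""
    kept = []
    for instruction, response in pairs:
        reply = generate(RATING_PROMPT.format(instruction=instruction, response=response))
        match = re.search(r"[1-5]", reply)  # pull the first digit out of the model's reply
        if match and int(match.group()) >= threshold:
            kept.append((instruction, response))
    return kept
```

Per the removed description, the pairs kept this way, combined with a small amount of gold data, form the training set for the next iteration (M1 here).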
model-00001-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:781956293cc99f11c53f79fc56a25cc694a398ecaf01e50cd0f50bc4197f4e94
+oid sha256:c4041a510e523ecf82dc5c5d009b232972f4b94e9eae5181a8712140e523c0df
 size 4965799096
model-00002-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:3156b918a65412939a07344f082a227b5451165629df0e570dc143c4b0e0feb7
+oid sha256:dda58e22690fac7ee3929d8165d426ac123f1aae9d7b879e5f49e8f554e7dc83
 size 1459729952
tokenizer.json CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:6b9e4e7fb171f92fd137b777cc2714bf87d11576700a1dcd7a399e7bbe39537b
-size 17209920
+oid sha256:76cfe2f054560aae896b2b75e273dc97a39e304d4ad19c44a9727a1d6b33c4cc
+size 17210021
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:695df2cc7631d8878a89abb1c77c4fef5bea2dcd14bbc35249581c9dff56d679
+oid sha256:771a5f073e3f38bf66373390370b38d494e2912de8713762f2e145aa1d0cad04
 size 5560
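The remaining changes only swap the Git LFS pointers for the retrained weights, tokenizer, and training arguments. A pointer records the blob's SHA-256 digest (`oid`) and byte size, so a local download can be checked against this commit with a few lines; this helper is an illustrative sketch, not part of the repo:

```python
import hashlib

def verify_lfs_file(path: str, expected_oid: str, expected_size: int) -> bool:
    """Compare a local file's SHA-256 and byte size to an LFS pointer's oid/size."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # stream in 1 MiB chunks
            digest.update(chunk)
            size += len(chunk)
    return digest.hexdigest() == expected_oid and size == expected_size

# Example with the new pointer values from this commit:
# verify_lfs_file("model-00001-of-00002.safetensors",
#                 "c4041a510e523ecf82dc5c5d009b232972f4b94e9eae5181a8712140e523c0df",
#                 4965799096)
```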