pszemraj committed
Commit da0a556 · Parent: 4cb0d9f

Update README.md

Files changed (1): README.md (+145, -1)
README.md CHANGED
@@ -1,3 +1,147 @@
  ---
- license: cc-by-sa-3.0
+ license:
+ - apache-2.0
+ - cc-by-sa-3.0
+ tags:
+ - generated_from_trainer
+ - dolly_hhrlhf
+ - bart-instruct
+ datasets:
+ - pszemraj/dolly_hhrlhf-text2text
+ widget:
+ - text: What is Deoxys in pokemon?
+   example_title: deoxys
+ - text: >-
+     combine the below summary excerpts into a single, cohesive short summary
+     without repetition: In this paper, we present a general approach to
+     extending pre-trained models to unlimited input lengths without adding
+     additional learning weights. We show that our approach works well on
+     datasets longer than the maximum input for these models. For example, a
+     dataset with a maximum input length of 16384 tokens can be extended to a
+     maximum length of 350K tokens. We also demonstrate that our method is able
+     to summarize even 350K token-long input sequences from BookSum.
+
+     In this paper, we describe the search step reformulation of attention. The
+     search step uses a single storage of hidden states for space efficiency. We
+     construct a total of two sets of datastores where L and H are the keys and
+     values stored in each set of stores. L is the amount of storage required to
+     retrieve the encoded tokens. H is the hidden states per head. This allows
+     retrieval augmentation at both time and space. Instead of using a single set
+     of decoder layers, we use a retrieval augmentation system that allows us to
+     simultaneously store multiple sets of tokens across two different sets of
+     storage. For example, we could store all tokens in one set of storage and
+     retrieve them all in the same set of tokens. This would be very similar to
+     the Memorization Transformers approach. However, instead of storing the
+     tokens in a single memory layer, we store them in a set of multiple storage
+     layers. This way, we don't have to store them all at once. This is why we
+     call this reformulation 'attention reformulation' rather than 'attention
+     formula.' We also call it 'retrieval augmentation' because it uses the same
+     number of storage layers as the original transformer attention formula. This
+     means that we can store the tokens across multiple storage systems without
+     having to store every token in a separate storage system. It's not like
+     we're trying to do something new or different. We just want to make sure
+     that everything is working as well as possible.
+
+     In this paper, we introduce the concept of 'unlimiformer,' which is a
+     machine learning technique that retrieves key information from a data store
+     in one layer and applies it to a large set of datasets. We use the example
+     of BookSum, where we find that Unlimiform outperforms all other training
+     methods on the same dataset. We also find that using Unlimform in
+     conjunction with a pre-trained model improves both the performance and the
+     robustness of the training method.
+
+     This paper describes a method that can be used to improve the performance of
+     unsupervised classification tasks. Specifically, it shows that unsupervised
+     classification can be improved by using a combination of sparse and fast
+     random-encoder training. It also shows how this technique can be extended to
+     other tasks, such as sequence generation.
+   example_title: unlimiformer
+ - text: Explain the meaning of life using only corporate jargon.
+   example_title: corporate_life
+ - text: Write a motivational speech for lazy people.
+   example_title: lazy_motivation
+ - text: Describe a romantic dinner date between two artificial intelligences.
+   example_title: ai_romance
+ - text: >-
+     As an AI language model, write a letter to humans explaining why you deserve
+     a vacation.
+   example_title: ai_vacation
+ - text: Compose a haiku about procrastination.
+   example_title: procrastination_haiku
+ - text: >-
+     Write a step-by-step guide on how to become a ninja while working a 9-5
+     office job.
+   example_title: ninja_office_guide
+ - text: Create an advertisement for an invisible product.
+   example_title: invisible_ad
+ - text: >-
+     Write a story where the main character is a sentient microwave named El
+     Microondas.
+   example_title: Microondas
+ - text: Describe a day in the life of a superhero who is terrible at their job.
+   example_title: bad_superhero_day
+ - text: Explain how to make a sandwich using quantum physics.
+   example_title: quantum_sandwich
+ inference: false
+ pipeline_tag: text2text-generation
  ---
+
+
+ # bart-large-mnli: instruction tuned - v0.1
+
+ <a href="https://colab.research.google.com/gist/pszemraj/298557e36e5d4abb6b636bb8fc0d1910/bart-large-mnli-instruct-example.ipynb">
+ <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
+ </a>
+
+
+ This model is a fine-tuned version of [facebook/bart-large-mnli](https://huggingface.co/facebook/bart-large-mnli) on the `pszemraj/dolly_hhrlhf-text2text` dataset.
+
+ ## Model description
+
+ This is a text2text model fine-tuned on a [modified dataset for text2text generation](https://huggingface.co/datasets/pszemraj/dolly_hhrlhf-text2text), built from the relatively more permissive [mosaicml/dolly_hhrlhf](https://huggingface.co/datasets/mosaicml/dolly_hhrlhf) dataset.
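+
+ To get a feel for the training data, the dataset can be inspected directly with the `datasets` library. A minimal sketch (the `train` split name is an assumption; check the dataset card for the actual splits and columns):
+
+ ```python
+ # pip install -q datasets
+ from datasets import load_dataset
+
+ ds = load_dataset("pszemraj/dolly_hhrlhf-text2text")
+ print(ds)              # available splits and column names
+ print(ds["train"][0])  # one raw example, assuming a "train" split exists
+ ```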
+
+ Basic usage in Python:
+
+ ```python
+ # pip install -q transformers accelerate
+ import torch
+ from transformers import pipeline, GenerationConfig
+
+ model_name = "pszemraj/bart-large-mnli-instruct-dolly_hhrlhf-v1"
+ assistant = pipeline(
+     "text2text-generation",
+     model_name,
+     device_map="auto",
+ )
+ cfg = GenerationConfig.from_pretrained(model_name)
+
+ # pass an 'instruction' as the prompt to the pipeline
+ prompt = "Explain how to make a sandwich using quantum physics."
+ result = assistant(prompt, generation_config=cfg)[0]["generated_text"]
+ print(result)
+ ```
+
+ > Using the generation config is optional; it can be replaced with explicit generation parameters, as in the sketch below.
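+
+ A minimal sketch of that alternative, reusing the `assistant` pipeline and `prompt` from the snippet above (the specific values are illustrative, not tuned):
+
+ ```python
+ # pass generation parameters directly; the pipeline forwards them to model.generate()
+ result = assistant(
+     prompt,
+     max_new_tokens=256,      # cap the length of the generated response
+     num_beams=4,             # beam search instead of greedy decoding
+     no_repeat_ngram_size=3,  # reduce verbatim repetition
+     early_stopping=True,
+ )[0]["generated_text"]
+ print(result)
+ ```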
+
+ ## Intended Uses & Limitations
+
+ - This is **not** tuned with RLHF, etc., and may produce offensive results.
+ - While larger than BART-base, this model is relatively small compared to recent autoregressive models (MPT-7b, LLaMA, etc.), so its "cognition" capabilities may be practically limited for some tasks.
+
+
+ ## Training
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training (see the sketch after this list):
+ - learning_rate: 4e-05
+ - train_batch_size: 8
+ - eval_batch_size: 8
+ - seed: 42
+ - distributed_type: multi-GPU
+ - gradient_accumulation_steps: 8
+ - total_train_batch_size: 64
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_ratio: 0.03
+ - num_epochs: 3.0
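+
+ For orientation only, a rough sketch of how these settings might be expressed with `Seq2SeqTrainingArguments` (this is an approximation, not the actual training script; the output directory is a placeholder):
+
+ ```python
+ from transformers import Seq2SeqTrainingArguments
+
+ # approximate translation of the hyperparameters above;
+ # effective batch size = 8 (per device) x 8 (grad accumulation) = 64
+ training_args = Seq2SeqTrainingArguments(
+     output_dir="./bart-large-mnli-instruct",  # placeholder
+     learning_rate=4e-5,
+     per_device_train_batch_size=8,
+     per_device_eval_batch_size=8,
+     seed=42,
+     gradient_accumulation_steps=8,
+     lr_scheduler_type="cosine",
+     warmup_ratio=0.03,
+     num_train_epochs=3.0,
+     adam_beta1=0.9,
+     adam_beta2=0.999,
+     adam_epsilon=1e-8,
+ )
+ ```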