numiros committed · Commit 2aec98f · verified · 1 Parent(s): d373a2b

Update README.md

Files changed (1)
1. README.md +4 -4
README.md CHANGED
@@ -84,7 +84,7 @@ while True:
 
 ### Datasets
 
-We used the following datasets:
+The following datasets were used for the run:
 
 - https://huggingface.co/datasets/DataProvenanceInitiative/Commercially-Verified-Licenses
 - https://huggingface.co/datasets/sablo/oasst2_curated
@@ -179,13 +179,13 @@ Parameters:
 
 This model is best thought of as a research artifact, not a polished product. It was the result of essentially a single training run with significant data and compute constraints. For any production use case, you should consider performing an additional layer of fine-tuning and alignment; without that, it is suitable only for research/non-serious purposes.
 
-Due to data and compute constraints, as well as a scarcity of high-quality data, there was a notable lack of experimentation (read: this was a one-shot run, so things might be off). We didn't scale training to usual post-training scales, and neither did we do any form of RL for math/coding/structured outputs/tool use. We also did not perform mid-training, which is a costly but effective technique used in many SOTA models. Consequently, this model might not perform up to your expectations.
+Due to data and compute constraints, as well as a scarcity of high-quality data, there was a notable lack of experimentation (read: this was a one-shot run, so things might be off). I didn't scale training to usual post-training scales, and neither did I do any form of RL for math/coding/structured outputs/tool use. I also did not perform mid-training, which is a costly but effective technique used in many SOTA models. Consequently, this model might not perform up to your expectations.
 
 The model has limited preference alignment from a small sample of the HH-RLHF dataset and may generate misaligned outputs from time to time. Furthermore, it was not trained with a system prompt due to a lack of useful data, which can reduce its steerability.
 
-We have not performed any specific debiasing. The training data is sourced from broad internet and instructional datasets and will inevitably contain the biases present in that data. The model can and will generate text that reflects these societal biases. Handle with care and be aware of this when using it for any downstream task.
+I have not performed any specific debiasing. The training data is sourced from broad internet and instructional datasets and will inevitably contain the biases present in that data. The model can and will generate text that reflects these societal biases. Handle with care and be aware of this when using it for any downstream task.
 
-All limitations from the base model also apply here. We strongly recommend reviewing its model card.
+All limitations from the base model also apply here, and I strongly recommend reviewing its model card.
 
 ## Footnotes and disclaimer
 
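For context on the Datasets hunk above, here is a minimal sketch of loading the two listed datasets with the Hugging Face `datasets` library. The repo IDs come straight from the URLs in the README; the `split` names are assumptions, not something this commit specifies.

```python
# Hedged sketch (not from the commit): loading the two datasets listed in the
# README's Datasets section. The split names below are assumptions.
from datasets import load_dataset

# License-verified instruction data from the Data Provenance Initiative.
dpi = load_dataset(
    "DataProvenanceInitiative/Commercially-Verified-Licenses", split="train"
)

# Curated OpenAssistant (oasst2) conversations.
oasst2 = load_dataset("sablo/oasst2_curated", split="train")

print(f"DPI examples: {len(dpi)}, oasst2 examples: {len(oasst2)}")
```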