Update README.md
README.md CHANGED
```diff
@@ -84,7 +84,7 @@ while True:
 
 ### Datasets
 
-
+The following datasets were used for the run:
 
 - https://huggingface.co/datasets/DataProvenanceInitiative/Commercially-Verified-Licenses
 - https://huggingface.co/datasets/sablo/oasst2_curated
@@ -179,13 +179,13 @@ Parameters:
 
 This model is best thought of as a research artifact, not a polished product. It was the result of essentially a single training run with significant data and compute constraints. For any production use case, you should consider performing an additional layer of fine-tuning and alignment - without that it is suitable only for research/non-serious purposes.
 
-Due to data and compute constraints, as well as a scarcity of high-quality data, there was a notable lack of experimentation (read: this was a one-shot run, so things might be off).
+Due to data and compute constraints, as well as a scarcity of high-quality data, there was a notable lack of experimentation (read: this was a one-shot run, so things might be off). I did not scale training to the usual post-training scale, nor did I do any form of RL for math/coding/structured outputs/tool use. I also did not perform mid-training, a costly but effective technique used in many SOTA models. Consequently, this model might not perform up to your expectations.
 
 The model has limited preference alignment from a small sample of the HH-RLHF dataset and may generate misaligned outputs from time to time. Furthermore, it was not trained with a system prompt due to a lack of useful data, which can reduce its steerability.
 
-
+I have not performed any specific debiasing. The training data is sourced from broad internet and instructional datasets and will inevitably contain the biases present in that data. The model can and will generate text that reflects these societal biases. Handle with care and be aware of this when using it for any downstream task.
 
-All limitations from the base model also apply here
+All limitations from the base model also apply here, and I strongly recommend reviewing its model card.
 
 ## Footnotes and disclaimer
 
```
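The two datasets listed in the Datasets section above can presumably be pulled with the Hugging Face `datasets` library. The sketch below assumes default configs and a `train` split, which has not been verified against the actual repositories.

```python
from datasets import load_dataset

# Datasets referenced in the README's "Datasets" section.
# NOTE: the "train" split (and default config) is an assumption,
# not verified against the dataset repositories.
licenses = load_dataset(
    "DataProvenanceInitiative/Commercially-Verified-Licenses",
    split="train",
)
oasst2 = load_dataset("sablo/oasst2_curated", split="train")

print(licenses)
print(oasst2)
```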