---
title: Pre-training
description: Data format for a pre-training completion task.
order: 1
---
Pre-training uses no prompt template and no roles. The only required field is `text`:
```{.json filename="data.jsonl"}
{"text": "first row"}
{"text": "second row"}
...
```
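A file in this shape can be produced and checked with a few lines of Python. The following is a minimal sketch using only the standard library; the helper names `write_pretraining_jsonl` and `validate_pretraining_jsonl` are hypothetical and not part of Axolotl.

```python
import json
from pathlib import Path


def write_pretraining_jsonl(texts, path):
    """Write one {"text": ...} JSON object per line (hypothetical helper)."""
    with Path(path).open("w", encoding="utf-8") as f:
        for text in texts:
            f.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")


def validate_pretraining_jsonl(path):
    """Return the row count; raise ValueError if a row lacks a string `text` field."""
    count = 0
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            row = json.loads(line)
            if not isinstance(row, dict) or not isinstance(row.get("text"), str):
                raise ValueError(f"line {line_no}: missing string 'text' field")
            count += 1
    return count
```

Calling `write_pretraining_jsonl(["first row", "second row"], "data.jsonl")` writes the two rows shown above, and `validate_pretraining_jsonl("data.jsonl")` then returns the number of rows it accepted.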
### Streaming is recommended for large datasets
Axolotl usually loads the entire dataset into memory, which can be impractical for large datasets. To stream the data instead, use the following config:
```{.yaml filename="config.yaml"}
pretraining_dataset: # Hugging Face Hub path only
...
```
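To illustrate why streaming helps, the sketch below reads a JSONL file one row at a time instead of loading it whole. It shows the general idea only and is not Axolotl's implementation; the names `stream_texts` and `take` are hypothetical.

```python
import json
from itertools import islice


def stream_texts(path):
    """Yield the `text` field of each row, reading one line at a time."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip blank lines
                yield json.loads(line)["text"]


def take(iterable, n):
    """Return the first n items without consuming the rest."""
    return list(islice(iterable, n))
```

Because `stream_texts` is a generator, `take(stream_texts("data.jsonl"), 2)` only reads as far into the file as it needs, so memory use stays bounded by the size of a single row rather than the whole dataset.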