Data Collator
Data collators are objects that form a batch from a list of dataset elements. These elements are of
the same type as the elements of `train_dataset` or `eval_dataset`.
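To make the idea concrete, here is a minimal sketch in plain Python of what "forming a batch from a list of dataset elements" means. The helper name `simple_collate` and the sample token ids are hypothetical and only for illustration; the collators documented below return tensors or arrays rather than Python lists and handle more cases.

```python
from typing import Any, Dict, List


def simple_collate(features: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Hypothetical helper: merge a list of per-example dicts into one dict of lists."""
    if not features:
        return {}
    # Assumes every example has the same keys as the first one.
    keys = features[0].keys()
    return {key: [example[key] for example in features] for key in keys}


# Two dataset elements, shaped like items of a tokenized dataset (made-up ids).
features = [
    {"input_ids": [101, 7592, 102], "label": 0},
    {"input_ids": [101, 2088, 102], "label": 1},
]
batch = simple_collate(features)
```

Each key of the resulting `batch` holds one entry per example, which is the shape a model expects for a forward pass once the values are converted to tensors.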
To build batches, data collators may apply some processing (such as padding). Some of them (like
[`DataCollatorForLanguageModeling`]) also apply random data augmentation (such as random masking)
to the formed batch.
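The two kinds of processing mentioned above, padding and random masking, can be sketched in plain Python as follows. This is an illustrative simplification, not the library's implementation: the function names, `PAD_ID`, and `MASK_ID` are assumptions, every selected token is replaced by the mask id, and special tokens are not treated separately. The value `-100` marks label positions that the loss should ignore.

```python
import random
from typing import Dict, List, Optional

PAD_ID = 0     # assumed padding token id
MASK_ID = 103  # assumed mask token id


def pad_batch(sequences: List[List[int]], pad_id: int = PAD_ID) -> Dict[str, List[List[int]]]:
    """Pad every sequence on the right to the length of the longest one."""
    max_len = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


def randomly_mask(
    batch: Dict[str, List[List[int]]],
    mask_prob: float = 0.15,
    mask_id: int = MASK_ID,
    rng: Optional[random.Random] = None,
) -> Dict[str, List[List[int]]]:
    """Replace each non-padding token with mask_id with probability mask_prob."""
    rng = rng or random.Random()
    masked_ids, labels = [], []
    for ids, attention in zip(batch["input_ids"], batch["attention_mask"]):
        row_ids, row_labels = [], []
        for token, is_real in zip(ids, attention):
            if is_real and rng.random() < mask_prob:
                row_ids.append(mask_id)
                row_labels.append(token)   # the model must predict the original token
            else:
                row_ids.append(token)
                row_labels.append(-100)    # position ignored by the loss
        masked_ids.append(row_ids)
        labels.append(row_labels)
    return {
        "input_ids": masked_ids,
        "attention_mask": batch["attention_mask"],
        "labels": labels,
    }


padded = pad_batch([[5, 6, 7], [8]])
masked = randomly_mask(padded, mask_prob=0.15, rng=random.Random(0))
```

Because masking is applied when the batch is formed, the same example can receive a different mask each time it is drawn, which is why this step lives in the collator rather than in dataset preprocessing.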
Examples of use can be found in the example scripts or example notebooks.
Default data collator
[[autodoc]] data.data_collator.default_data_collator
DefaultDataCollator
[[autodoc]] data.data_collator.DefaultDataCollator
DataCollatorWithPadding
[[autodoc]] data.data_collator.DataCollatorWithPadding
DataCollatorForTokenClassification
[[autodoc]] data.data_collator.DataCollatorForTokenClassification
DataCollatorForSeq2Seq
[[autodoc]] data.data_collator.DataCollatorForSeq2Seq
DataCollatorForLanguageModeling
[[autodoc]] data.data_collator.DataCollatorForLanguageModeling
    - numpy_mask_tokens
    - tf_mask_tokens
    - torch_mask_tokens
DataCollatorForWholeWordMask
[[autodoc]] data.data_collator.DataCollatorForWholeWordMask
    - numpy_mask_tokens
    - tf_mask_tokens
    - torch_mask_tokens
DataCollatorForPermutationLanguageModeling
[[autodoc]] data.data_collator.DataCollatorForPermutationLanguageModeling
    - numpy_mask_tokens
    - tf_mask_tokens
    - torch_mask_tokens