Commit History
streaming multipack for pretraining dataset (#959)
553c80f
unverified
fix: revert local dir dataset load (#878)
575a082
unverified
don't train if eval split is too small (#873)
797f3dd
unverified
Feat: Add dataset loading from S3, GCS (#765)
3cc67d2
unverified
cleanup the old multipack dataloader (#841)
1a6309c
unverified
multipack w batch sampler (#795)
641e6f7
unverified
update table for rwkv4 support, fix process count for dataset (#822)
cdc71f7
unverified
Create preprocess CLI (#785)
e50ab07
unverified
catch ConnectionError when checking dataset from HuggingFace (#743)
992d57f
unverified
Napuh
committed on
improve handling of the prepared ds path and other cfg defaults (#701)
1c412c7
unverified
Fix: Future deprecation warning with use_auth_token (#680)
69fac9a
unverified
prepared dataset caching, other misc fixes (#665)
e50a64e
unverified
add support for defined train split (#654)
409ca0f
unverified
Fix bug in dataset loading (#284)
8fe0e63
unverified
use fastchat conversations template (#578)
e7d3e2d
unverified
attention_mask not needed for training (#642)
e8cbf50
unverified
Feat(data): Allow loading local csv and text (#594)
00dce35
unverified
support custom field for completion from yml (#580)
f7a2263
unverified
remove columns after tokenizing for pretraining (#571)
1157950
unverified
Fix pretraining with iterable/streaming Dataset (#556)
2f586d1
unverified
Jan Philipp Harries
committed on
workaround for md5 variations (#533)
0b4cf5b
unverified
support for datasets with multiple names (#480)
5ac3392
unverified
improve llama pad token handling (#475)
cb9797e
unverified
support user defined prompters, pretokenized datasets in config, local parquet, local arrow files (#348)
d2e7f27
unverified
add utils.data.prepare_dataset
2e22404
use context manager to run things on rank0 before others (#397)
fc2d6be
unverified
Attention mask and position id fixes for packing (#285)
2bb0b78
unverified
experimental llama 2 chat support (#296)
3392270
unverified
Jan Philipp Harries
committed on