Commit History
make sure to use train split if loading from hf
607a4d3
winglian
fix new dataset prompt tokenizers
0f74464
winglian
pygmalion dataset prompts format, cached tokenized datasets should be hashed on the tokenizer too
2809f3f
winglian
tokenization fixes
4ea9a66
winglian
optionally be able to specify alpaca or chat style prompts
1d5ab84
winglian
concise multiple choice and tldr summarize
1365073
winglian
add alpaca multiple choice instruct dataset support
b46bc02
winglian
move filter to before saving so it doesn't happen everytime, update runpod manual script
0d28df0
winglian
whoops, gt vs lt
84c7bc4
winglian
optimize dataloading to use cache, fix model token embedding sizes
aa3c3f9
winglian
black formatting
2bc1a5b
winglian
fix conditional so alpaca doesn't choke
a27d594
winglian
Add CompletionPrompt type
cf68153
Nanobit
Jeopardy bot! (#17)
a12fb0a
winglian
fix dataset handling, support galactica
4a17a4c
winglian
tweaks to data loading, 8 bit adam, accelerate and deepspeed
097d367
winglian
shuffle and split dataset after save/load
4f2584f
winglian
fix sharegpt handling from hf, don't worry about loading llama if using earlier transformers release
8d43785
winglian
various bugfixes
94f5e41
winglian
WIP large refactor to make finetune script a little more manageable (#3)
6045345
winglian