Commit History
9bf854e  Phi update 202311 (#876)
1bc1186  allow overriding of model_config parameters from the YML (#853)
964d858  fix model parallel (#816)
10388a8  fix(tokenizer): update log order after update (#806)
637ed09  fix(config): Set eos/bos to tokenizer if different (#801)
827ec3d  refactor neft patch to be more re-usable similar to trl's impl (#796)
11d1d60  chore: refactor truthy check and fix mypy (#780)
15d3a65  Implement fused modules (#747)
440c3ab  Fix(model): Linear detected and added to target module with rope linear (#738)
3bd9528  add noisy embedding (#721) (Maxime)
669f1d0  Fix: Higher vram usage for mistral and sample_packing (#691)
2d60ba3  flash_attention + sample packing for stablelm 3b (#671)
eb480df  Fix: ValueError when FA + Mistral when padding_side=right (#681)
e0b7eea  Fix(tokenizer): Set rstrip,lstrip,norm to False (#678)
e62d590  chore: Clean up repetitive model kwargs (#670)
697c50d  Feat: Allow usage of native Mistral FA when no sample_packing (#669)
f34648c  remove patch fix for phi (#664)
b6ab8aa  Mistral flash attn packing (#646)
895f0a0  skip some flash attn patches unless explicitly enabled (#643)
19a600a  Feat: Add support for upstream FA2 (#626)
03e5907  misc fixes to add gptq tests (#621)
faecff9  support to disable exllama for gptq (#604)
aa656e0  Delete duplicate lines (#606)
6b9b229  btlm and falcon monkey patches for flash attn (#566)
62eaee7  make phi training work with Loras (#588)
3607882  don't resize embeddings if it's already large enough (#577)
12a2dbb  Support Sample packing for phi arch (#586)
5b67ea9  Add training callback to send predictions to WandB table (#521)
a94f9cb  fix for quant config from model (#540)
3355706  Add support for GPTQ using native transformers/peft (#468)
1991946  fix: bad dtype for full finetune (#504)
125cccb  Refactor train cfg cli (#499)
267b7b2  simplify linear layer locator
98bf76e  fsdp requires params be the same type too (#493)
4c37bd0  Fix(tokenizer): Make sure to add pad for CodeLlamaTokenizer (#489)
3a011ea  fix condition and add logging
f319b0b  rename var and reformat
7fd662d  Update src/axolotl/utils/models.py
9e69968  Update src/axolotl/utils/models.py
d03887f  ignore: address pr review (Maxime)
a184549  ignore: linter (Maxime)
f311df9  fix: finetune model inference needs the dtype fix to work with flash-attn (Maxime)