Commits · Dovakiins/qwerrwe

fix: torch_dtype mistral default to fp32 (#1050)

c3e8165
unverified

Nanobit commited on Jan 9, 2024

Phi2 rewrite (#1058)

732851f
unverified

winglian commited on Jan 8, 2024

feature: better device mapping for large models (#918)

bdfefaf
unverified

kallewoof Karl-Johan Alm

winglian commited on Jan 5, 2024

RL/DPO (#935)

f243c21

winglian commited on Jan 4, 2024

bump transformers and update attention class map name (#1023)

bcc78d8
unverified

winglian commited on Jan 3, 2024

Adds chat templates (#1022)

f8ae59b
unverified

mhenrichsen commited on Dec 29, 2023

feat: expose bnb kwargs (#1018)

41353d2
unverified

Nanobit

hamel commited on Dec 29, 2023

remove landmark attn and xpos rope implementations (#1010)

70b46ca
unverified

winglian commited on Dec 28, 2023

Feat: Warns to add to modules_to_save when adding tokens or switching special_tokens (#787)

1ffa386
unverified

Nanobit commited on Dec 22, 2023

Fix Deepspeed loading (#950)

5ea3aa3
unverified

winglian commited on Dec 13, 2023

Flash attn hotfix (#951)

f1f60cb
unverified

winglian commited on Dec 13, 2023

Mixtral official (#942)

7fabc4d
unverified

winglian commited on Dec 12, 2023

Mixtral multipack (#928)

68b227a
unverified

winglian commited on Dec 10, 2023

support for mamba (#915)

40a6362
unverified

winglian commited on Dec 9, 2023

fix(tokenizer): handle fast tokenizer properly for bos/eos (#914)

fde091c
unverified

Nanobit commited on Dec 8, 2023

feat: add check for quantized model (#913)

a581e9f
unverified

Nanobit

winglian commited on Dec 4, 2023

Support device_map=sequential & max_memory config parameters (#903)

992e742
unverified

Bryan Thornbury

winglian commited on Dec 4, 2023

fix for qwen w lora (#906)

3e3229e
unverified

winglian commited on Nov 30, 2023

Feat: Add Qwen (#894)

1115c50
unverified

Nanobit commited on Nov 25, 2023

Phi update 202311 (#876)

9bf854e
unverified

winglian commited on Nov 17, 2023

allow overriding of model_config parameters from the YML (#853)

1bc1186
unverified

winglian commited on Nov 16, 2023

fix model parallel (#816)

964d858
unverified

winglian commited on Nov 3, 2023

fix(tokenizer): update log order after update (#806)

10388a8
unverified

Nanobit commited on Oct 31, 2023

fix(config): Set eos/bos to tokenizer if different (#801)

637ed09
unverified

Nanobit commited on Oct 29, 2023

refactor neft patch to be more re-usable similar to trl's impl (#796)

827ec3d
unverified

winglian commited on Oct 29, 2023

chore: refactor truthy check and fix mypy (#780)

11d1d60
unverified

Nanobit commited on Oct 24, 2023

Implement fused modules (#747)

15d3a65
unverified

casperhansen

winglian commited on Oct 21, 2023

Fix(model): Linear detected and added to target module with rope linear (#738)

440c3ab
unverified

Nanobit commited on Oct 19, 2023

add noisy embedding (#721)

3bd9528
unverified

Maxime Maxime commited on Oct 13, 2023

Fix: Higher vram usage for mistral and sample_packing (#691)

669f1d0
unverified

Nanobit commited on Oct 6, 2023

flash_attention + sample packing for stablelm 3b (#671)

2d60ba3
unverified

winglian commited on Oct 5, 2023

Fix: ValueError when FA + Mistral when padding_side=right (#681)

eb480df
unverified

Nanobit commited on Oct 5, 2023

Fix(tokenizer): Set rstrip,lstrip,norm to False (#678)

e0b7eea
unverified

Nanobit commited on Oct 5, 2023

chore: Clean up repetitive model kwargs (#670)

e62d590
unverified

Nanobit commited on Oct 4, 2023

Feat: Allow usage of native Mistral FA when no sample_packing (#669)

697c50d
unverified

Nanobit commited on Oct 4, 2023

remove patch fix for phi (#664)

f34648c
unverified

winglian commited on Oct 3, 2023

Mistral flash attn packing (#646)

b6ab8aa
unverified

winglian commited on Sep 27, 2023

skip some flash attn patches unless explicitly enabled (#643)

895f0a0
unverified

winglian commited on Sep 27, 2023

Feat: Add support for upstream FA2 (#626)

19a600a
unverified

Nanobit commited on Sep 26, 2023

misc fixes to add gptq tests (#621)

03e5907
unverified

winglian commited on Sep 22, 2023

support to disable exllama for gptq (#604)

faecff9
unverified

winglian commited on Sep 19, 2023

Delete duplicate lines (#606)

aa656e0
unverified

bofenghuang commited on Sep 19, 2023

btlm and falcon monkey patches for flash attn (#566)

6b9b229
unverified

winglian commited on Sep 17, 2023

make phi training work with Loras (#588)

62eaee7
unverified

winglian commited on Sep 16, 2023

don't resize embeddings if it's already large enough (#577)

3607882
unverified

winglian commited on Sep 15, 2023

Support Sample packing for phi arch (#586)

12a2dbb
unverified

winglian commited on Sep 15, 2023

Add training callback to send predictions to WandB table (#521)

5b67ea9
unverified

Glavin001 commited on Sep 13, 2023

fix for quant config from model (#540)

a94f9cb
unverified

winglian commited on Sep 10, 2023

Add support for GPTQ using native transformers/peft (#468)

3355706
unverified

winglian commited on Sep 5, 2023

fix: bad dtype for full finetune (#504)

1991946
unverified

Maxime

winglian commited on Sep 1, 2023

Commit History

fix: torch_dtype mistral default to fp32 (#1050) c3e8165 unverified

Phi2 rewrite (#1058) 732851f unverified

feature: better device mapping for large models (#918) bdfefaf unverified

RL/DPO (#935) f243c21

bump transformers and update attention class map name (#1023) bcc78d8 unverified

Adds chat templates (#1022) f8ae59b unverified

feat: expose bnb kwargs (#1018) 41353d2 unverified

remove landmark attn and xpos rope implementations (#1010) 70b46ca unverified

Feat: Warns to add to modules_to_save when adding tokens or switching special_tokens (#787) 1ffa386 unverified

Fix Deepspeed loading (#950) 5ea3aa3 unverified

Flash attn hotfix (#951) f1f60cb unverified

Mixtral official (#942) 7fabc4d unverified

Mixtral multipack (#928) 68b227a unverified

support for mamba (#915) 40a6362 unverified

fix(tokenizer): handle fast tokenizer properly for bos/eos (#914) fde091c unverified

feat: add check for quantized model (#913) a581e9f unverified

Support device_map=sequential & max_memory config parameters (#903) 992e742 unverified

fix for qwen w lora (#906) 3e3229e unverified

Feat: Add Qwen (#894) 1115c50 unverified

Phi update 202311 (#876) 9bf854e unverified

allow overriding of model_config parameters from the YML (#853) 1bc1186 unverified

fix model parallel (#816) 964d858 unverified

fix(tokenizer): update log order after update (#806) 10388a8 unverified

fix(config): Set eos/bos to tokenizer if different (#801) 637ed09 unverified

refactor neft patch to be more re-usable similar to trl's impl (#796) 827ec3d unverified

chore: refactor truthy check and fix mypy (#780) 11d1d60 unverified

Implement fused modules (#747) 15d3a65 unverified

Fix(model): Linear detected and added to target module with rope linear (#738) 440c3ab unverified

add noisy embedding (#721) 3bd9528 unverified

Fix: Higher vram usage for mistral and sample_packing (#691) 669f1d0 unverified

flash_attention + sample packing for stablelm 3b (#671) 2d60ba3 unverified

Fix: ValueError when FA + Mistral when padding_side=right (#681) eb480df unverified

Fix(tokenizer): Set rstrip,lstrip,norm to False (#678) e0b7eea unverified

chore: Clean up repetitive model kwargs (#670) e62d590 unverified

Feat: Allow usage of native Mistral FA when no sample_packing (#669) 697c50d unverified

remove patch fix for phi (#664) f34648c unverified

Mistral flash attn packing (#646) b6ab8aa unverified

skip some flash attn patches unless explicitly enabled (#643) 895f0a0 unverified

Feat: Add support for upstream FA2 (#626) 19a600a unverified

misc fixes to add gptq tests (#621) 03e5907 unverified

support to disable exllama for gptq (#604) faecff9 unverified

Delete duplicate lines (#606) aa656e0 unverified

btlm and falcon monkey patches for flash attn (#566) 6b9b229 unverified

make phi training work with Loras (#588) 62eaee7 unverified

don't resize embeddings if it's already large enough (#577) 3607882 unverified

Support Sample packing for phi arch (#586) 12a2dbb unverified

Add training callback to send predictions to WandB table (#521) 5b67ea9 unverified

fix for quant config from model (#540) a94f9cb unverified

Add support for GPTQ using native transformers/peft (#468) 3355706 unverified

fix: bad dtype for full finetune (#504) 1991946 unverified