Triangle104 committed 999dbfd (verified; parent f850b7a): Update README.md

This model was converted to GGUF format from [`nbeerbower/Dumpling-Qwen2.5-1.5B-v2`](https://huggingface.co/nbeerbower/Dumpling-Qwen2.5-1.5B-v2) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
  Refer to the [original model card](https://huggingface.co/nbeerbower/Dumpling-Qwen2.5-1.5B-v2) for more details on the model.
---

nbeerbower/EVA-abliterated-TIES-Qwen2.5-1.5B fine-tuned on:

- nbeerbower/GreatFirewall-DPO
- nbeerbower/Schule-DPO
- nbeerbower/Purpura-DPO
- nbeerbower/Arkhaios-DPO
- jondurbin/truthy-dpo-v0.1
- antiven0m/physical-reasoning-dpo
- flammenai/Date-DPO-NoAsterisks
- flammenai/Prude-Phi3-DPO
- Atsunori/HelpSteer2-DPO (1,000 samples)
- jondurbin/gutenberg-dpo-v0.1
- nbeerbower/gutenberg2-dpo
- nbeerbower/gutenberg-moderne-dpo

### Method

QLoRA ORPO tune with 2x RTX 3090 for 2 epochs.

```python
from transformers import BitsAndBytesConfig
from peft import LoraConfig
from trl import ORPOConfig

# torch_dtype and new_model are defined elsewhere in the training script.

# QLoRA config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch_dtype,
    bnb_4bit_use_double_quant=True,
)

# LoRA config
peft_config = LoraConfig(
    r=64,
    lora_alpha=64,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['up_proj', 'down_proj', 'gate_proj', 'k_proj', 'q_proj', 'v_proj', 'o_proj'],
)

# Training config
orpo_args = ORPOConfig(
    run_name=new_model,
    learning_rate=2e-5,
    lr_scheduler_type="linear",
    max_length=2048,
    max_prompt_length=1024,
    max_completion_length=1024,
    beta=0.1,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,
    optim="paged_adamw_8bit",
    num_train_epochs=2,
    evaluation_strategy="steps",
    eval_steps=0.2,
    logging_steps=1,
    warmup_steps=10,
    max_grad_norm=10,
    report_to="wandb",
    output_dir="./results/",
    bf16=True,
)
```

---
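As a quick sanity check on the training configuration above: the effective batch size per optimizer step follows from the per-device batch size, the gradient accumulation steps, and the GPU count. A minimal sketch, using the values from the config and assuming both of the stated 2x RTX 3090 cards were used for data parallelism:

```python
# Values taken from the ORPO config above; num_gpus assumes the 2x RTX 3090 setup.
per_device_train_batch_size = 1
gradient_accumulation_steps = 8
num_gpus = 2

# Effective batch size = per-device batch * accumulation steps * number of GPUs.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # 16
```

Keeping the per-device batch at 1 and accumulating gradients over 8 steps trades wall-clock time for memory, which is what lets a 4-bit QLoRA run fit comfortably in a 3090's 24 GB of VRAM while still training with an effective batch of 16.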
## Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux)