---
license: mit
datasets:
- GAIR/LIMO
language:
- en
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
tags:
- R1
- DeepSeek
- Distill
- Qwen
- 7B
- LIMO
---
# LIMO-R1-Distill-Qwen-7B
Using [deepseek-ai/DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) as the base model.

Fine-tuned on [GAIR/LIMO](https://huggingface.co/GAIR/LIMO).

Trained using LLaMA-Factory with the following config:
```python
max_seq_length = 6 * 1024

lora_rank = 32
lora_alpha = lora_rank * 2
lora_target = ["q_proj", "k_proj", "v_proj", "o_proj",
               "gate_proj", "up_proj", "down_proj"]

args = dict(
    stage="sft",
    do_train=True,
    model_name_or_path="unsloth/DeepSeek-R1-Distill-Qwen-7B-bnb-4bit",
    dataset="limo_restructured",
    template="custom_template",
    finetuning_type="lora",
    lora_target=lora_target,
    output_dir="qwen_distill_7b_lora",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=3,
    lr_scheduler_type="cosine",
    logging_steps=1,
    warmup_ratio=0.1,
    save_steps=100,
    learning_rate=1e-4,
    num_train_epochs=1.0,
    max_grad_norm=1.0,
    loraplus_lr_ratio=16.0,
    fp16=True,
    report_to="none",
    preprocessing_num_workers=16,
    cutoff_len=max_seq_length,
)
```
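For quick sanity-checking, the key derived values in the config above work out as follows (a minimal sketch in plain Python, with the numbers copied from the config):

```python
# Values copied from the LLaMA-Factory config above.
max_seq_length = 6 * 1024           # cutoff_len applied to each training sample
lora_rank = 32
lora_alpha = lora_rank * 2          # alpha fixed at twice the rank

# One sample per device, three accumulation steps -> effective batch size per update.
per_device_train_batch_size = 1
gradient_accumulation_steps = 3
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps

print(max_seq_length, lora_alpha, effective_batch_size)  # 6144 64 3
```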

System prompt used:
```
'You are a helpful assistant. Please reason step by step inside the tags <think> and </think>. Conclude with **Answer** and put your final answer within \\boxed{}.'
```

Custom template used in training:
````python
register_template(
    name="custom_template",
    format_user=StringFormatter(
        slots=["<|User|>{{content}}"]
    ),
    format_assistant=StringFormatter(
        slots=["<|Assistant|>{{content}}<|end▁of▁sentence|>"]
    ),
    format_system=StringFormatter(
        slots=["{{content}}"]
    ),
    format_function=FunctionFormatter(
        slots=[
            "<|Assistant|><|tool▁calls▁begin|><|tool▁call▁begin|>{{type}}<|tool▁sep|>{{name}}\n```json\n{{arguments}}\n```<|tool▁call▁end|><|tool▁calls▁end|><|end▁of▁sentence|>"
        ],
        tool_format="qwen"
    ),
    format_observation=StringFormatter(
        slots=[
            "<|tool▁outputs▁begin|><|tool▁output▁begin|>{{content}}<|tool▁output▁end|><|tool▁outputs▁end|>"
        ]
    ),
    format_tools=ToolFormatter(tool_format="qwen"),
    default_system="",
    stop_words=["<|end▁of▁sentence|>"]
)
````
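Given the template above, a full prompt is just the system text emitted verbatim, then `<|User|>` plus the question, then `<|Assistant|>` for the model to complete. A minimal sketch of assembling that string by hand (`build_prompt` is an illustrative helper, not part of the training code):

```python
SYSTEM = ("You are a helpful assistant. Please reason step by step inside the tags "
          "<think> and </think>. Conclude with **Answer** and put your final answer "
          "within \\boxed{}.")

def build_prompt(question: str, system: str = SYSTEM) -> str:
    # Mirrors custom_template: system content verbatim, then the user turn,
    # then the assistant tag the model generates after.
    return f"{system}<|User|>{question}<|Assistant|>"

prompt = build_prompt("What is 2 + 2?")
```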

In the dataset, for variation, I randomly replaced a leading "Okay," at the start of a response with one of the following:
```python
starts = [
    "Alright,",
    "Well,",
    "So,",
    "Hmm,",
    "Okay then,",
    "Right,",
    "Let's see,",
    "Now,",
    "Alrighty,",
    "Thinking about it,",
    "You know,",
    "Well then,",
    "Come to think of it,",
    "Actually,",
    "Now that I think about it,",
    "Good question,",
    "Let me think,",
    "Let's see now,",
    "Interesting,",
    "Now then,"
]
```
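The replacement step can be sketched as below (assuming each response is a plain string; `vary_start` and the fixed seed are illustrative, not the original preprocessing script, and the `starts` list is repeated so the block is self-contained):

```python
import random

starts = ["Alright,", "Well,", "So,", "Hmm,", "Okay then,", "Right,", "Let's see,",
          "Now,", "Alrighty,", "Thinking about it,", "You know,", "Well then,",
          "Come to think of it,", "Actually,", "Now that I think about it,",
          "Good question,", "Let me think,", "Let's see now,", "Interesting,",
          "Now then,"]

def vary_start(text: str, rng: random.Random) -> str:
    # Only responses that literally begin with "Okay," are rewritten;
    # everything else passes through unchanged.
    if text.startswith("Okay,"):
        return rng.choice(starts) + text[len("Okay,"):]
    return text

rng = random.Random(0)  # fixed seed only for reproducibility of this example
varied = vary_start("Okay, let's compute the sum.", rng)
```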