---
license: mit
datasets:
- GAIR/LIMO
language:
- en
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
tags:
- R1
- DeepSeek
- Distill
- Qwen
- 7B
- LIMO
---
# LIMO-R1-Distill-Qwen-7B
Using [deepseek-ai/DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) as the base model.

Fine-tuned on [GAIR/LIMO](https://huggingface.co/GAIR/LIMO).

Trained using LLaMA-Factory with the following config:
```python
max_seq_length = 6 * 1024

lora_rank = 32
lora_alpha = lora_rank * 2
lora_target = ["q_proj", "k_proj", "v_proj", "o_proj",
               "gate_proj", "up_proj", "down_proj"]

args = dict(
    stage="sft",
    do_train=True,
    model_name_or_path="unsloth/DeepSeek-R1-Distill-Qwen-7B-bnb-4bit",
    dataset="limo_restructured",
    template="custom_template",
    finetuning_type="lora",
    lora_target=lora_target,
    output_dir="qwen_distill_7b_lora",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=3,
    lr_scheduler_type="cosine",
    logging_steps=1,
    warmup_ratio=0.1,
    save_steps=100,
    learning_rate=1e-4,
    num_train_epochs=1.0,
    max_grad_norm=1.0,
    loraplus_lr_ratio=16.0,
    fp16=True,
    report_to="none",
    preprocessing_num_workers=16,
    cutoff_len=max_seq_length,
)
```
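For quick sanity-checking, the key derived values in the config above work out as follows (a minimal sketch in plain Python, with the numbers copied from the config):

```python
# Values copied from the LLaMA-Factory config above.
max_seq_length = 6 * 1024           # cutoff_len applied to each training sample
lora_rank = 32
lora_alpha = lora_rank * 2          # alpha fixed at twice the rank

# One sample per device, three accumulation steps -> effective batch size per update.
per_device_train_batch_size = 1
gradient_accumulation_steps = 3
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps

print(max_seq_length, lora_alpha, effective_batch_size)  # 6144 64 3
```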

System prompt used:
```
'You are a helpful assistant. Please reason step by step inside the tags <think> and </think>. Conclude with **Answer** and put your final answer within \\boxed{}.'
```

Custom template used in training:
````python
register_template(
    name="custom_template",
    format_user=StringFormatter(
        slots=["<|User|>{{content}}"]
    ),
    format_assistant=StringFormatter(
        slots=["<|Assistant|>{{content}}<|end▁of▁sentence|>"]
    ),
    format_system=StringFormatter(
        slots=["{{content}}"]
    ),
    format_function=FunctionFormatter(
        slots=[
            "<|Assistant|><|tool▁calls▁begin|><|tool▁call▁begin|>{{type}}<|tool▁sep|>{{name}}\n```json\n{{arguments}}\n```<|tool▁call▁end|><|tool▁calls▁end|><|end▁of▁sentence|>"
        ],
        tool_format="qwen"
    ),
    format_observation=StringFormatter(
        slots=[
            "<|tool▁outputs▁begin|><|tool▁output▁begin|>{{content}}<|tool▁output▁end|><|tool▁outputs▁end|>"
        ]
    ),
    format_tools=ToolFormatter(tool_format="qwen"),
    default_system="",
    stop_words=["<|end▁of▁sentence|>"]
)
````
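Given the template above, a full prompt is just the system text emitted verbatim, then `<|User|>` plus the question, then `<|Assistant|>` for the model to complete. A minimal sketch of assembling that string by hand (`build_prompt` is an illustrative helper, not part of the training code):

```python
SYSTEM = ("You are a helpful assistant. Please reason step by step inside the tags "
          "<think> and </think>. Conclude with **Answer** and put your final answer "
          "within \\boxed{}.")

def build_prompt(question: str, system: str = SYSTEM) -> str:
    # Mirrors custom_template: system content verbatim, then the user turn,
    # then the assistant tag the model generates after.
    return f"{system}<|User|>{question}<|Assistant|>"

prompt = build_prompt("What is 2 + 2?")
```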

In the dataset, for variation, I randomly replaced a leading "Okay," at the start of a response with one of the following:
```python
starts = [
    "Alright,",
    "Well,",
    "So,",
    "Hmm,",
    "Okay then,",
    "Right,",
    "Let's see,",
    "Now,",
    "Alrighty,",
    "Thinking about it,",
    "You know,",
    "Well then,",
    "Come to think of it,",
    "Actually,",
    "Now that I think about it,",
    "Good question,",
    "Let me think,",
    "Let's see now,",
    "Interesting,",
    "Now then,"
]
```
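The replacement step can be sketched as below (assuming each response is a plain string; `vary_start` and the fixed seed are illustrative, not the original preprocessing script, and the `starts` list is repeated so the block is self-contained):

```python
import random

starts = ["Alright,", "Well,", "So,", "Hmm,", "Okay then,", "Right,", "Let's see,",
          "Now,", "Alrighty,", "Thinking about it,", "You know,", "Well then,",
          "Come to think of it,", "Actually,", "Now that I think about it,",
          "Good question,", "Let me think,", "Let's see now,", "Interesting,",
          "Now then,"]

def vary_start(text: str, rng: random.Random) -> str:
    # Only responses that literally begin with "Okay," are rewritten;
    # everything else passes through unchanged.
    if text.startswith("Okay,"):
        return rng.choice(starts) + text[len("Okay,"):]
    return text

rng = random.Random(0)  # fixed seed only for reproducibility of this example
varied = vary_start("Okay, let's compute the sum.", rng)
```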