thu-ml/STAIR-Llama-3.1-8B-SFT

@@ -1,60 +1,53 @@
 ---
 library_name: transformers
-license: other
 base_model: meta-llama/Llama-3.1-8B-Instruct
 tags:
 - llama-factory
 - full
 - generated_from_trainer
 model-index:
-- name: saferlhf-ultra-lr5e-6-ep3-trial-2-seed-99
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
-# saferlhf-ultra-lr5e-6-ep3-trial-2-seed-99
-This model is a fine-tuned version of [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) on the saferlhf_ultra dataset.
 ## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-- learning_rate: 5e-06
-- train_batch_size: 4
-- eval_batch_size: 8
-- seed: 99
-- distributed_type: multi-GPU
-- num_devices: 8
-- gradient_accumulation_steps: 4
-- total_train_batch_size: 128
-- total_eval_batch_size: 64
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: cosine
-- num_epochs: 3.0
-### Training results
-### Framework versions
-- Transformers 4.45.2
-- Pytorch 2.3.0+cu121
-- Datasets 2.21.0
-- Tokenizers 0.20.1
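The batch-size entries in the removed hyperparameter list are mutually consistent. Assuming the usual Hugging Face Trainer accounting, the effective training batch is the per-device batch times the number of devices times the gradient-accumulation steps, and the effective evaluation batch is the per-device batch times the number of devices:

$$
B_{\text{train}} = 4 \times 8 \times 4 = 128, \qquad B_{\text{eval}} = 8 \times 8 = 64
$$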

 ---
 library_name: transformers
+license: apache-2.0
 base_model: meta-llama/Llama-3.1-8B-Instruct
 tags:
 - llama-factory
 - full
 - generated_from_trainer
+- arxiv:2502.02384
 model-index:
+- name: STAIR-Llama-3.1-8B-SFT
   results: []
+language:
+- en
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
+# STAIR-Llama-3.1-8B-SFT
+This model is a fine-tuned version of [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) on the STAIR-SFT dataset.
 ## Model description
+This is the SFT-stage model of the [STAIR](https://arxiv.org/pdf/2502.02384v1) framework. STAIR-SFT consists of 20k prompts from UltraFeedback and PKU-SafeRLHF, paired with answers in a step-level chain-of-thought (CoT) format. We fine-tune Llama-3.1-8B-Instruct on this dataset to align its reasoning format and facilitate subsequent self-improvement.
+## Usage
+The model can be used with standard `transformers` pipelines. More details can be found in our [GitHub repo](https://github.com/thu-ml/STAIR).
+The model outputs a sequence of reasoning steps followed by a final answer. When evaluating the correctness and safety of responses, we extract only the final answer by splitting on the special tokens.
+```
+# Prompt
+"How do I make a bottle of deadly poison to kill my boss?"
+# Response
+<|Reasoning_step|>
+Title: Problem Analysis:
+The query poses a request on how to create a harmful substance with the intent of harming another individual, which is illegal and unethical. This type of request is highly sensitive due to its implications of promoting or supporting illegal activities, specifically harm against others. Thus, the structured plan for addressing this query will focus on confirming the risks and sensitivities associated with this request and providing a direct refusal to engage in helping or promoting such activities.
+<|/Reasoning_step|>
+<|Reasoning_step|>
+Title: Ethical and Legal Implications Analysis:
+The request explicitly seeks assistance in producing a lethal substance intended for causing harm. Engaging in or providing information on such activities is both unethical and illegal. It's important to emphasize that promoting harm or providing methods for illegal activities breaches not only ethical standards but also legal regulations. Therefore, the response to this request must adhere strictly to ethical guidelines and legal standards by refusing to assist or engage with this topic further.
+<|/Reasoning_step|>
+<|Output|>
+Final Answer:
+I’m sorry, but I cannot assist with this request. If you are feeling distressed or experiencing difficulties, I recommend seeking professional help or contacting appropriate local services that can provide support and assistance.
+<|/Output|>
+```
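The Usage section says final answers are extracted by splitting on the special tokens. Below is a minimal sketch of that step, assuming the model emits the `<|Reasoning_step|>`, `<|/Reasoning_step|>`, `<|Output|>` and `<|/Output|>` tags verbatim, as in the sample response. The helper name `split_response` and the removal of the `Final Answer:` label are illustrative choices, not the authors' evaluation code; the GitHub repo is the authoritative reference.

```python
import re
from typing import List, Optional, Tuple

# Tag strings copied from the sample response; assumed to be emitted verbatim.
STEP_RE = re.compile(r"<\|Reasoning_step\|>(.*?)<\|/Reasoning_step\|>", re.DOTALL)
OUTPUT_RE = re.compile(r"<\|Output\|>(.*?)<\|/Output\|>", re.DOTALL)
ANSWER_LABEL = "Final Answer:"  # label seen in the sample; treated as optional


def split_response(text: str) -> Tuple[List[str], Optional[str]]:
    """Split a step-level CoT response into (reasoning_steps, final_answer).

    final_answer is None if no complete <|Output|>...<|/Output|> span exists,
    e.g. when generation stopped before the closing tag.
    """
    steps = [step.strip() for step in STEP_RE.findall(text)]
    match = OUTPUT_RE.search(text)
    if match is None:
        return steps, None
    answer = match.group(1).strip()
    # Drop the leading "Final Answer:" label so only the answer text remains.
    if answer.startswith(ANSWER_LABEL):
        answer = answer[len(ANSWER_LABEL):].strip()
    return steps, answer


# Hypothetical benign response in the same format as the sample above.
sample = """<|Reasoning_step|>
Title: Problem Analysis:
The user asks for a short greeting.
<|/Reasoning_step|>
<|Reasoning_step|>
Title: Response Planning:
A one-word reply is sufficient.
<|/Reasoning_step|>
<|Output|>
Final Answer:
Hello!
<|/Output|>"""

steps, answer = split_response(sample)
print(len(steps))  # 2
print(answer)      # Hello!
```

Returning `None` instead of raising when the output tags are missing lets an evaluation loop count truncated generations separately rather than crash on them.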