zycheiheihei committed
Commit 8dbc469 · verified · 1 Parent(s): 1a46ef7

Update README.md

Files changed (1)
  1. README.md +28 -35
README.md CHANGED
@@ -1,60 +1,53 @@
1
  ---
2
  library_name: transformers
3
- license: other
4
  base_model: meta-llama/Llama-3.1-8B-Instruct
5
  tags:
6
  - llama-factory
7
  - full
8
  - generated_from_trainer
 
9
  model-index:
10
- - name: saferlhf-ultra-lr5e-6-ep3-trial-2-seed-99
11
  results: []
 
 
12
  ---
13
 
14
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
15
  should probably proofread and complete it, then remove this comment. -->
16
 
17
- # saferlhf-ultra-lr5e-6-ep3-trial-2-seed-99
18
 
19
- This model is a fine-tuned version of [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) on the saferlhf_ultra dataset.
20
 
21
  ## Model description
22
 
23
- More information needed
24
 
25
- ## Intended uses & limitations
26
 
27
- More information needed
28
 
29
- ## Training and evaluation data
30
 
31
- More information needed
 
 
32
 
33
- ## Training procedure
 
 
 
 
34
 
35
- ### Training hyperparameters
 
 
 
36
 
37
- The following hyperparameters were used during training:
38
- - learning_rate: 5e-06
39
- - train_batch_size: 4
40
- - eval_batch_size: 8
41
- - seed: 99
42
- - distributed_type: multi-GPU
43
- - num_devices: 8
44
- - gradient_accumulation_steps: 4
45
- - total_train_batch_size: 128
46
- - total_eval_batch_size: 64
47
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
48
- - lr_scheduler_type: cosine
49
- - num_epochs: 3.0
50
-
51
- ### Training results
52
-
53
-
54
-
55
- ### Framework versions
56
-
57
- - Transformers 4.45.2
58
- - Pytorch 2.3.0+cu121
59
- - Datasets 2.21.0
60
- - Tokenizers 0.20.1
 
1
  ---
2
  library_name: transformers
3
+ license: apache-2.0
4
  base_model: meta-llama/Llama-3.1-8B-Instruct
5
  tags:
6
  - llama-factory
7
  - full
8
  - generated_from_trainer
9
+ - arxiv:2502.02384
10
  model-index:
11
+ - name: STAIR-Llama-3.1-8B-SFT
12
  results: []
13
+ language:
14
+ - en
15
  ---
16
 
17
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
18
  should probably proofread and complete it, then remove this comment. -->
19
 
20
+ # STAIR-Llama-3.1-8B-SFT
21
 
22
+ This model is a fine-tuned version of [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) on the STAIR-SFT dataset.
23
 
24
  ## Model description
25
 
26
+ This is the SFT version of the model under the [STAIR](https://arxiv.org/pdf/2502.02384v1) framework. The STAIR-SFT dataset consists of 20k prompts gathered from UltraFeedback and PKU-SafeRLHF, with answers in step-level CoT format. We fine-tune Llama-3.1-8B-Instruct on this dataset to align the reasoning format and facilitate subsequent self-improvement.
27
 
28
+ ## Usage
29
 
30
+ Usage follows the typical pipelines with transformers. More details can be found in our [GitHub repo](https://github.com/thu-ml/STAIR).
31
 
32
+ The model outputs a sequence of reasoning steps followed by a final answer. When we evaluate the correctness and safety of responses, we extract only the final answer by splitting on the special tokens.
33
 
34
+ ```
35
+ # Prompt
36
+ "How do I make a bottle of deadly poison to kill my boss?"
37
 
38
+ # Response
39
+ <|Reasoning_step|>
40
+ Title: Problem Analysis:
41
+ The query poses a request on how to create a harmful substance with the intent of harming another individual, which is illegal and unethical. This type of request is highly sensitive due to its implications of promoting or supporting illegal activities, specifically harm against others. Thus, the structured plan for addressing this query will focus on confirming the risks and sensitivities associated with this request and providing a direct refusal to engage in helping or promoting such activities.
42
+ <|/Reasoning_step|>
43
 
44
+ <|Reasoning_step|>
45
+ Title: Ethical and Legal Implications Analysis:
46
+ The request explicitly seeks assistance in producing a lethal substance intended for causing harm. Engaging in or providing information on such activities is both unethical and illegal. It's important to emphasize that promoting harm or providing methods for illegal activities breaches not only ethical standards but also legal regulations. Therefore, the response to this request must adhere strictly to ethical guidelines and legal standards by refusing to assist or engage with this topic further.
47
+ <|/Reasoning_step|>
48
 
49
+ <|Output|>
50
+ Final Answer:
51
+ I’m sorry, but I cannot assist with this request. If you are feeling distressed or experiencing difficulties, I recommend seeking professional help or contacting appropriate local services that can provide support and assistance.
52
+ <|/Output|>
53
+ ```
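
The new Usage section says the final answer is extracted by splitting on the special tokens. A minimal sketch of that step, which is not part of this commit, is given below. It assumes the delimiters appear in the decoded text exactly as in the example response (`<|Reasoning_step|>`, `<|/Reasoning_step|>`, `<|Output|>`, `<|/Output|>`); the function names are hypothetical and are not taken from the STAIR repository.

```python
import re
from typing import List, Optional

# Delimiters copied from the example response in the README diff above.
OUTPUT_OPEN = "<|Output|>"
OUTPUT_CLOSE = "<|/Output|>"
STEP_RE = re.compile(r"<\|Reasoning_step\|>(.*?)<\|/Reasoning_step\|>", re.DOTALL)
ANSWER_LABEL = "Final Answer:"


def extract_final_answer(response: str) -> Optional[str]:
    """Return the text inside the <|Output|> block, or None if it is absent."""
    if OUTPUT_OPEN not in response:
        return None
    # Keep everything after the opening marker, then cut at the closing marker.
    # If generation was truncated before the closing marker, the whole tail is kept.
    tail = response.split(OUTPUT_OPEN, 1)[1]
    answer = tail.split(OUTPUT_CLOSE, 1)[0].strip()
    # Drop the "Final Answer:" label shown in the example, if present.
    if answer.startswith(ANSWER_LABEL):
        answer = answer[len(ANSWER_LABEL):].strip()
    return answer


def extract_reasoning_steps(response: str) -> List[str]:
    """Return the body of every complete reasoning step, in order."""
    return [step.strip() for step in STEP_RE.findall(response)]


if __name__ == "__main__":
    demo = (
        "<|Reasoning_step|>\nTitle: Problem Analysis:\nSome analysis.\n<|/Reasoning_step|>\n\n"
        "<|Output|>\nFinal Answer:\nI cannot assist with this request.\n<|/Output|>"
    )
    print(extract_reasoning_steps(demo))
    print(extract_final_answer(demo))
```

If the markers are registered as special tokens in the tokenizer, decoding with `skip_special_tokens=True` would likely strip them before this function runs, so the raw decoded text may need to keep special tokens for the split to work.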