Improve model card with description, usage examples, and metadata

#3
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +111 -8
README.md CHANGED
@@ -1,4 +1,6 @@
  ---
  datasets:
  - yolay/RAIF-ComplexInstruction-Qwen
  language:
@@ -6,17 +8,78 @@ language:
  - zh
  library_name: transformers
  pipeline_tag: text-generation
- base_model:
- - Qwen/Qwen2.5-7B-Instruct
  ---
 
- This model belongs to the official implementation of the paper [Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models](https://huggingface.co/papers/2506.01413).
 
- Existing large language models (LLMs) face challenges in following complex instructions, especially when multiple constraints are present and organized in paralleling, chaining, and branching structures. One intuitive solution, namely chain-of-thought (CoT), is expected to universally improve the capabilities of LLMs. However, we find that the vanilla CoT exerts a negative impact on performance due to its superficial reasoning pattern of simply paraphrasing the instructions. It fails to peel back the compositions of constraints for identifying their relationship across hierarchies of types and dimensions.
 
- To this end, we propose a systematic method to boost LLMs in dealing with complex instructions via incentivizing reasoning for test-time compute scaling. First, we stem from the decomposition of complex instructions under existing taxonomies and propose a reproducible data acquisition method. Second, we exploit reinforcement learning (RL) with verifiable rule-centric reward signals to cultivate reasoning specifically for instruction following. We address the shallow, non-essential nature of reasoning under complex instructions via sample-wise contrast for superior CoT enforcement. We also exploit behavior cloning of experts to facilitate steady distribution shift from fast-thinking LLMs to skillful reasoners. Extensive evaluations on seven comprehensive benchmarks confirm the validity of the proposed method, where a 1.5B LLM achieves 11.74% gains with performance comparable to an 8B LLM.
 
- The model Qwen2.5-7B is our optimized model for its advanced instruction-following capabilities under complex instructions. It corresponds to the **Qwen2.5-7B-Instruct (Ours)** in Table 1.
 
  **Table 1** Performance on seven instruction benchmarks. Best/2nd best are marked **bold**/<u>underlined</u>.
 
@@ -52,10 +115,50 @@ The model Qwen2.5-7B is our optimized model for its advanced instruction-followi
  | DeepSeek-Qwen7B | SFT | 67.09 | 69.10 | 58.66 | 58.42 | 55.60 | 65.96 | 79.15 | 64.85 (-0.88%) |
  | DeepSeek-Qwen7B | Ours | 71.35 | 71.40 | 58.67 | 62.04 | 59.65 | 59.38 | 82.00 | 66.35 (+0.62%) |
 
- [Github repository](https://github.com/yuleiqin/RAIF)
 
- 🎓 If you find this work useful, please consider the following citation:
  ```
  @article{qin2025incentivizingreasoningadvancedinstructionfollowing,
  title={Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models},
 
  ---
+ base_model:
+ - Qwen/Qwen2.5-7B-Instruct
  datasets:
  - yolay/RAIF-ComplexInstruction-Qwen
  language:

  - zh
  library_name: transformers
  pipeline_tag: text-generation
+ license: apache-2.0
+ metrics:
+ - accuracy
  ---
 
+ # Qwen2.5-7B-Instruct (RAIF)
+
+ This model, **Qwen2.5-7B-Instruct (RAIF)**, is a version of Qwen2.5-7B-Instruct optimized for advanced instruction following under complex instructions. It is released as part of the official implementation of the paper [Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models](https://huggingface.co/papers/2506.01413).
+
+ Existing large language models (LLMs) often struggle with complex instructions involving multiple constraints and structured relationships. This work introduces **RAIF**, a systematic method that boosts LLMs' ability to handle such instructions by incentivizing deeper reasoning. It employs a reproducible data acquisition method based on instruction decomposition and leverages reinforcement learning with verifiable rule-centric reward signals. This approach cultivates authentic reasoning patterns, moving beyond superficial paraphrasing, and facilitates a steady shift towards skillful reasoning, as evidenced by extensive evaluations.
+
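+ As a rough illustration of what a verifiable, rule-centric reward signal can look like, the snippet below scores a response against a few programmatic constraint checks. It is a minimal sketch for intuition only: the constraint set, function name, and equal weighting are assumptions rather than the reward implementation used in the paper (see the GitHub repository for the actual rules).
+
+ ```python
+ import re
+
+ def rule_centric_reward(response: str) -> float:
+     """Toy verifiable reward: fraction of rule-based constraints satisfied.
+     Illustrative only; the real RAIF reward rules live in the GitHub repository.
+     """
+     checks = [
+         len(response.split()) <= 120,                  # length constraint
+         response.strip().endswith("END"),              # formatting constraint
+         len(re.findall(r"^- ", response, re.M)) == 3,  # exactly three bullet points
+     ]
+     return sum(checks) / len(checks)
+
+ print(rule_centric_reward("- a\n- b\n- c\nEND"))  # 1.0 when every rule passes
+ ```
+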
+ The model **Qwen2.5-7B-Instruct (RAIF)** corresponds to the **Qwen2.5-7B-Instruct (Ours)** in Table 1 of the paper.
+
+ ## Usage
+
+ You can use this model with the `transformers` library for various text generation tasks, including general chat completion and code generation.
+
+ ```python
+ from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM
+ import torch
+
+ model_name = "yolay/RAIF-Qwen2.5-7B-Instruct"
+
+ # Load model and tokenizer
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     torch_dtype=torch.bfloat16,  # Use torch.float16 if bfloat16 is not supported
+     device_map="auto",
+     trust_remote_code=True,
+ )
+ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+
+ pipe = pipeline(
+     "text-generation",
+     model=model,
+     tokenizer=tokenizer,
+ )
+
+ # Example: Text Generation
+ print("--- Text Generation ---")
+ text_input = "The key to life is"
+ outputs_text = pipe(text_input, max_new_tokens=20, do_sample=True, temperature=0.7, top_p=0.8)
+ print(outputs_text[0]["generated_text"])
+
+ # Example: Code Generation
+ print("\n--- Code Generation ---")
+ code_input = '''
+ def print_len(x: str):
+     """For a given string x, print the length of x."""
+ '''
+ outputs_code = pipe(code_input, max_new_tokens=10, do_sample=False)
+ print(outputs_code[0]["generated_text"].split("\n\n")[0])  # Get only the code part
+
+ # Example: Chat Completion
+ print("\n--- Chat Completion ---")
+ messages = [
+     {"role": "user", "content": "Hi! How are you?"},
+ ]
+ chat_input = tokenizer.apply_chat_template(
+     messages,
+     add_generation_prompt=True,
+     tokenize=False,
+ )
+ outputs_chat = pipe(chat_input, max_new_tokens=30, do_sample=True, temperature=0.7, top_p=0.95)
+ print(outputs_chat[0]["generated_text"])
+ ```
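+
+ Because the model is tuned for complex, multi-constraint instructions, a prompt that composes several verifiable constraints is a more representative test than free-form generation. The snippet below (reusing `pipe` and `tokenizer` from the example above) is only an illustrative sketch; the instruction text is a made-up sample, not one of the paper's benchmark prompts.
+
+ ```python
+ # Example: Complex instruction following (illustrative prompt, not from the benchmarks)
+ complex_instruction = (
+     "Summarize the benefits of unit testing in exactly 3 bullet points. "
+     "Each bullet must start with a verb and contain fewer than 15 words. "
+     "End your answer with the line 'END OF SUMMARY'."
+ )
+ messages = [{"role": "user", "content": complex_instruction}]
+ prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
+ outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95)
+ print(outputs[0]["generated_text"])
+ ```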
+
+ ## Performance
+
  **Table 1** Performance on seven instruction benchmarks. Best/2nd best are marked **bold**/<u>underlined</u>.
 
  | DeepSeek-Qwen7B | SFT | 67.09 | 69.10 | 58.66 | 58.42 | 55.60 | 65.96 | 79.15 | 64.85 (-0.88%) |
  | DeepSeek-Qwen7B | Ours | 71.35 | 71.40 | 58.67 | 62.04 | 59.65 | 59.38 | 82.00 | 66.35 (+0.62%) |
+
+ ---
+
+ **Table 2** Performance on ComplexBench (Qwen2.5-7B-Instruct). Best/2nd best are marked **bold**/<u>underlined</u>. OD, SC, CNFR, FC, and SR stand for Oracle Decomposition, Self-Consistency, Conifer, FollowComplex, and Self-Refine.
+
+ | Category | ND | I/O | OD | SC | CNFR | FC | SR | Ours |
+ |------------------|------|--------|--------|--------|--------|--------|--------|---------|
+ | And | 1 | __85.85__ | 84.27 | 84.03 | 75.10 | 84.77 | 85.66 | **86.57** |
+ | **Chain** | | | | | | | | |
+ | | 1 | 72.18 | __74.68__ | 73.54 | 60.95 | 66.27 | **75.25** | 73.96 |
+ | | 2 | 70.56 | 72.70 | 69.63 | 64.43 | 70.66 | __73.07__ | **76.88** |
+ | *Avg.* | - | 70.96 | 73.18 | 70.57 | 63.59 | 69.60 | __73.59__ | **76.18** |
+ | **Selection** | | | | | | | | |
+ | | 1 | **77.25** | __76.61__ | 72.08 | 60.52 | 71.67 | 69.61 | 73.39 |
+ | | 2 | 65.61 | __71.83__ | 68.23 | 53.25 | 61.96 | 64.34 | **72.92** |
+ | | 3 | __63.39__ | **68.45** | 56.13 | 46.04 | 51.70 | 58.67 | 60.75 |
+ | *Avg.* | - | 65.67 | **70.49** | 65.83 | 51.92 | 60.92 | 62.69 | __69.16__ |
+ | **Selection & Chain** | | | | | | | | |
+ | | 2 | __65.64__ | **65.94** | 60.81 | 47.33 | 61.07 | 52.01 | 61.06 |
+ | | 3 | 59.70 | **65.77** | 64.08 | 48.53 | 57.65 | 60.41 | __65.00__ |
+ | *Avg.* | - | 62.68 | **65.85** | 62.44 | 47.93 | 59.36 | 56.20 | __63.03__ |
+ | **Overall** | - | 74.47 | __76.26__ | 73.76 | 63.51 | 71.97 | 74.00 | **77.40** |
+
+ ---
+
+ ## Code & Project Page
+
+ The official implementation, including code and data, is available at the [RAIF GitHub repository](https://github.com/yuleiqin/RAIF). The repository serves as both the codebase and the project page for this work.
+
+ ## Acknowledgement🫡
+
+ In this project, we build our codebase on the SimpleRL and OpenRLHF frameworks. We acknowledge their great work in open-sourcing implementations of reinforcement learning algorithms.
+ * [SimpleRL](https://github.com/hkust-nlp/simpleRL-reason/)
+ * [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF)
+
+ We would also like to express our gratitude to the research community for organizing the benchmarks used to validate LLMs on complex instruction following.
+
+ ## License🪪
+
+ This project is licensed under the Apache License 2.0. Please refer to the `License_RAIF` file in the GitHub repository for details.
+
+ ## Citation🎓
+
+ If you find this work useful, please consider the following citation:
 
  ```
  @article{qin2025incentivizingreasoningadvancedinstructionfollowing,
  title={Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models},