Improve model card with description, usage examples, and metadata
#3 by nielsr (HF Staff) - opened

README.md CHANGED
---
base_model:
- Qwen/Qwen2.5-7B-Instruct
datasets:
- yolay/RAIF-ComplexInstruction-Qwen
language:
- en
- zh
library_name: transformers
pipeline_tag: text-generation
license: apache-2.0
metrics:
- accuracy
---

# Qwen2.5-7B-Instruct (RAIF)

This model, **Qwen2.5-7B-Instruct (RAIF)**, is an optimized version of Qwen2.5-7B-Instruct, specifically enhanced for advanced instruction following under complex instructions. It is the official implementation of the paper [Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models](https://huggingface.co/papers/2506.01413).

Existing large language models (LLMs) often struggle with complex instructions involving multiple constraints and structured relationships. This work introduces **RAIF**, a systematic method that boosts LLMs' ability to handle such instructions by incentivizing deeper reasoning. It employs a novel data acquisition method based on instruction decomposition and leverages reinforcement learning with verifiable, rule-centric reward signals. This approach cultivates authentic reasoning patterns rather than superficial paraphrasing, and extensive evaluations show a steady shift towards skillful reasoning.

**Qwen2.5-7B-Instruct (RAIF)** corresponds to **Qwen2.5-7B-Instruct (Ours)** in Table 1 of the paper.

## Usage

You can use this model with the `transformers` library for various text generation tasks, including general chat completion and code generation.

```python
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "yolay/RAIF-Qwen2.5-7B-Instruct"

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # Use torch.float16 if bfloat16 is not supported
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

# Example: Text Generation
print("--- Text Generation ---")
text_input = "The key to life is"
outputs_text = pipe(text_input, max_new_tokens=20, do_sample=True, temperature=0.7, top_p=0.8)
print(outputs_text[0]["generated_text"])

print("\n--- Code Generation ---")
# Example: Code Generation
code_input = '''
def print_len(x: str):
    """For a given string x, print the length of x."""
'''
outputs_code = pipe(code_input, max_new_tokens=10, do_sample=False)
print(outputs_code[0]["generated_text"].split("\n\n")[0])  # Get only the code part

print("\n--- Chat Completion ---")
# Example: Chat Completion
messages = [
    {"role": "user", "content": "Hi! How are you?"},
]
chat_input = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=False,
)
outputs_chat = pipe(chat_input, max_new_tokens=30, do_sample=True, temperature=0.7, top_p=0.95)
print(outputs_chat[0]["generated_text"])
```
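
Since RAIF specifically targets complex instructions with multiple constraints, it can be worth probing the model with a prompt that stacks several checkable requirements. The snippet below reuses `pipe` and `tokenizer` from the example above; the prompt itself is our own illustration, not one drawn from the paper or its benchmarks.

```python
# Illustrative multi-constraint prompt; reuses `pipe` and `tokenizer` from above.
messages = [
    {"role": "user", "content": (
        "List three benefits of unit testing as bullet points, "
        "keep each bullet under 15 words, and end with a one-sentence recommendation."
    )},
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95)
print(outputs[0]["generated_text"])
```
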
## Performance

**Table 1** Performance on seven instruction benchmarks. Best/2nd best are marked **bold**/<u>underlined</u>.

| DeepSeek-Qwen7B | SFT | 67.09 | 69.10 | 58.66 | 58.42 | 55.60 | 65.96 | 79.15 | 64.85 (-0.88%) |
| DeepSeek-Qwen7B | Ours | 71.35 | 71.40 | 58.67 | 62.04 | 59.65 | 59.38 | 82.00 | 66.35 (+0.62%) |

---

**Table 2** Performance on ComplexBench (Qwen2.5-7B-Instruct). Best/2nd best are marked **bold**/<u>underlined</u>. OD, SC, CNFR, FC, and SR stand for Oracle Decomposition, Self-Consistency, Conifer, FollowComplex, and Self-Refine.

| Category | ND | I/O | OD | SC | CNFR | FC | SR | Ours |
|-----------------------|----|-----------|-----------|-------|-------|-------|-----------|-----------|
| And | 1 | __85.85__ | 84.27 | 84.03 | 75.10 | 84.77 | 85.66 | **86.57** |
| **Chain** | | | | | | | | |
| | 1 | 72.18 | __74.68__ | 73.54 | 60.95 | 66.27 | **75.25** | 73.96 |
| | 2 | 70.56 | 72.70 | 69.63 | 64.43 | 70.66 | __73.07__ | **76.88** |
| *Avg.* | - | 70.96 | 73.18 | 70.57 | 63.59 | 69.60 | __73.59__ | **76.18** |
| **Selection** | | | | | | | | |
| | 1 | **77.25** | __76.61__ | 72.08 | 60.52 | 71.67 | 69.61 | 73.39 |
| | 2 | 65.61 | __71.83__ | 68.23 | 53.25 | 61.96 | 64.34 | **72.92** |
| | 3 | __63.39__ | **68.45** | 56.13 | 46.04 | 51.70 | 58.67 | 60.75 |
| *Avg.* | - | 65.67 | **70.49** | 65.83 | 51.92 | 60.92 | 62.69 | __69.16__ |
| **Selection & Chain** | | | | | | | | |
| | 2 | __65.64__ | **65.94** | 60.81 | 47.33 | 61.07 | 52.01 | 61.06 |
| | 3 | 59.70 | **65.77** | 64.08 | 48.53 | 57.65 | 60.41 | __65.00__ |
| *Avg.* | - | 62.68 | **65.85** | 62.44 | 47.93 | 59.36 | 56.20 | __63.03__ |
| **Overall** | - | 74.47 | __76.26__ | 73.76 | 63.51 | 71.97 | 74.00 | **77.40** |

---

## Code & Project Page

The official implementation, including code and data, is available at the [RAIF GitHub repository](https://github.com/yuleiqin/RAIF), which serves as both the code and project page for this work.

## Acknowledgement🫡

This project builds its codebase on the SimpleRL and OpenRLHF frameworks. We acknowledge their great work in open-sourcing implementations of reinforcement learning algorithms.

* [SimpleRL](https://github.com/hkust-nlp/simpleRL-reason/)
* [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF)

We would also like to thank the research community for organizing the benchmarks used to validate LLMs on complex instructions.

## License🪪

This project is licensed under the Apache License 2.0. Please refer to the `License_RAIF` file in the GitHub repository for detailed information.

## Citation🎓

If you find this work useful, please consider the following citation:

```
@article{qin2025incentivizingreasoningadvancedinstructionfollowing,
  title={Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models},
|