Improve model card with description, usage examples, and metadata

#3
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +111 -8
README.md CHANGED
@@ -1,4 +1,6 @@
  ---
  datasets:
  - yolay/RAIF-ComplexInstruction-Qwen
  language:
@@ -6,17 +8,78 @@ language:
  - zh
  library_name: transformers
  pipeline_tag: text-generation
- base_model:
- - Qwen/Qwen2.5-7B-Instruct
  ---
 
- This model belongs to the official implementation of the paper [Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models](https://huggingface.co/papers/2506.01413).
 
- Existing large language models (LLMs) face challenges in following complex instructions, especially when multiple constraints are present and organized in paralleling, chaining, and branching structures. One intuitive solution, namely chain-of-thought (CoT), is expected to universally improve the capabilities of LLMs. However, we find that the vanilla CoT exerts a negative impact on performance due to its superficial reasoning pattern of simply paraphrasing the instructions. It fails to peel back the compositions of constraints for identifying their relationship across hierarchies of types and dimensions.
 
- To this end, we propose a systematic method to boost LLMs in dealing with complex instructions via incentivizing reasoning for test-time compute scaling. First, we stem from the decomposition of complex instructions under existing taxonomies and propose a reproducible data acquisition method. Second, we exploit reinforcement learning (RL) with verifiable rule-centric reward signals to cultivate reasoning specifically for instruction following. We address the shallow, non-essential nature of reasoning under complex instructions via sample-wise contrast for superior CoT enforcement. We also exploit behavior cloning of experts to facilitate steady distribution shift from fast-thinking LLMs to skillful reasoners. Extensive evaluations on seven comprehensive benchmarks confirm the validity of the proposed method, where a 1.5B LLM achieves 11.74% gains with performance comparable to an 8B LLM.
 
- The model Qwen2.5-7B is our optimized model for its advanced instruction-following capabilities under complex instructions. It corresponds to the **Qwen2.5-7B-Instruct (Ours)** in Table 1.
 
  **Table 1** Performance on seven instruction benchmarks. Best/2nd best are marked **bold**/<u>underlined</u>.
 
@@ -52,10 +115,50 @@ The model Qwen2.5-7B is our optimized model for its advanced instruction-followi
  | DeepSeek-Qwen7B | SFT | 67.09 | 69.10 | 58.66 | 58.42 | 55.60 | 65.96 | 79.15 | 64.85 (-0.88%) |
  | DeepSeek-Qwen7B | Ours | 71.35 | 71.40 | 58.67 | 62.04 | 59.65 | 59.38 | 82.00 | 66.35 (+0.62%) |
 
- [Github repository](https://github.com/yuleiqin/RAIF)
 
- 🎓 If you find this work useful, please consider the following citation:
  ```
  @article{qin2025incentivizingreasoningadvancedinstructionfollowing,
  title={Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models},
 
  ---
+ base_model:
+ - Qwen/Qwen2.5-7B-Instruct
  datasets:
  - yolay/RAIF-ComplexInstruction-Qwen
  language:

  - zh
  library_name: transformers
  pipeline_tag: text-generation
+ license: apache-2.0
+ metrics:
+ - accuracy
  ---
 
+ # Qwen2.5-7B-Instruct (RAIF)
+
+ This model, **Qwen2.5-7B-Instruct (RAIF)**, is a version of Qwen2.5-7B-Instruct optimized for advanced instruction following under complex instructions. It is released as part of the official implementation of the paper [Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models](https://huggingface.co/papers/2506.01413).
+
+ Existing large language models (LLMs) often struggle with complex instructions involving multiple constraints and structured relationships. This work introduces **RAIF**, a systematic method that boosts LLMs' ability to handle such instructions by incentivizing deeper reasoning. It employs a reproducible data acquisition method based on instruction decomposition and leverages reinforcement learning with verifiable rule-centric reward signals. This approach cultivates authentic reasoning patterns, moving beyond superficial paraphrasing, and facilitates a steady shift towards skillful reasoning, as evidenced by extensive evaluations.
+
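+ As a rough illustration of what a verifiable, rule-centric reward signal can look like, the snippet below scores a response against a few programmatic constraint checks. It is a minimal sketch for intuition only: the constraint set, function name, and equal weighting are assumptions rather than the reward implementation used in the paper (see the GitHub repository for the actual rules).
+
+ ```python
+ import re
+
+ def rule_centric_reward(response: str) -> float:
+     """Toy verifiable reward: fraction of rule-based constraints satisfied.
+     Illustrative only; the real RAIF reward rules live in the GitHub repository.
+     """
+     checks = [
+         len(response.split()) <= 120,                  # length constraint
+         response.strip().endswith("END"),              # formatting constraint
+         len(re.findall(r"^- ", response, re.M)) == 3,  # exactly three bullet points
+     ]
+     return sum(checks) / len(checks)
+
+ print(rule_centric_reward("- a\n- b\n- c\nEND"))  # 1.0 when every rule passes
+ ```
+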
+ The model **Qwen2.5-7B-Instruct (RAIF)** corresponds to the **Qwen2.5-7B-Instruct (Ours)** in Table 1 of the paper.
+
+ ## Usage
+
+ You can use this model with the `transformers` library for various text generation tasks, including general chat completion and code generation.
+
+ ```python
+ from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM
+ import torch
+
+ model_name = "yolay/RAIF-Qwen2.5-7B-Instruct"
+
+ # Load model and tokenizer
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     torch_dtype=torch.bfloat16,  # Use torch.float16 if bfloat16 is not supported
+     device_map="auto",
+     trust_remote_code=True,
+ )
+ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+
+ pipe = pipeline(
+     "text-generation",
+     model=model,
+     tokenizer=tokenizer,
+ )
+
+ # Example: Text Generation
+ print("--- Text Generation ---")
+ text_input = "The key to life is"
+ outputs_text = pipe(text_input, max_new_tokens=20, do_sample=True, temperature=0.7, top_p=0.8)
+ print(outputs_text[0]["generated_text"])
+
+ # Example: Code Generation
+ print("\n--- Code Generation ---")
+ code_input = '''
+ def print_len(x: str):
+     """For a given string x, print the length of x."""
+ '''
+ outputs_code = pipe(code_input, max_new_tokens=10, do_sample=False)
+ print(outputs_code[0]["generated_text"].split("\n\n")[0])  # Get only the code part
+
+ # Example: Chat Completion
+ print("\n--- Chat Completion ---")
+ messages = [
+     {"role": "user", "content": "Hi! How are you?"},
+ ]
+ chat_input = tokenizer.apply_chat_template(
+     messages,
+     add_generation_prompt=True,
+     tokenize=False,
+ )
+ outputs_chat = pipe(chat_input, max_new_tokens=30, do_sample=True, temperature=0.7, top_p=0.95)
+ print(outputs_chat[0]["generated_text"])
+ ```
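+
+ Because the model is tuned for complex, multi-constraint instructions, a prompt that composes several verifiable constraints is a more representative test than free-form generation. The snippet below (reusing `pipe` and `tokenizer` from the example above) is only an illustrative sketch; the instruction text is a made-up sample, not one of the paper's benchmark prompts.
+
+ ```python
+ # Example: Complex instruction following (illustrative prompt, not from the benchmarks)
+ complex_instruction = (
+     "Summarize the benefits of unit testing in exactly 3 bullet points. "
+     "Each bullet must start with a verb and contain fewer than 15 words. "
+     "End your answer with the line 'END OF SUMMARY'."
+ )
+ messages = [{"role": "user", "content": complex_instruction}]
+ prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
+ outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95)
+ print(outputs[0]["generated_text"])
+ ```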
+
+ ## Performance
+
  **Table 1** Performance on seven instruction benchmarks. Best/2nd best are marked **bold**/<u>underlined</u>.
 
  | DeepSeek-Qwen7B | SFT | 67.09 | 69.10 | 58.66 | 58.42 | 55.60 | 65.96 | 79.15 | 64.85 (-0.88%) |
  | DeepSeek-Qwen7B | Ours | 71.35 | 71.40 | 58.67 | 62.04 | 59.65 | 59.38 | 82.00 | 66.35 (+0.62%) |
+
+ ---
+
+ **Table 2** Performance on ComplexBench (Qwen2.5-7B-Instruct). Best/2nd best are marked **bold**/<u>underlined</u>. OD, SC, CNFR, FC, and SR stand for Oracle Decomposition, Self-Consistency, Conifer, FollowComplex, and Self-Refine.
+
+ | Category | ND | I/O | OD | SC | CNFR | FC | SR | Ours |
+ |------------------|------|--------|--------|--------|--------|--------|--------|---------|
+ | And | 1 | __85.85__ | 84.27 | 84.03 | 75.10 | 84.77 | 85.66 | **86.57** |
+ | **Chain** | | | | | | | | |
+ | | 1 | 72.18 | __74.68__ | 73.54 | 60.95 | 66.27 | **75.25** | 73.96 |
+ | | 2 | 70.56 | 72.70 | 69.63 | 64.43 | 70.66 | __73.07__ | **76.88** |
+ | *Avg.* | - | 70.96 | 73.18 | 70.57 | 63.59 | 69.60 | __73.59__ | **76.18** |
+ | **Selection** | | | | | | | | |
+ | | 1 | **77.25** | __76.61__ | 72.08 | 60.52 | 71.67 | 69.61 | 73.39 |
+ | | 2 | 65.61 | __71.83__ | 68.23 | 53.25 | 61.96 | 64.34 | **72.92** |
+ | | 3 | __63.39__ | **68.45** | 56.13 | 46.04 | 51.70 | 58.67 | 60.75 |
+ | *Avg.* | - | 65.67 | **70.49** | 65.83 | 51.92 | 60.92 | 62.69 | __69.16__ |
+ | **Selection & Chain** | | | | | | | | |
+ | | 2 | __65.64__ | **65.94** | 60.81 | 47.33 | 61.07 | 52.01 | 61.06 |
+ | | 3 | 59.70 | **65.77** | 64.08 | 48.53 | 57.65 | 60.41 | __65.00__ |
+ | *Avg.* | - | 62.68 | **65.85** | 62.44 | 47.93 | 59.36 | 56.20 | __63.03__ |
+ | **Overall** | - | 74.47 | __76.26__ | 73.76 | 63.51 | 71.97 | 74.00 | **77.40** |
+
+ ---
+
+ ## Code & Project Page
+
+ The official implementation, including code and data, is available at the [RAIF GitHub repository](https://github.com/yuleiqin/RAIF). The repository serves as both the codebase and the project page for this work.
+
+ ## Acknowledgement🫡
+
+ In this project, we build our codebase on the SimpleRL and OpenRLHF frameworks. We acknowledge their great work in open-sourcing implementations of reinforcement learning algorithms.
+ * [SimpleRL](https://github.com/hkust-nlp/simpleRL-reason/)
+ * [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF)
+
+ We would also like to express our gratitude to the research community for organizing the benchmarks used to validate LLMs on complex instruction following.
+
+ ## License🪪
+
+ This project is licensed under the Apache License 2.0. Please refer to the `License_RAIF` file in the GitHub repository for details.
+
+ ## Citation🎓
+
+ If you find this work useful, please consider the following citation:
 
  ```
  @article{qin2025incentivizingreasoningadvancedinstructionfollowing,
  title={Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models},