---
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
---

# Rubicon

<p align="center">
  <a href="https://arxiv.org/abs/2508.12790"><b>📄 Paper</b></a> •
  <a href="https://huggingface.co/inclusionAI/Rubicon-Preview"><b>🤗 Model</b></a>
</p>

This is the model card for **Rubicon-preview**, a 30B-A3B parameter model trained with a novel reinforcement learning framework that uses "rubric anchors" to excel at open-ended, creative, and humanities-centric tasks.

---

## Highlights

We introduce **Rubicon**, a framework that uses rubric anchors for reinforcement learning. Our model, **Rubicon-preview**, demonstrates the following key highlights:

- **Token-Efficient Performance**: Achieves a **+5.2%** absolute improvement on subjective, humanities-centric tasks with only **5K** training samples, outperforming the 671B-parameter DeepSeek-V3 model.
- **Stylistic Controllability**: Leverages rubric anchors to precisely guide output style, producing responses that are more human-like, emotionally expressive, and less formulaic.
- **Preservation of General Abilities**: Avoids the degradation of general abilities that is a common side effect of specialized RL, while delivering additional gains on reasoning benchmarks such as AIME 2024 (+4.1%).

---

## Performance

Our rubric-based RL approach yields significant gains on open-ended, humanities-centric benchmarks while preserving, and even enhancing, performance on general and reasoning tasks.

### Humanities & Open-Ended Evaluation

Rubicon-preview achieves a **+5.21%** average absolute improvement over its base model, Qwen3-30B-A3B, on a diverse set of subjective benchmarks. Notably, it surpasses the much larger DeepSeek-V3-671B by **+2.42%** on average.

| **Model** | **C.W** | **Writing** | **Judge** | **EQ** | **IFE** | **Collie** | **IFS** | **Avg** |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|
| Qwen3-30B-A3B | 77.82 | 75.65 | 56.20 | 73.35 | **83.55** | 35.77 | 54.68 | 65.29 |
| **Rubicon-preview** | **81.89** | **80.11** | **69.20** | **79.55** | 81.70 | 40.27 | 60.79 | **70.50** |
| *Δ Improvement* | <span style="color:green">↑4.07</span> | <span style="color:green">↑4.46</span> | <span style="color:green">↑13.00</span> | <span style="color:green">↑6.20</span> | <span style="color:red">↓1.85</span> | <span style="color:green">↑4.50</span> | <span style="color:green">↑6.11</span> | **<span style="color:green">↑5.21</span>** |
| DeepSeek-V3-671B | 80.10 | 74.08 | 61.30 | 75.60 | 81.89 | **42.69** | **60.92** | 68.08 |

### General & Reasoning Abilities

The model maintains its core capabilities without degradation. It shows notable improvements on math reasoning benchmarks such as AIME, along with gains on several general benchmarks.

**Reasoning**

| **Model** | **AIME24** | **AIME25** | **Math500** | **GPQA-D** | **LCBv5** | **Avg** |
|:---|---:|---:|---:|---:|---:|---:|
| Qwen3-30B-A3B | 77.50 | 70.00 | **94.75** | **63.00** | **63.77** | **73.80** |
| **Rubicon-preview** | **81.67** | **70.83** | 94.55 | 60.35 | 59.43 | 73.37 |

**General**

| **Model** | **MMLU** | **IQ-EQ** | **HS** | **SC** | **CQ** | **SIQA** | **Avg** |
|:---|---:|---:|---:|---:|---:|---:|---:|
| Qwen3-30B-A3B | 79.53 | 68.75 | 77.55 | 77.72 | 79.52 | 73.64 | 78.16 |
| **Rubicon-preview** | **79.83** | **75.00** | **77.75** | **78.17** | **80.70** | **75.79** | **78.85** |

---

## Usage

Below are code snippets to help you get started with running the model.

### Installation

First, install the necessary libraries. We recommend a recent version of Transformers.

```sh
pip install transformers torch
```
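
The quick-start snippet below loads the model with `device_map="auto"`, which relies on the `accelerate` package; if it is not already in your environment, install it as well:

```sh
pip install accelerate
```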

### Quick Start with Python

You can use the model for text generation with just a few lines of code.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "inclusionAI/Rubicon-Preview"

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # or "auto"
    device_map="auto"
)

# Prepare the model input using the chat template
prompt = "Is there true love in this world?"
messages = [
    {"role": "user", "content": prompt}
]

# Apply the chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate a completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)

# Keep only the newly generated tokens and decode them
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):]
content = tokenizer.decode(output_ids, skip_special_tokens=True)

print("Generated Response:\n", content)
```
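
If you prefer the high-level `pipeline` API, the same generation can be written more compactly. This is a minimal sketch, assuming a recent Transformers release (chat-style message inputs to text-generation pipelines require a fairly new version); the sampling settings mirror the snippet above:

```python
from transformers import pipeline
import torch

# Build a text-generation pipeline; the tokenizer and chat template
# are loaded from the model repository automatically.
generator = pipeline(
    "text-generation",
    model="inclusionAI/Rubicon-Preview",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Is there true love in this world?"}]

# The pipeline applies the chat template before generating.
result = generator(
    messages,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)

# The returned conversation ends with the model's reply.
print(result[0]["generated_text"][-1]["content"])
```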

---

## Citation

If you use Rubicon in your research, please cite our paper:

```bibtex
@article{Rubicon,
  title   = {Reinforcement Learning with Rubric Anchors},
  author  = {Huang, Zenan and Zhuang, Yihong and Lu, Guoshan and Qin, Zeyu and Xu, Haokai and Zhao, Tianyu and Peng, Ru and Hu, Jiaqi and Shen, Zhanming and Hu, Xiaomeng and Gu, Xijun and Tu, Peiyi and Liu, Jiaxin and Chen, Wenyu and Fu, Yuzhuo and Fan, Zhiting and Gu, Yanmei and Wang, Yuanyuan and Yang, Zhengkai and Li, Jianguo and Zhao, Junbo},
  journal = {arXiv preprint arXiv:2508.12790},
  year    = {2025}
}
```