Sotopia-RL: Reward Design for Social Intelligence
This repository hosts the Sotopia-RL model, a reward model designed for training socially intelligent agents through reinforcement learning, as presented in the paper Sotopia-RL: Reward Design for Social Intelligence.
Project Page: https://rl.sotopia.world
Code Repository: https://github.com/sotopia-lab/sotopia-rl
Introduction
Social intelligence has become a critical capability for large language models (LLMs), enabling them to engage effectively in real-world social tasks such as accommodation, persuasion, collaboration, and negotiation. Reinforcement learning (RL) is a natural fit for training socially intelligent agents because it allows models to learn sophisticated strategies directly through social interactions. However, social interactions have two key characteristics that pose barriers to RL training: (1) partial observability, where utterances have indirect and delayed effects that complicate credit assignment, and (2) multi-dimensionality, where behaviors such as rapport-building or knowledge-seeking contribute indirectly to goal achievement.
Sotopia-RL addresses these challenges by proposing a novel framework that refines coarse episode-level feedback into utterance-level, multi-dimensional rewards. This approach significantly improves credit assignment by attributing outcomes to individual utterances, while multi-dimensional rewards capture the full richness of social interactions and reduce reward hacking. Experiments in Sotopia, an open-ended social learning environment, demonstrate that Sotopia-RL achieves state-of-the-art social goal completion scores, significantly outperforming existing approaches. Ablation studies confirm the necessity of both utterance-level credit assignment and multi-dimensional reward design for RL training.
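To make this concrete, the sketch below (not taken from the paper; the dimension names and weights are hypothetical placeholders) illustrates one way utterance-level, multi-dimensional scores could be collapsed into a single scalar reward per utterance for RL training:
# Illustrative sketch only: the reward dimensions and weights below are
# hypothetical placeholders, not the ones used in Sotopia-RL.
DIMENSION_WEIGHTS = {
    "goal_progress": 0.6,   # direct contribution to the social goal
    "relationship": 0.2,    # rapport-building
    "knowledge": 0.2,       # knowledge-seeking
}

def scalarize_utterance_reward(dimension_scores):
    """Weighted sum of per-dimension scores for one utterance."""
    return sum(DIMENSION_WEIGHTS[d] * s for d, s in dimension_scores.items())

# Episode-level feedback refined into per-utterance, per-dimension scores
episode = [
    {"goal_progress": 0.1, "relationship": 0.8, "knowledge": 0.0},  # rapport-building turn
    {"goal_progress": 0.9, "relationship": 0.2, "knowledge": 0.3},  # decisive turn
]
utterance_rewards = [scalarize_utterance_reward(u) for u in episode]
print(utterance_rewards)  # one scalar reward per utterance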
Usage
You can load the Sotopia-RL model with the transformers library and peft to generate reward scores for text inputs. The model is a PEFT (LoRA) adapter built on top of Qwen/Qwen2.5-7B-Instruct and is configured for sequence classification, making it suitable for reward prediction.
First, ensure you have the necessary libraries installed:
pip install transformers peft torch
Here's an example of how to load the LoRA adapter on the base model and then use it for inference:
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import PeftModel
# Define the model IDs
base_model_id = "Qwen/Qwen2.5-7B-Instruct"
adapter_id = "ulab-ai/sotopia-rl-qwen-2.5-7B-grpo" # This repository's ID
# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
# Load the base model for sequence classification (as implied by adapter_config.json)
# Assuming it's adapted for a single score output (num_labels=1)
model = AutoModelForSequenceClassification.from_pretrained(
    base_model_id,
    num_labels=1,  # For a single reward score
    torch_dtype=torch.bfloat16,  # Adjust dtype as per your hardware/needs
    device_map="auto",
)
# Load the Sotopia-RL LoRA adapter
model = PeftModel.from_pretrained(model, adapter_id)
model.eval() # Set the model to evaluation mode
# Example inference
text_input = "Agent A: 'Hello, I'm here to help.' Agent B: 'Thank you, that's very kind!'"
# Tokenize the input
inputs = tokenizer(text_input, return_tensors="pt").to(model.device)
# Perform inference
with torch.no_grad():
    outputs = model(**inputs)
# The output logits can be interpreted as the reward score
# Assuming a single logit for the reward score
reward_score = outputs.logits.item()
print(f"Text: '{text_input}'")
print(f"Predicted Reward Score: {reward_score}")
For more detailed information on setup, data collection, and training, please refer to the official GitHub repository.
Citation
If you find this work useful, please consider citing the original paper:
@article{li2025sotopiarl,
title = {{Sotopia-RL: Reward Design for Social Intelligence}},
author = {Li, Jiaxuan and Zhao, Zidong and Zhang, Tianhao and Zhang, Min and Wang, Yu-Xiong and Yao, Zidong and Gu, Jiantao},
journal = {Hugging Face Papers},
year = {2025},
url = {https://huggingface.co/papers/2508.03905},
}