Sotopia-RL: Reward Design for Social Intelligence
This repository hosts the Sotopia-RL model, a reward model designed for training socially intelligent agents through reinforcement learning, as presented in the paper Sotopia-RL: Reward Design for Social Intelligence.
Project Page: https://rl.sotopia.world
Code Repository: https://github.com/sotopia-lab/sotopia-rl
Introduction
Social intelligence has become a critical capability for large language models (LLMs), enabling them to engage effectively in real-world social tasks such as accommodation, persuasion, collaboration, and negotiation. Reinforcement learning (RL) is a natural fit for training socially intelligent agents because it allows models to learn sophisticated strategies directly through social interactions. However, social interactions have two key characteristics that pose barriers to RL training: (1) partial observability, where utterances have indirect and delayed effects that complicate credit assignment, and (2) multi-dimensionality, where behaviors such as rapport-building or knowledge-seeking contribute indirectly to goal achievement.
Sotopia-RL addresses these challenges by proposing a novel framework that refines coarse episode-level feedback into utterance-level, multi-dimensional rewards. This approach significantly improves credit assignment by attributing outcomes to individual utterances, while multi-dimensional rewards capture the full richness of social interactions and reduce reward hacking. Experiments in Sotopia, an open-ended social learning environment, demonstrate that Sotopia-RL achieves state-of-the-art social goal completion scores, significantly outperforming existing approaches. Ablation studies confirm the necessity of both utterance-level credit assignment and multi-dimensional reward design for RL training.
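To make this concrete, the sketch below (not taken from the paper; the dimension names and weights are hypothetical placeholders) illustrates one way utterance-level, multi-dimensional scores could be collapsed into a single scalar reward per utterance for RL training:
# Illustrative sketch only: the reward dimensions and weights below are
# hypothetical placeholders, not the ones used in Sotopia-RL.
DIMENSION_WEIGHTS = {
    "goal_progress": 0.6,   # direct contribution to the social goal
    "relationship": 0.2,    # rapport-building
    "knowledge": 0.2,       # knowledge-seeking
}

def scalarize_utterance_reward(dimension_scores):
    """Weighted sum of per-dimension scores for one utterance."""
    return sum(DIMENSION_WEIGHTS[d] * s for d, s in dimension_scores.items())

# Episode-level feedback refined into per-utterance, per-dimension scores
episode = [
    {"goal_progress": 0.1, "relationship": 0.8, "knowledge": 0.0},  # rapport-building turn
    {"goal_progress": 0.9, "relationship": 0.2, "knowledge": 0.3},  # decisive turn
]
utterance_rewards = [scalarize_utterance_reward(u) for u in episode]
print(utterance_rewards)  # one scalar reward per utterance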
Usage
You can load the Sotopia-RL model with the transformers library and peft to generate reward scores for text inputs. The model is a PEFT (LoRA) adapter built on top of Qwen/Qwen2.5-7B-Instruct and is configured for sequence classification, making it suitable for reward prediction.
First, ensure you have the necessary libraries installed:
pip install transformers peft torch
Here's an example of how to load the LoRA adapter on the base model and then use it for inference:
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import PeftModel
# Define the model IDs
base_model_id = "Qwen/Qwen2.5-7B-Instruct"
adapter_id = "ulab-ai/sotopia-rl-qwen-2.5-7B-grpo" # This repository's ID
# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
# Load the base model for sequence classification (as implied by adapter_config.json)
# Assuming it's adapted for a single score output (num_labels=1)
model = AutoModelForSequenceClassification.from_pretrained(
    base_model_id,
    num_labels=1,  # For a single reward score
    torch_dtype=torch.bfloat16,  # Adjust dtype as per your hardware/needs
    device_map="auto",
)
# Load the Sotopia-RL LoRA adapter
model = PeftModel.from_pretrained(model, adapter_id)
model.eval() # Set the model to evaluation mode
# Example inference
text_input = "Agent A: 'Hello, I'm here to help.' Agent B: 'Thank you, that's very kind!'"
# Tokenize the input
inputs = tokenizer(text_input, return_tensors="pt").to(model.device)
# Perform inference
with torch.no_grad():
    outputs = model(**inputs)
# The output logits can be interpreted as the reward score
# Assuming a single logit for the reward score
reward_score = outputs.logits.item()
print(f"Text: '{text_input}'")
print(f"Predicted Reward Score: {reward_score}")
For more detailed information on setup, data collection, and training, please refer to the official GitHub repository.
Citation
If you find this work useful, please consider citing the original paper:
@article{li2025sotopiarl,
title = {{Sotopia-RL: Reward Design for Social Intelligence}},
author = {Li, Jiaxuan and Zhao, Zidong and Zhang, Tianhao and Zhang, Min and Wang, Yu-Xiong and Yao, Zidong and Gu, Jiantao},
journal = {Hugging Face Papers},
year = {2025},
url = {https://huggingface.co/papers/2508.03905},
}