KnowRL
Exploring Knowledgeable Reinforcement Learning for Factuality
Model Description
KnowRL-DeepSeek-R1-Distill-Qwen-7B is a slow-thinking language model that results from applying our KnowRL framework to the base model DeepSeek-R1-Distill-Qwen-7B.
The KnowRL (Knowledgeable Reinforcement Learning) framework is designed to mitigate hallucinations in Large Language Models (LLMs) by integrating external knowledge directly into the reinforcement learning process. During training, a knowledge-grounded reward explicitly encourages factual accuracy in the model's reasoning, helping it learn the boundaries of its own knowledge.
As a result, this model shows a significant reduction in hallucinations on factuality benchmarks while preserving, and in some cases enhancing, the strong reasoning capabilities inherited from its base model.
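As a rough illustration of the idea (not the exact reward used in the paper), the training signal can be thought of as a weighted combination of an answer-correctness term and a knowledge-grounded factuality term. The function name and weights below are hypothetical:

def knowrl_style_reward(answer, reference, support_facts, w_correct=0.5, w_factual=0.5):
    # Hypothetical sketch: combine answer correctness with how well the
    # answer's claims are supported by retrieved knowledge snippets.
    correctness = 1.0 if answer.strip().lower() == reference.strip().lower() else 0.0
    claims = [c.strip() for c in answer.split(".") if c.strip()]
    supported = sum(any(c.lower() in fact.lower() for fact in support_facts) for c in claims)
    factuality = supported / max(len(claims), 1)
    return w_correct * correctness + w_factual * factuality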
How to Use
Using the transformers Library
You can use this model with the transformers library for text generation. For best results, follow the model's prompt format, which uses <think> and <answer> tags.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Set the device
device = "cuda" if torch.cuda.is_available() else "cpu"
# Load the model and tokenizer
model_name = "zjunlp/KnowRL-DeepSeek-R1-Distill-Qwen-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16).to(device)
# Define the prompt using the model's template
prompt = "What is the main function of the mitochondria?"
messages = [
{"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Generate a response
inputs = tokenizer(text, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens (exclude the prompt) and print the output
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
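The decoded response typically contains the model's reasoning followed by its final answer in the tag format mentioned above. If you only need the final answer, a simple extraction (assuming the response wraps it in <answer> tags) could look like this:

import re

# Assumption: the final answer is wrapped in <answer>...</answer>;
# fall back to the full response if the tag is absent.
match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
final_answer = match.group(1).strip() if match else response
print(final_answer)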
Using huggingface-cli
You can also download the model from the command line using huggingface-cli.
huggingface-cli download zjunlp/KnowRL-DeepSeek-R1-Distill-Qwen-7B --local-dir KnowRL-DeepSeek-R1-Distill-Qwen-7B
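After downloading, you can load the model from the local directory instead of the Hub ID (a minimal sketch mirroring the transformers example above):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Point transformers at the directory created by huggingface-cli
local_dir = "KnowRL-DeepSeek-R1-Distill-Qwen-7B"
tokenizer = AutoTokenizer.from_pretrained(local_dir)
model = AutoModelForCausalLM.from_pretrained(local_dir)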
Training Details
The model was trained with Knowledgeable Reinforcement Learning (specifically GRPO) on data from the zjunlp/KnowRL-Train-Data dataset.
For complete details on the training configuration and hyperparameters, please refer to our GitHub repository.
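For orientation only, a GRPO run on this data with the trl library might be set up roughly as follows. The reward function, hyperparameters, and dataset split here are placeholders, not the configuration used to train this model; see the GitHub repository for the actual setup:

from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Placeholder split; the actual dataset schema and prompt column may differ.
dataset = load_dataset("zjunlp/KnowRL-Train-Data", split="train")

def factuality_reward(completions, **kwargs):
    # Placeholder: KnowRL scores completions with a knowledge-grounded
    # factuality signal; here every completion simply gets a neutral score.
    return [0.0 for _ in completions]

training_args = GRPOConfig(
    output_dir="knowrl-grpo",
    num_generations=8,           # completions sampled per prompt (illustrative)
    max_completion_length=1024,  # illustrative value
)

trainer = GRPOTrainer(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    reward_funcs=factuality_reward,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()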
Citation
If you find this model useful in your research, please consider citing our paper:
@article{ren2025knowrl,
  title={KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality},
  author={Ren, Baochang and Qiao, Shuofei and Yu, Wenhao and Chen, Huajun and Zhang, Ningyu},
  journal={arXiv preprint arXiv:2506.19807},
  year={2025}
}