KnowRL

Exploring Knowledgeable Reinforcement Learning for Factuality

  📄arXiv •   💻GitHub Repo •   📖Dataset


Model Description

KnowRL-DeepSeek-R1-Distill-Qwen-7B is a slow-thinking language model obtained by applying our KnowRL framework to the base model DeepSeek-R1-Distill-Qwen-7B.

The KnowRL (Knowledgeable Reinforcement Learning) framework mitigates hallucinations in Large Language Models (LLMs) by integrating external knowledge directly into the training process. During RL training, a reward signal explicitly encourages factual accuracy in the model's reasoning, helping it learn its own knowledge boundaries.

As a result, this model demonstrates a significant reduction in hallucinations on factual benchmarks while preserving or even enhancing the strong reasoning capabilities inherited from its base model.
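
To make the idea concrete, here is a minimal, hypothetical sketch of how a KnowRL-style reward might combine answer correctness with a knowledge-grounded factuality signal. The function name, the equal weighting, and the supported_fact_ratio input are illustrative assumptions, not the paper's actual reward design.

def knowrl_style_reward(response: str, reference_answer: str,
                        supported_fact_ratio: float) -> float:
    # Correctness term: 1.0 if the final answer contains the reference
    # (an illustrative check, not the paper's verifier).
    correctness = 1.0 if reference_answer.lower() in response.lower() else 0.0
    # Factuality term: fraction of atomic facts in the response verified
    # against an external knowledge source (a hypothetical upstream signal).
    factuality = supported_fact_ratio
    # Equal weighting is an assumption; see the paper for the real formulation.
    return 0.5 * correctness + 0.5 * factuality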

How to Use

Using the transformers Library

You can use this model with the transformers library for text generation tasks. To get the best results, follow the model's prompt format, which wraps the reasoning process in <think> tags and the final response in <answer> tags.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Set the device
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the model and tokenizer
model_name = "zjunlp/KnowRL-DeepSeek-R1-Distill-Qwen-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16).to(device)

# Define the prompt using the model's template
prompt = "What is the main function of the mitochondria?"
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Generate a response
inputs = tokenizer(text, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens (the raw output also contains the prompt)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
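
Because the model is trained to emit its reasoning inside <think> tags and its final response inside <answer> tags, you can separate the two after decoding. A minimal sketch, assuming the tags appear verbatim in the decoded text:

import re

# Pull out the final answer; fall back to the full response if the tag is absent.
match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
final_answer = match.group(1).strip() if match else response
print(final_answer)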

Using huggingface-cli

You can also download the model from the command line using huggingface-cli.

huggingface-cli download zjunlp/KnowRL-DeepSeek-R1-Distill-Qwen-7B --local-dir KnowRL-DeepSeek-R1-Distill-Qwen-7B
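
After the download completes, you can point transformers at the local directory instead of the Hub ID:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load from the directory created by huggingface-cli download
local_dir = "KnowRL-DeepSeek-R1-Distill-Qwen-7B"
tokenizer = AutoTokenizer.from_pretrained(local_dir)
model = AutoModelForCausalLM.from_pretrained(local_dir)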

Training Details

The model is trained with Knowledgeable Reinforcement Learning (specifically the GRPO algorithm) on the zjunlp/KnowRL-Train-Data dataset.

For complete details on the training configuration and hyperparameters, please refer to our GitHub repository.
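
For intuition, GRPO samples a group of responses per prompt and standardizes their rewards within the group, which yields per-response advantages without a separate value network. A minimal sketch of that group-relative advantage computation (illustrative only, not the project's training code):

import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # rewards: shape (group_size,), one scalar reward per sampled response.
    # Each advantage is the reward standardized within its own group.
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: rewards for 4 responses sampled from the same prompt.
advantages = group_relative_advantages(torch.tensor([1.0, 0.0, 0.5, 1.0]))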


Citation

If you find this model useful in your research, please consider citing our paper:

@article{ren2025knowrl,
  title={KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality},
  author={Ren, Baochang and Qiao, Shuofei and Yu, Wenhao and Chen, Huajun and Zhang, Ningyu},
  journal={arXiv preprint arXiv:2506.19807},
  year={2025}
}