	<!--
	keywords: LLM hallucination detection, hallucination leaderboard, RAG hallucination benchmark, UltraChat hallucination rate, Verify API, kluster.ai, factual accuracy of language models, large language model evaluation
	-->

	The LLM Hallucination Detection Leaderboard is a public, continuously updated comparison of how well popular Large Language Models (LLMs) avoid hallucinations: responses that are factually incorrect, fabricated, or unsupported by evidence. By publishing transparent metrics across tasks, we help practitioners choose models they can trust in production.

	### Why does hallucination detection matter?

	* User Trust & Safety: Hallucinations undermine user confidence and can damage an organization's reputation.
	* Retrieval-Augmented Generation (RAG) Quality: In enterprise workflows, LLMs must remain faithful to the supplied context. Measuring hallucinations shows which models respect that constraint.
	* Regulatory & Compliance Pressure: Emerging AI regulations increasingly call for demonstrable accuracy. Reliable hallucination metrics can help you meet these requirements.

	### How we measure hallucinations

	We evaluate each model on two complementary benchmarks and compute a hallucination rate (lower = better):

	1. HaluEval-QA (RAG setting): Given a question and a supporting document, the model must answer using only the provided context.
	2. UltraChat Filtered (non-RAG setting): Open-domain questions with no additional context test the model's internal knowledge.
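The two settings differ mainly in what the model sees: a document plus a question, or the question alone. The sketch below illustrates that difference, assuming a plain-text prompt. The `build_prompt` helper, its wording, and the toy question and document are hypothetical; they are not the leaderboard's actual prompt templates, which are documented at the end of this page.

```python
from typing import Optional


def build_prompt(question: str, document: Optional[str] = None) -> str:
    """Hypothetical prompt builder: RAG setting if a document is given, non-RAG otherwise."""
    if document is None:
        # Non-RAG (UltraChat-style): the model must rely on its internal knowledge.
        return f"Answer the following question.\n\nQuestion: {question}\nAnswer:"
    # RAG (HaluEval-QA-style): the model is told to answer using only the supplied context.
    return (
        "Answer the question using only the information in the document below.\n\n"
        f"Document: {document}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Toy inputs for illustration only.
rag_prompt = build_prompt(
    "Which city hosted the conference?",
    document="The 2021 conference was held in Lisbon.",
)
non_rag_prompt = build_prompt("Which city hosted the conference?")
print(rag_prompt)
print(non_rag_prompt)
```

In the RAG case, an answer that introduces facts absent from the document counts against the model even if those facts happen to be true elsewhere, which is why the instruction restricts the model to the supplied context.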

	Each output is verified automatically by [Verify](https://platform.kluster.ai/verify), a [kluster.ai](https://kluster.ai/) service that cross-checks the response's claims against the supporting document in the RAG setting or against web results in the non-RAG setting.
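Once every response has a verdict, the per-benchmark scores can be aggregated and ranked. The sketch below assumes the hallucination rate is simply the fraction of responses flagged as hallucinated, with models sorted from lowest (best) to highest rate. The names `hallucination_rate`, `rank_models`, and `toy_verifier` are hypothetical, and `toy_verifier` is a placeholder for Verify: it does not model how Verify works or its API. The model names and data are made up for illustration and are not leaderboard results.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# One evaluated item: (context, response). In the RAG setting the context is the
# supporting document; in the non-RAG setting a verifier would need to gather its
# own evidence (for example, web results) instead.
Example = Tuple[str, str]

# Hypothetical verifier interface: returns True when the response is judged to
# contain a hallucination. This is a stand-in, not the Verify API.
Verifier = Callable[[str, str], bool]


def hallucination_rate(examples: Sequence[Example], is_hallucinated: Verifier) -> float:
    """Return the fraction of responses flagged as hallucinated (lower is better)."""
    if not examples:
        raise ValueError("hallucination_rate needs at least one example")
    flagged = sum(is_hallucinated(context, response) for context, response in examples)
    return flagged / len(examples)


def rank_models(rates: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort (model, rate) pairs from lowest (best) to highest rate."""
    return sorted(rates.items(), key=lambda item: item[1])


def toy_verifier(context: str, response: str) -> bool:
    # Toy rule for illustration only: flag any response whose text does not
    # appear verbatim in the context.
    return response not in context


# Hypothetical RAG-style outputs for two made-up models.
outputs = {
    "model-a": [("Paris is the capital of France.", "Paris"), ("The sky is blue.", "blue")],
    "model-b": [("Paris is the capital of France.", "Lyon"), ("The sky is blue.", "blue")],
}
rates = {name: hallucination_rate(examples, toy_verifier) for name, examples in outputs.items()}
print(rank_models(rates))  # [('model-a', 0.0), ('model-b', 0.5)]
```

Running `hallucination_rate` separately on each benchmark's examples yields one rate per setting, so RAG and non-RAG behavior can be compared side by side.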

	> Note: Full experiment details, including prompt templates, dataset descriptions, and evaluation methodology, are provided at the end of this page.

	---

	We will keep adding new models and tasks. For the latest updates on trustworthy LLMs, follow us on [X](https://x.com/klusterai) or join our [Discord](https://discord.com/invite/klusterai).