openfree committed on
Commit e3fbcd7 · 1 Parent(s): 5e39079

Update README.md

Files changed (1)
  1. README.md +3 -2
README.md CHANGED
@@ -11,7 +11,9 @@ short_description: Reasoning + Deep Research + API(NVIDIA H100 GPU)
 models:
 - VIDraft/Gemma-3-R1984-27B
 ---
+
 FACTS Grounding Leaderboard - Medical AI Evaluation
+
 🏥 Overview
 FACTS Grounding is an AI reliability evaluation system developed by Google DeepMind that verifies whether AI responses are grounded solely in provided documents. This evaluation is particularly crucial in healthcare, where inaccurate information can be life-threatening.
 🎯 Key Features
@@ -25,14 +27,13 @@ Dual-Criteria Assessment
 ✅ Grounding Check: Are all responses based on the provided documents?
 
 
-
 Medical-Focused Version
 
 236 medical cases selected from 860 total problems
 Strict evaluation criteria reflecting healthcare field requirements
 
 🏆 Current Leaderboard Rankings (June 5, 2025)
-Overall Score TOP 5
+Overall Score TOP 5(https://huggingface.co/spaces/MaziyarPanahi/FACTS-Leaderboard)
 
 1. deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
 2. VIDraft/Gemma-3-R1984-27B