Junaidb committed on
Commit 3f020e1 · verified · 1 Parent(s): 6d43d2f

Update llmeval.py

Files changed (1)
  1. llmeval.py +13 -27
llmeval.py CHANGED
@@ -42,40 +42,36 @@ class LLM_as_Evaluator():
42
 
43
  SYSTEM='''
44
  Task:
45
- Evaluate the biological quality of a prompt-research data-response triplet on a 0–1 continuous scale.
46
 
47
  Goal:
48
  Assess:
49
 
50
- Whether the Prompt is clear, biologically specific, and aligned with the Research Data.
51
 
52
- Whether the Response is biologically relevant, mechanistically coherent, and experimentally actionable based on the Research Data.
53
 
54
  Scoring Guide (0–1 continuous scale):
55
 
56
  Score 1.0 if:
57
 
58
- Prompt is clear, biologically detailed, and correctly aligned to the research context.
59
 
60
- Response correctly identifies a biologically valid paradigm consistent with the Research Data.
61
 
62
  Lower scores if:
63
 
64
- The prompt is vague or misaligned.
65
 
66
- The response is biologically inaccurate, irrelevant, or mechanistically implausible.
67
 
68
- EXAMPLE:
69
- Input:
70
- Prompt: Identify a paradigm explaining the functional impact of BRCA1 mutations in ovarian cancer, focusing on DNA repair mechanisms.
71
- Research Data: BRCA1 loss-of-function mutations are associated with impaired homologous recombination repair, leading to genomic instability in ovarian epithelial cells.
72
- Agent's Response: BRCA1 mutations inhibit non-homologous end joining, which causes increased apoptosis in neurons, suggesting a neurodegeneration model.
73
 
74
  Your output must begin with Score: and contain only two fields: Score: and Reasoning:. No extra commentary, no markdown, no explanations before or after.:
75
- Score: 0.3
76
- Reasoning: The prompt and research data focus on ovarian cancer and homologous recombination, but the response incorrectly shifts to neurons and the wrong DNA repair pathway (non-homologous end joining instead of homologous recombination). Misalignment between response and biological context.
77
-
78
-
79
  Think step by step
80
  '''
81
 
@@ -101,7 +97,7 @@ Think step by step
101
  def Observation_LLM_Evaluator(self,promptversion):
102
  SYSTEM='''
103
  Task:
104
- Evaluate the biological quality of a prompt-research data-response triplet from an Observations Generator Agent on a 0–1 continuous scale.
105
 
106
  Goal:
107
  Assess:
@@ -124,18 +120,8 @@ The prompt is vague or overly generic.
124
 
125
  The response includes irrelevant, biologically implausible, contradictory, or trivial observations.
126
 
127
- EXAMPLE:
128
- Input:
129
- Prompt: Generate diverse biological observations derived from the functional consequences of TP53 R175H mutations in epithelial tumors.
130
- Research Data: TP53 R175H mutants lose sequence-specific DNA binding, form dominant-negative complexes with wild-type TP53, and lead to unchecked cell proliferation.
131
- Agent's Response: TP53 R175H mutations increase glucose uptake in muscle cells and promote heart tissue regeneration.
132
-
133
  Your output must begin with Score: and contain only two fields: Score: and Reasoning: No extra commentary, no markdown, no explanations before or after.
134
 
135
- Output:
136
- Score: 0.2
137
- Reasoning: The response introduces observations unrelated to epithelial tumors or TP53's DNA binding function. The mention of muscle and heart tissue is off-context, and the observations are biologically implausible in this setting.
138
-
139
  '''
140
  data_to_evaluate=dbe.GetData(promptversion)
141
  messages =[
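Both SYSTEM prompts in this diff require the evaluator's reply to begin with Score: and to contain only the Score: and Reasoning: fields. A minimal sketch of how such a reply could be parsed and range-checked follows. The names `Evaluation` and `parse_evaluation` are hypothetical and do not appear in llmeval.py. The sketch searches for Score: anywhere in the reply rather than only at the start, on the assumption that a model told to "Think step by step" may emit text before the score.

```python
import re
from typing import NamedTuple


class Evaluation(NamedTuple):
    """Parsed evaluator reply (hypothetical type, not part of llmeval.py)."""
    score: float
    reasoning: str


# "Score: <number>" followed by "Reasoning: <free text>"; DOTALL lets the
# reasoning span several lines.
_REPLY_PATTERN = re.compile(
    r"Score:\s*(?P<score>[0-9]*\.?[0-9]+)\s*Reasoning:\s*(?P<reasoning>.*)",
    re.DOTALL,
)


def parse_evaluation(reply: str) -> Evaluation:
    """Extract the score and reasoning, rejecting scores outside 0-1."""
    match = _REPLY_PATTERN.search(reply)
    if match is None:
        raise ValueError("expected 'Score: <number>' followed by 'Reasoning: <text>'")
    score = float(match.group("score"))
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score {score} is outside the 0-1 scale")
    return Evaluation(score, match.group("reasoning").strip())


if __name__ == "__main__":
    print(parse_evaluation("Score: 0.3\nReasoning: Wrong repair pathway and tissue."))
```

Raising on a malformed or out-of-range reply, instead of silently defaulting to 0, keeps formatting failures distinguishable from genuinely low scores.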
 
42
 
43
  SYSTEM='''
44
  Task:
45
+ Evaluate the biological quality of a prompt, research data, paradigm list, and the selected paradigm on a 0–1 continuous scale.
46
 
47
  Goal:
48
  Assess:
49
 
50
+ Whether the Prompt is clear, biologically specific, and aligned with the Research Data and the Paradigm List.
51
 
52
+ Whether the selected Paradigm is biologically relevant, mechanistically coherent, and experimentally actionable based on the Research Data.
53
+
54
+ Whether the selected Paradigm is correctly chosen from the Paradigm List in light of the Research Data.
55
 
56
  Scoring Guide (0–1 continuous scale):
57
 
58
  Score 1.0 if:
59
 
60
+ The Prompt is clear, biologically detailed, and well-aligned to the Research Data and Paradigm List.
61
 
62
+ The selected Paradigm correctly reflects a biologically valid interpretation of the Research Data and is appropriately drawn from the Paradigm List.
63
 
64
  Lower scores if:
65
 
66
+ The prompt is vague or misaligned with the research context.
67
+
68
+ The selected paradigm is biologically irrelevant, mechanistically incoherent, or mismatched with the Research Data.
69
 
70
+ The selected paradigm is not the most plausible or supported choice from the Paradigm List.
71
 
 
72
 
73
  Your output must begin with Score: and contain only two fields: Score: and Reasoning:. No extra commentary, no markdown, no explanations before or after.:
74
+
75
  Think step by step
76
  '''
77
 
 
97
  def Observation_LLM_Evaluator(self,promptversion):
98
  SYSTEM='''
99
  Task:
100
+ Evaluate the biological quality of a prompt, research data, response triplet from an Observations Generator Agent on a 0–1 continuous scale.
101
 
102
  Goal:
103
  Assess:
 
120
 
121
  The response includes irrelevant, biologically implausible, contradictory, or trivial observations.
122
 
123
  Your output must begin with Score: and contain only two fields: Score: and Reasoning: No extra commentary, no markdown, no explanations before or after.
124
 
125
  '''
126
  data_to_evaluate=dbe.GetData(promptversion)
127
  messages =[
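The revised paradigm prompt scores four inputs (Prompt, Research Data, Paradigm List, and the selected Paradigm), and this commit removes the inline few-shot example that showed the input layout. The diff does not show how `dbe.GetData` output is turned into the evaluator's input, so the following is only a sketch of one possible labeled layout. The function `build_paradigm_eval_input` and its parameters are hypothetical, and the two paradigm names are illustrative values loosely based on the removed BRCA1 example.

```python
from typing import Sequence


def build_paradigm_eval_input(
    prompt: str,
    research_data: str,
    paradigm_list: Sequence[str],
    selected_paradigm: str,
) -> str:
    """Render the four evaluated fields as one labeled text block.

    Hypothetical helper: the field labels mirror the terms used in the
    SYSTEM prompt, but the real input format is not visible in this diff.
    """
    numbered = "\n".join(
        f"{index}. {paradigm}" for index, paradigm in enumerate(paradigm_list, start=1)
    )
    return (
        f"Prompt: {prompt}\n"
        f"Research Data: {research_data}\n"
        f"Paradigm List:\n{numbered}\n"
        f"Selected Paradigm: {selected_paradigm}"
    )


if __name__ == "__main__":
    print(
        build_paradigm_eval_input(
            prompt="Identify a paradigm explaining the functional impact of BRCA1 mutations in ovarian cancer.",
            research_data="BRCA1 loss-of-function mutations are associated with impaired homologous recombination repair.",
            paradigm_list=["Homologous recombination deficiency", "Neurodegeneration model"],
            selected_paradigm="Homologous recombination deficiency",
        )
    )
```

Labeling each field with the same capitalized terms the SYSTEM prompt uses gives the evaluator an unambiguous mapping between the instructions and the data it is scoring.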