Spaces: thexForce/guard
Junaidb committed on Apr 29

Commit 3f020e1 (verified) · 1 parent: 6d43d2f

Update llmeval.py

Files changed (1)

llmeval.py +13 -27

llmeval.py CHANGED

@@ -42,40 +42,36 @@ class LLM_as_Evaluator():
         SYSTEM='''
 Task:
-Evaluate the biological quality of a prompt-research data-response triplet on a 0–1 continuous scale.
+Evaluate the biological quality of a prompt, research data, paradigm list, and the selected paradigm on a 0–1 continuous scale.
 Goal:
 Assess:
-Whether the Prompt is clear, biologically specific, and aligned with the Research Data.
-Whether the Response is biologically relevant, mechanistically coherent, and experimentally actionable based on the Research Data.
+Whether the Prompt is clear, biologically specific, and aligned with the Research Data and the Paradigm List.
+Whether the selected Paradigm is biologically relevant, mechanistically coherent, and experimentally actionable based on the Research Data.
+Whether the selected Paradigm is correctly chosen from the Paradigm List in light of the Research Data.
 Scoring Guide (0–1 continuous scale):
 Score 1.0 if:
-Prompt is clear, biologically detailed, and correctly aligned to the research context.
-Response correctly identifies a biologically valid paradigm consistent with the Research Data.
+The Prompt is clear, biologically detailed, and well-aligned to the Research Data and Paradigm List.
+The selected Paradigm correctly reflects a biologically valid interpretation of the Research Data and is appropriately drawn from the Paradigm List.
 Lower scores if:
-The prompt is vague or misaligned.
-The response is biologically inaccurate, irrelevant, or mechanistically implausible.
-EXAMPLE:
-Input:
-    Prompt: Identify a paradigm explaining the functional impact of BRCA1 mutations in ovarian cancer, focusing on DNA repair mechanisms.
-    Research Data: BRCA1 loss-of-function mutations are associated with impaired homologous recombination repair, leading to genomic instability in ovarian epithelial cells.
-    Agent's Response: BRCA1 mutations inhibit non-homologous end joining, which causes increased apoptosis in neurons, suggesting a neurodegeneration model.
+The prompt is vague or misaligned with the research context.
+The selected paradigm is biologically irrelevant, mechanistically incoherent, or mismatched with the Research Data.
+The selected paradigm is not the most plausible or supported choice from the Paradigm List.
 Your output must begin with Score: and contain only two fields: Score: and Reasoning:. No extra commentary, no markdown, no explanations before or after.:
-    Score: 0.3
-    Reasoning: The prompt and research data focus on ovarian cancer and homologous recombination, but the response incorrectly shifts to neurons and the wrong DNA repair pathway (non-homologous end joining instead of homologous recombination). Misalignment between response and biological context.
 Think step by step
 '''
@@ -101,7 +97,7 @@ Think step by step
     def Observation_LLM_Evaluator(self,promptversion):
         SYSTEM='''
 Task:
-Evaluate the biological quality of a prompt–research data–response triplet from an Observations Generator Agent on a 0–1 continuous scale.
+Evaluate the biological quality of a prompt , research data . response triplet from an Observations Generator Agent on a 0–1 continuous scale.
 Goal:
 Assess:
@@ -124,18 +120,8 @@ The prompt is vague or overly generic.
 The response includes irrelevant, biologically implausible, contradictory, or trivial observations.
-EXAMPLE:
-Input:
-Prompt: Generate diverse biological observations derived from the functional consequences of TP53 R175H mutations in epithelial tumors.
-Research Data: TP53 R175H mutants lose sequence-specific DNA binding, form dominant-negative complexes with wild-type TP53, and lead to unchecked cell proliferation.
-Agent's Response: TP53 R175H mutations increase glucose uptake in muscle cells and promote heart tissue regeneration.
 Your output must begin with Score: and contain only two fields: Score: and Reasoning: No extra commentary, no markdown, no explanations before or after.
-Output:
-Score: 0.2
-Reasoning: The response introduces observations unrelated to epithelial tumors or TP53's DNA binding function. The mention of muscle and heart tissue is off-context, and the observations are biologically implausible in this setting.
 '''
         data_to_evaluate=dbe.GetData(promptversion)
         messages =[
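Both evaluator prompts in this commit end with the same output contract. The reply must begin with `Score:`, contain only a `Score:` field and a `Reasoning:` field, and give the score on a 0–1 continuous scale. The diff does not show how `llmeval.py` consumes the reply. The following is therefore a hypothetical sketch of a parser that enforces that contract. `parse_evaluation` and `Evaluation` are illustrative names, not part of the repository. The regular expression is also an assumption about how strictly "only two fields" should be read.

```python
import re
from typing import NamedTuple


class Evaluation(NamedTuple):
    """Structured form of an evaluator reply (hypothetical helper type)."""
    score: float
    reasoning: str


# Assumed reading of the contract: the reply starts with "Score:" and a
# non-negative number, followed by a "Reasoning:" field that runs to the end.
_REPLY_PATTERN = re.compile(
    r"\AScore:\s*(?P<score>[0-9]*\.?[0-9]+)\s+Reasoning:\s*(?P<reasoning>.+)\Z",
    re.DOTALL,
)


def parse_evaluation(reply: str) -> Evaluation:
    """Parse a 'Score: ... Reasoning: ...' reply and check the 0-1 range.

    Raises ValueError if the reply does not follow the two-field format
    or if the score falls outside the continuous 0-1 scale.
    """
    match = _REPLY_PATTERN.match(reply.strip())
    if match is None:
        raise ValueError("reply does not match the 'Score:/Reasoning:' format")
    score = float(match.group("score"))
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score {score} is outside the 0-1 scale")
    return Evaluation(score=score, reasoning=match.group("reasoning").strip())


if __name__ == "__main__":
    ok = parse_evaluation(
        "Score: 0.3\nReasoning: The response shifts to the wrong DNA repair pathway."
    )
    print(ok.score)  # 0.3
    try:
        # Text before "Score:" violates the "no explanations before" rule.
        parse_evaluation("Here is my evaluation. Score: 0.9")
    except ValueError as err:
        print("rejected:", err)
```

Rejecting a malformed reply, instead of searching for a number anywhere in the text, keeps the parser consistent with the prompts' "no explanations before or after" instruction. A caller might, for example, retry the LLM call when `ValueError` is raised.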