Update llmeval.py
llmeval.py +56 -1
llmeval.py
CHANGED
@@ -98,7 +98,7 @@ Think step by step
         de.Update(data=data)


-    def
+    def Observation_LLM_Evaluator(self,promptversion):
         SYSTEM='''
         Task:
         Evaluate the biological quality of a prompt–research data–response triplet from an Observations Generator Agent on a 0–1 continuous scale.
@@ -153,4 +153,59 @@ Reasoning: The response introduces observations unrelated to epithelial tumors o
         "biological_context_alignment":evaluation_response
         }
         de.Update(data=data)
+
+
+
+
+    def Anomaly_LLM_Evaluator(self,promptversion):
+        SYSTEM='''
+        Task:
+        Evaluate the biological quality of a prompt–observations–response triplet from an Anomaly Detector Agent on a 0–1 continuous scale.
+
+        Goal:
+        Assess:
+        Whether the Prompt clearly defines the biological context and intent.
+
+        Whether the Observations are biologically plausible and internally consistent.
+
+        Whether the Response correctly identifies biologically relevant inconsistencies between the Paradigm and Observations.
+
+        Scoring Guide (0–1 continuous scale):
+
+        Score 1.0 if:
+
+        The prompt is clear and biologically grounded.
+
+        The response lists true, biologically meaningful anomalies based on the observations.
+
+        All major contradictions or gaps are captured.
+
+        Lower scores if:
+
+        The prompt is vague.
+
+        The response misses key anomalies, adds irrelevant ones, or shows poor biological reasoning.
+
+        Your output must begin with Score: and contain only two fields: Score: and Reasoning: No extra commentary, no markdown, no explanations before or after.
+        Output:
+        Score: 0.2
+        Reasoning: Your reasoning.
+        '''
+
+        data_to_evaluate=dbe.GetData(promptversion)
+        messages=[
+            {"role":"system","content":SYSTEM},
+            {"role":"user","content":f"""
+            Prompt :{data_to_evaluate["prompt"]}
+            Observations :{ data_to_evaluate["context"]}
+            Agent's Response :{data_to_evaluate["response"]}
+            """}
+        ]
+        evaluation_response=self.___engine_core(messages=messages)
+        data={
+            "prompt":promptversion,
+            "biological_context_alignment":evaluation_response
+        }
+        de.Update(data=data)
+
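Both evaluators pin the model to a two-field plain-text reply ("Score:" followed by "Reasoning:"). A minimal sketch of how such a reply could be parsed downstream, assuming the model honors that contract; parse_evaluation and its regexes are illustrative helpers, not part of this commit:

import re

def parse_evaluation(raw: str) -> dict:
    """Split a 'Score: ... Reasoning: ...' reply into a float score and free text.

    Illustrative only: assumes the reply follows the format required by the
    SYSTEM prompts above (score in [0, 1], reasoning text after the score).
    """
    score_match = re.search(r"Score:\s*([01](?:\.\d+)?)", raw)
    reasoning_match = re.search(r"Reasoning:\s*(.*)", raw, flags=re.DOTALL)
    if score_match is None:
        raise ValueError(f"Reply does not contain a parsable score: {raw[:80]!r}")
    return {
        "score": float(score_match.group(1)),
        "reasoning": reasoning_match.group(1).strip() if reasoning_match else "",
    }

# Example:
# parse_evaluation("Score: 0.2\nReasoning: The response misses two key anomalies.")
# -> {"score": 0.2, "reasoning": "The response misses two key anomalies."}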