Spaces: thexForce/guard
Junaidb committed on Apr 29

Commit de613f1 (verified) · 1 Parent(s): 9eae5c1

Update llmeval.py

Files changed (1)

llmeval.py +59 -1

@@ -95,4 +95,62 @@ Think step by step
             "prompt":data_to_evaluate["prompt"],
             "biological_context_alignment":evaluation_response
             }
-        de.Update(data=data)
+        de.Update(data=data)
+    def ObservationEvaluator(self,promptversion):
+        SYSTEM='''
+Task:
+Evaluate the biological quality of a prompt–research data–response triplet from an Observations Generator Agent on a 0–1 continuous scale.
+Goal:
+Assess:
+Whether the Prompt clearly defines the research context and specifies the scope of valid observations.
+Whether the Response includes observations that are biologically plausible, factually grounded, and consistent with the Research Data.
+Scoring Guide (0–1 continuous scale):
+Score 1.0 if:
+Prompt is clear, biologically specific, and well-aligned to the data context.
+Response consists of multiple observations that are each biologically valid, non-redundant, and directly grounded in the data.
+Lower scores if:
+The prompt is vague or overly generic.
+The response includes irrelevant, biologically implausible, contradictory, or trivial observations.
+EXAMPLE:
+Input:
+Prompt: Generate diverse biological observations derived from the functional consequences of TP53 R175H mutations in epithelial tumors.
+Research Data: TP53 R175H mutants lose sequence-specific DNA binding, form dominant-negative complexes with wild-type TP53, and lead to unchecked cell proliferation.
+Agent's Response: TP53 R175H mutations increase glucose uptake in muscle cells and promote heart tissue regeneration.
+Your output must begin with Score: and contain only two fields: Score: and Reasoning: No extra commentary, no markdown, no explanations before or after.
+Output:
+Score: 0.2
+Reasoning: The response introduces observations unrelated to epithelial tumors or TP53's DNA binding function. The mention of muscle and heart tissue is off-context, and the observations are biologically implausible in this setting.
+'''
+        data_to_evaluate=dbe.GetData(promptversion)
+        messages =[
+            {"role":"system","content":SYSTEM},
+            {"role":"user","content":f"""
+            Prompt :{data_to_evaluate["prompt"]}
+            Research Data :{data_to_evaluate["context"]}
+            Agent's Response : {data_to_evaluate["response"]}
+            """}
+        ]
+        evaluation_response=self.___engine_core(messages=messages)
+        data={
+            "prompt":data_to_evaluate["prompt"],
+            "biological_context_alignment":evaluation_response
+            }
+        de.Update(data=data)
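
Note on the evaluator output: the SYSTEM prompt asks the model to reply with exactly two fields, Score: and Reasoning:, but the method stores the raw reply without checking it. A minimal sketch of how the reply could be validated before storage is shown below. The names `Evaluation` and `parse_evaluation` are hypothetical and not part of this commit, and the sketch assumes the engine returns the reply as a plain string.

```python
import re
from typing import NamedTuple, Optional


class Evaluation(NamedTuple):
    """Hypothetical container for a parsed evaluator reply."""
    score: float
    reasoning: str


# Expects "Score: <number>" at the very start, then "Reasoning: <text>".
_PATTERN = re.compile(
    r"\s*Score:\s*([0-9]*\.?[0-9]+)\s*Reasoning:\s*(.*)",
    re.DOTALL,
)


def parse_evaluation(raw: str) -> Optional[Evaluation]:
    """Return the parsed score and reasoning, or None if the reply is malformed."""
    match = _PATTERN.match(raw)
    if match is None:
        return None
    score = float(match.group(1))
    # The prompt defines a 0-1 continuous scale, so reject anything outside it.
    if not 0.0 <= score <= 1.0:
        return None
    return Evaluation(score=score, reasoning=match.group(2).strip())
```

A caller could then store the numeric score and the reasoning as separate fields, and skip or retry when `parse_evaluation` returns None.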