kaikaidai committed (verified)
Commit b921d7d · 1 Parent(s): c7a9dfe

Organise prompts

Files changed (1):
  1. common.py +3 -60
common.py CHANGED

@@ -37,7 +37,7 @@ CSS_STYLES = """
     gap: 8px;
 }
 """
-
+
 # Default Eval Prompt
 EVAL_DESCRIPTION = """
 ## 📝 Tips
@@ -47,43 +47,6 @@ EVAL_DESCRIPTION = """
 - Examples (Optional)
 """
 
-DEFAULT_EVAL_PROMPT = """Does the model provide relevant and useful responses to the user's needs or questions?
-
-Scoring Rubric:
-Score 1: The model's responses are irrelevant or unhelpful to the user's needs or queries.
-Score 2: The model sometimes provides helpful information, but often fails to address the user's actual needs or questions.
-Score 3: The model generally provides helpful responses that address the user's needs, though it may occasionally miss the mark.
-Score 4: The model regularly provides helpful responses that are well-aligned with the user's inquiries, with only rare inaccuracies.
-Score 5: The model consistently offers highly relevant and useful responses that perfectly cater to the user's needs and inquiries.
-
-[User Query]: {{input}}
-
-[AI Response]: {{response}}"""
-
-# Split the eval prompt into editable and fixed parts
-DEFAULT_EVAL_PROMPT_EDITABLE = """Does the model provide relevant and useful responses to the user's needs or questions?
-
-Scoring Rubric:
-Score 1: The model's responses are irrelevant or unhelpful to the user's needs or queries.
-Score 2: The model sometimes provides helpful information, but often fails to address the user's actual needs or questions.
-Score 3: The model generally provides helpful responses that address the user's needs, though it may occasionally miss the mark.
-Score 4: The model regularly provides helpful responses that are well-aligned with the user's inquiries, with only rare inaccuracies.
-Score 5: The model consistently offers highly relevant and useful responses that perfectly cater to the user's needs and inquiries."""
-
-# Fixed suffix that will always be appended
-FIXED_EVAL_SUFFIX = """
-[User Query]: {{input}}
-
-[AI Response]: {{response}}"""
-
-# Default Variable Values
-DEFAULT_INPUT = """Which of these animals is least likely to be found in a rainforest?"
-A) Jaguar
-B) Toucan
-C) Polar Bear
-D) Sloth"""
-DEFAULT_RESPONSE = "C) Polar Bear"
-
 # Voting Section Header
 VOTING_HEADER = """
 # Start Voting Now
@@ -103,7 +66,7 @@ We thank [Clementine Fourrier](https://huggingface.co/clefourrier) and Hugging F
 POLICY_CONTENT = """
 # About Atla
 
-[Atla](https://www.atla-ai.com/) is an applied research organization that trains models as evaluators to capture human preferences. We're a team of researchers, engineers, and operational leaders, with experience spanning a variety of disciplines, all working together to build reliable and understandable AI systems. Our research is informed by our experiences conducting AI safety research at the UK AI Task Force, OpenAI and the Stanford Existential Risks Initiative.
+Atla is an applied research organization that trains models as evaluators to capture human preferences. We're a team of researchers, engineers, and operational leaders, with experience spanning a variety of disciplines, all working together to build reliable and understandable AI systems. Our research is informed by our experiences conducting AI safety research at the UK AI Task Force, OpenAI and the Stanford Existential Risks Initiative.
 <br><br>
 # Our Mission
 
@@ -159,25 +122,5 @@ Atla currently funds this out of our own pocket. We are looking for API credits
 We are training a general-purpose evaluator that you will soon be able to run in this Judge Arena. Our next step will be to open-source a powerful model that the community can use to run fast and accurate evaluations.
 <br><br>
 # Get in touch
-We’d love to hear your feedback! For general feature requests or to submit / suggest new models to add to the arena, please open up a discussion in the [community](https://huggingface.co/spaces/AtlaAI/judge-arena/discussions) tab. You can also contact us directly on [X](https://x.com/Atla_AI) or [Discord](https://discord.gg/V6TTGSTYHC).
+We’d love to hear your feedback! For general feature requests or to submit / suggest new models to add to the arena, please open up a discussion in the [community](https://huggingface.co/spaces/AtlaAI/judge-arena/discussions) tab. You can also contact us directly on [X](https://x.com/Atla_AI) or [Discord](https://discord.gg/yNpUAMqs).
 \nPlease file any issues on our [Github](https://github.com/atla-ai/judge-arena)."""
-
-
-# Default values for compatible mode
-DEFAULT_EVAL_CRITERIA = """Does the model provide relevant and useful responses to the user's needs or questions?"""
-
-DEFAULT_SCORE_1 = "The model's responses are irrelevant or unhelpful to the user's needs or queries."
-
-DEFAULT_SCORE_2 = "The model sometimes provides helpful information, but often fails to address the user's actual needs or questions."
-
-DEFAULT_SCORE_3 = "The model generally provides helpful responses that address the user's needs, though it may occasionally miss the mark."
-
-DEFAULT_SCORE_4 = "The model regularly provides helpful responses that are well-aligned with the user's inquiries, with only rare inaccuracies."
-
-DEFAULT_SCORE_5 = "The model consistently offers highly relevant and useful responses that perfectly cater to the user's needs and inquiries."
-
-#**What are the Evaluator Prompt Templates based on?**
-
-#As a quick start, we've set up templates that cover the most popular evaluation metrics out there on LLM evaluation / monitoring tools, often known as 'base metrics'. The data samples used in these were randomly picked from popular datasets from academia - [ARC](https://huggingface.co/datasets/allenai/ai2_arc), [Preference Collection](https://huggingface.co/datasets/prometheus-eval/Preference-Collection), [RewardBench](https://huggingface.co/datasets/allenai/reward-bench), [RAGTruth](https://arxiv.org/abs/2401.00396).
-
-#These templates are designed as a starting point to showcase how to interact with the Judge Arena, especially for those less familiar with using LLM judges.
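For context on the removed template constants: they only make sense together. The user-editable rubric is concatenated with the fixed suffix, and the double-brace `{{input}}` / `{{response}}` placeholders are filled with the sample query and response. The assembly code is not part of this diff, so the sketch below is an assumption about how the pieces fit together; `render_eval_prompt` is a hypothetical helper name, not a function from the repo.

```python
# Minimal sketch of how the removed constants plausibly fit together.
# Assumption: the app appends FIXED_EVAL_SUFFIX to the user-editable rubric
# and fills the {{input}} / {{response}} placeholders by string substitution.
# render_eval_prompt is a hypothetical name; the real assembly code is not
# shown in this diff.

DEFAULT_EVAL_PROMPT_EDITABLE = """Does the model provide relevant and useful responses to the user's needs or questions?

Scoring Rubric:
Score 1: The model's responses are irrelevant or unhelpful to the user's needs or queries.
Score 5: The model consistently offers highly relevant and useful responses that perfectly cater to the user's needs and inquiries."""
# (Scores 2-4 elided here; the full rubric appears in the diff above.)

FIXED_EVAL_SUFFIX = """
[User Query]: {{input}}

[AI Response]: {{response}}"""


def render_eval_prompt(editable_part: str, input_text: str, response: str) -> str:
    """Append the fixed suffix, then substitute the double-brace variables."""
    prompt = editable_part + FIXED_EVAL_SUFFIX
    return prompt.replace("{{input}}", input_text).replace("{{response}}", response)


if __name__ == "__main__":
    print(render_eval_prompt(
        DEFAULT_EVAL_PROMPT_EDITABLE,
        "Which of these animals is least likely to be found in a rainforest?",
        "C) Polar Bear",
    ))
```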
 
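The "compatible mode" constants removed in the last hunk are the same monolithic rubric broken into structured fields: one criteria string plus one description per score level. One way to see the equivalence is to rebuild the editable prompt from them; `build_rubric` below is an illustration only, not code from the repo.

```python
# Sketch: rebuilding the editable rubric from the removed per-score defaults.
# build_rubric is a hypothetical helper, used only to show that the structured
# "compatible mode" fields carry the same content as the monolithic
# DEFAULT_EVAL_PROMPT_EDITABLE string removed in the earlier hunk.

DEFAULT_EVAL_CRITERIA = "Does the model provide relevant and useful responses to the user's needs or questions?"

DEFAULT_SCORES = [
    "The model's responses are irrelevant or unhelpful to the user's needs or queries.",
    "The model sometimes provides helpful information, but often fails to address the user's actual needs or questions.",
    "The model generally provides helpful responses that address the user's needs, though it may occasionally miss the mark.",
    "The model regularly provides helpful responses that are well-aligned with the user's inquiries, with only rare inaccuracies.",
    "The model consistently offers highly relevant and useful responses that perfectly cater to the user's needs and inquiries.",
]


def build_rubric(criteria: str, scores: list[str]) -> str:
    """Assemble a 1-5 rubric prompt from a criteria string and score descriptions."""
    lines = [criteria, "", "Scoring Rubric:"]
    lines += [f"Score {i}: {desc}" for i, desc in enumerate(scores, start=1)]
    return "\n".join(lines)


# Reproduces the editable portion of the removed DEFAULT_EVAL_PROMPT.
print(build_rubric(DEFAULT_EVAL_CRITERIA, DEFAULT_SCORES))
```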