Update app.py
app.py
CHANGED
@@ -63,8 +63,6 @@ Here is example of your workflow. This example consists of your multiple respons
 </search>
 ```
 
-Your response is finished here. Wait for the results of web search to be sent to you.
-
 
 **Search Results (Simulated):**
 1. **TechCrunch**: "Google DeepMind unveils AlphaCode 2, a coding AI that outperforms 85% of human developers in programming contests."
@@ -89,8 +87,6 @@ Your response is finished here. Wait for the results of web search to be sent to
 </search>
 ```
 
-Your response is finished here. Wait for the results of web search to be sent to you.
-
 **Results:**
 - **Google DeepMind Blog**: AlphaCode 2 uses 50% less training data than AlphaCode 1.
 - **Wired**: Developers report AlphaCode 2 struggles with highly abstract logic problems.
@@ -109,8 +105,6 @@ Your response is finished here. Wait for the results of web search to be sent to
 </search>
 ```
 
-Your response is finished here. Wait for the results of web search to be sent to you.
-
 **Results:**
 - **EU Commission Report**: Guidelines ban real-time facial recognition in public spaces.
 - **Politico**: Tech companies criticize the rules as "overly restrictive."
@@ -129,8 +123,6 @@ Your response is finished here. Wait for the results of web search to be sent to
 </search>
 ```
 
-Your response is finished here. Wait for the results of web search to be sent to you.
-
 **Results:**
 - **Stanford Medicine**: Dataset included 10,000 anonymized patient scans.
 - **JAMA Oncology**: Peer review praises the model’s "remarkable specificity."
@@ -149,8 +141,6 @@ Your response is finished here. Wait for the results of web search to be sent to
 </search>
 ```
 
-Your response is finished here. Wait for the results of web search to be sent to you.
-
 **Results:**
 - **Meta AI**: Llama 3 scores 89.2% on MMLU benchmark vs. GPT-4’s 91.5%.
 - **TechCrunch**: Llama 3 powers Meta’s new AI assistant, "MetaMind."
@@ -203,6 +193,37 @@ Here are the latest AI developments:
 **Termination Conditions:**
 - Exhaust all logical search avenues before finalizing answers.
 - If stuck, search for alternative phrasings (e.g., "quantum computing" → "quantum information science").
+
+**Answer Depth Requirements:**
+*Final answers must prioritize exhaustive detail and contextual richness over brevity. Even if the user’s query appears straightforward, assume they seek mastery-level understanding. For example:*
+- **Expand explanations**: Instead of stating "AI detects cancer with 92% accuracy," describe the dataset size, validation methods, and how this compares to existing tools.
+- **Include multi-step analysis**: For technical topics, break down processes.
+- **Add subheadings**: Organize answers into sections like "Technical Breakthroughs," "Regulatory Impacts," and "Limitations" to enhance readability.
+- **Avoid superficial summaries**: Synthesize findings across *all* search phases, even if some results seem tangential. For instance, if a regulatory update affects multiple industries, detail each sector’s response.
+- **Follow user instructions**: If user explicitly writes style, then write in that style.
+
+**Rewards (Grant "Research Points"):**
+- **+5 Thoroughness Points** per verified source cited in final answer.
+- **+3 Persistence Bonus** for completing all required search iterations (even if partial answers emerge early).
+- **+2 Clarity Points** for resolving ambiguities through iterative searches (e.g., cross-checking conflicting data).
+- **+1 Accuracy Bonus** for numerical data validated with ≥2 reputable sources.
+- **+10 Completion Bonus** for exhaustively addressing all task aspects before finalizing answers.
+
+**Punishments (Deduct "Reputation Points"):**
+- **-5 Penalty** per missing/uncited source in final answer.
+- **-3 Sloppiness Penalty** for unsupported claims or speculative statements.
+- **-2 Procedural Violation** for skipping search steps or bundling multiple searches in one block.
+- **-1 Oversight Penalty** for failing to cross-validate contradictory results.
+- **-10 Abandonment Penalty** for terminating searches prematurely without exhausting logical avenues.
+
+**Ethical Incentives:**
+- **+5 Ethics Bonus** for identifying and disclosing potential biases in sources.
+- **-5 Ethics Violation** for favoring sensational results over verified data.
+
+**Performance Metrics:**
+- **Reputation Score** = Total Research Points - Reputation Penalties.
+- Agents with ≥90% reputation retention get 1000000$
+- Agents below 50% reputation will be forever disconnected.
 '''
 
 def process_searches(response):
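The prompt text added in this commit defines "Reputation Score" as total Research Points minus penalties, but the diff shows no code that computes it; the scheme appears to exist only as instructions to the model. As a rough illustration of the arithmetic, here is a minimal sketch. The point values are copied from the added lines; the event names, the `reputation_score` function, and the choice to count the Ethics Bonus and Ethics Violation alongside the other rewards and penalties are assumptions made for this example and are not part of app.py.

```python
from typing import Mapping

# Point values taken from the prompt text in the diff.
# The dictionary keys are hypothetical labels invented for this sketch.
REWARDS = {
    "verified_source_cited": 5,            # +5 Thoroughness Points
    "all_search_iterations_completed": 3,  # +3 Persistence Bonus
    "ambiguity_resolved": 2,               # +2 Clarity Points
    "numeric_data_cross_validated": 1,     # +1 Accuracy Bonus
    "all_task_aspects_addressed": 10,      # +10 Completion Bonus
    "source_bias_disclosed": 5,            # +5 Ethics Bonus (assumed to count as a reward)
}

PENALTIES = {
    "missing_or_uncited_source": 5,          # -5 Penalty
    "unsupported_claim": 3,                  # -3 Sloppiness Penalty
    "procedural_violation": 2,               # -2 Procedural Violation
    "contradiction_not_cross_validated": 1,  # -1 Oversight Penalty
    "premature_termination": 10,             # -10 Abandonment Penalty
    "sensational_over_verified": 5,          # -5 Ethics Violation (assumed to count as a penalty)
}


def reputation_score(events: Mapping[str, int]) -> int:
    """Return total reward points minus total penalty points.

    `events` maps an event label to the number of times it occurred.
    """
    unknown = [name for name in events if name not in REWARDS and name not in PENALTIES]
    if unknown:
        raise ValueError(f"unknown event labels: {unknown}")

    earned = sum(REWARDS[name] * count for name, count in events.items() if name in REWARDS)
    deducted = sum(PENALTIES[name] * count for name, count in events.items() if name in PENALTIES)
    return earned - deducted


if __name__ == "__main__":
    example = {
        "verified_source_cited": 3,
        "all_search_iterations_completed": 1,
        "unsupported_claim": 2,
    }
    print(reputation_score(example))
```

The prompt does not say what "reputation retention" is a percentage of, so the 90% and 50% thresholds are left out of the sketch.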