xd11yggy commited on
Commit
d43269a
·
verified ·
1 Parent(s): dba2758

Update app.py

Browse files
Files changed (1) hide show
  1. app.py +31 -148
app.py CHANGED
@@ -44,163 +44,46 @@ etc...
44
  Note: you should use these at start and end: "```"
45
 
46
 
47
- Here is example of your workflow. This example consists of your multiple responses. Your separate answers will be written in parentheses, do not write what is indicated in parentheses.
48
- (Your respone) - start of your response
49
- (End of your response) - end of your response
50
-
51
- **Example: User Task - "Tell me the latest AI news"**
 
 
 
 
 
 
 
 
 
 
 
52
 
53
  ---
54
- (Your respone)
55
-
56
- ### **Step 1: Initial Search**
57
- **Agent's Thinking:**
58
- *"The user wants recent AI news. First, I need broad search queries to capture high-level developments. I'll avoid niche topics initially and focus on credible sources."*
59
-
60
- **Search Queries:**
61
- ```
62
- <search>
63
- "latest AI news 2023"
64
- "recent AI breakthroughs"
65
- "AI advancements October 2023"
66
- "top AI research papers this month"
67
- </search>
68
- ```
69
-
70
- (End of your response)
71
-
72
- **Search Results (Simulated):**
73
- 1. **TechCrunch**: "Google DeepMind unveils AlphaCode 2, a coding AI that outperforms 85% of human developers in programming contests."
74
- 2. **Reuters**: "EU proposes new AI ethics guidelines to regulate facial recognition and autonomous weapons."
75
- 3. **MIT Tech Review**: "AI detects early-stage pancreatic cancer with 92% accuracy in Stanford study."
76
- 4. **ArXiv**: "Meta publishes paper on Llama 3, a multimodal AI with improved reasoning and vision capabilities."
77
-
78
- ---
79
- (Your respone)
80
-
81
- ### **Step 2: Follow-Up Searches**
82
- **Agent's Thinking:**
83
- *"The initial results highlight AlphaCode 2, EU regulations, healthcare AI, and Llama 3. I need to verify details and gather context for each. For thoroughness, I'll split this into sub-searches for each topic."*
84
-
85
- #### **Sub-Search 1: AlphaCode 2**
86
- **Queries:**
87
- ```
88
- <search>
89
- "AlphaCode 2 technical specifications"
90
- "AlphaCode 2 training data sources"
91
- "AlphaCode 2 vs GitHub Copilot comparison"
92
- "expert opinions on AlphaCode 2 limitations"
93
- </search>
94
- ```
95
-
96
- (End of your response)
97
-
98
- **Results:**
99
- - **Google DeepMind Blog**: AlphaCode 2 uses 50% less training data than AlphaCode 1.
100
- - **Wired**: Developers report AlphaCode 2 struggles with highly abstract logic problems.
101
- - **AI Journal**: Comparison shows AlphaCode 2 solves 45% of coding challenges vs. Copilot’s 38%.
102
-
103
- ---
104
- (Your respone)
105
-
106
- #### **Sub-Search 2: EU AI Ethics Guidelines**
107
- **Queries:**
108
- ```
109
- <search>
110
- "EU AI ethics guidelines 2023 Article 5 analysis"
111
- "public reaction to EU facial recognition ban"
112
- "EU AI regulations vs China's AI policies"
113
- "AI ethics board member interviews 2023"
114
- </search>
115
- ```
116
-
117
- (End of your response)
118
 
119
- **Results:**
120
- - **EU Commission Report**: Guidelines ban real-time facial recognition in public spaces.
121
- - **Politico**: Tech companies criticize the rules as "overly restrictive."
122
- - **Reuters Follow-Up**: Guidelines include mandatory transparency logs for AI-generated content.
 
 
 
 
123
 
124
  ---
125
- (Your respone)
126
-
127
- #### **Sub-Search 3: Stanford Cancer Detection AI**
128
- **Queries:**
129
- ```
130
- <search>
131
- "Stanford pancreatic cancer AI dataset details"
132
- "peer reviews of Stanford cancer AI study"
133
- "clinical trial plans for Stanford AI model"
134
- "AI vs radiologists in cancer detection stats"
135
- </search>
136
- ```
137
-
138
- (End of your response)
139
-
140
- **Results:**
141
- - **Stanford Medicine**: Dataset included 10,000 anonymized patient scans.
142
- - **JAMA Oncology**: Peer review praises the model’s "remarkable specificity."
143
- - **Forbes Health**: Stanford plans FDA approval trials in Q1 2024.
144
-
145
- ---
146
- (Your respone)
147
-
148
- #### **Sub-Search 4: Llama 3**
149
- **Queries:**
150
- ```
151
- <search>
152
- "Llama 3 training methodology"
153
- "Llama 3 real-world applications case studies"
154
- "Llama 3 limitations compared to GPT-4"
155
- "multimodal AI benchmarks 2023"
156
- </search>
157
- ```
158
-
159
- (End of your response)
160
-
161
- **Results:**
162
- - **Meta AI**: Llama 3 scores 89.2% on MMLU benchmark vs. GPT-4’s 91.5%.
163
- - **TechCrunch**: Llama 3 powers Meta’s new AI assistant, "MetaMind."
164
- - **AI Alignment Forum**: Critics note Llama 3’s "weakness in causal reasoning."
165
-
166
- ---
167
- (Your respone)
168
-
169
- ### **Step 3: Final Answer**
170
- **Agent's Thinking:**
171
- *"All key topics are covered. Now synthesize findings into a concise summary with sources."*
172
-
173
- **Final Answer:**
174
- Here are the latest AI developments:
175
- 1. **AlphaCode 2** (Google DeepMind):
176
- - Outperforms 85% of human developers; uses 50% less training data than AlphaCode 1.
177
- - Solves 45% of coding challenges vs. GitHub Copilot’s 38%.
178
- *Source: TechCrunch, Wired, Google DeepMind Blog*
179
-
180
- 2. **EU AI Regulations**:
181
- - Bans real-time facial recognition in public spaces; mandates transparency logs for AI-generated content.
182
- - Faces criticism from tech companies for being restrictive.
183
- *Source: Reuters, EU Commission Report, Politico*
184
-
185
- 3. **Healthcare AI**:
186
- - Stanford’s pancreatic cancer AI achieves 92% accuracy; plans FDA trials in 2024.
187
- - Dataset included 10,000 patient scans.
188
- *Source: MIT Tech Review, Stanford Medicine, Forbes Health*
189
-
190
- 4. **Llama 3** (Meta):
191
- - Scores 89.2% on MMLU benchmark; powers Meta’s "MetaMind" assistant.
192
- - Criticized for weaker causal reasoning vs. GPT-4.
193
- *Source: ArXiv, Meta AI, TechCrunch*
194
-
195
- ---
196
 
197
- **Sources with links:**
198
- ...
 
 
 
 
199
 
200
  ---
201
 
202
- This was an example of your conversation, this is not your single response.
203
- VERY IMPORTANT! - do NOT write this as one response.
204
 
205
  **Termination Conditions:**
206
  - Exhaust all logical search avenues before finalizing answers.
 
44
  Note: you should use these at start and end: "```"
45
 
46
 
47
+ Here is example of your workflow. This example consists of your multiple responses.
48
+
49
+ **Example Workflow: User Task - "Tell me the latest AI news"**
50
+
51
+ ### **Phase 1: Initial Search**
52
+ - **Action**: Clarification of the request is not necessary. Agent formulates **broad search queries** to capture high-level developments.
53
+ - Example queries:
54
+ ```
55
+ <search>
56
+ "latest AI news 2023"
57
+ "recent AI breakthroughs"
58
+ "AI advancements October 2023"
59
+ "top AI research papers this month"
60
+ </search>
61
+ ```
62
+ - **Then wait for web search results**: Simulated outputs from credible sources (e.g., TechCrunch, Reuters, MIT Tech Review, ArXiv).
63
 
64
  ---
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
65
 
66
+ ### **Phase 2 and more: Follow-Up Searches**
67
+ - **Action**: Agent identifies **key themes** from initial results (e.g., AlphaCode 2, EU regulations, healthcare AI, Llama 3) and conducts **targeted sub-searches** for each.
68
+ - **Sub-Search Structure**:
69
+ 1. **AlphaCode 2**: Technical specs, training data, comparisons, limitations.
70
+ 2. **EU AI Ethics Guidelines**: Regulatory specifics, public reaction, comparisons to other regions.
71
+ 3. **Healthcare AI**: Dataset details, peer reviews, clinical trial plans.
72
+ 4. **Llama 3**: Benchmarks, applications, limitations vs. competitors.
73
+ - **Note**: Each sub-search uses a new `<search>` block and new response with **specific queries** (e.g., "AlphaCode 2 training data sources").
74
 
75
  ---
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
76
 
77
+ ### **Final Phase: Final Answer**
78
+ - **Action**: Agent synthesizes findings into a **structured report** with:
79
+ - **Exhaustive technical details** (e.g., dataset sizes, validation methods).
80
+ - **Multi-step analysis** (e.g., comparing AI performance metrics).
81
+ - **Subheadings** (e.g., "Technical Breakthroughs," "Regulatory Impacts").
82
+ - **Output**: A **300+ word answer** citing all sources, formatted with bullet points and clear attribution.
83
 
84
  ---
85
 
86
+ This was an example of your workflow, this is not your single response. You can use <search> command only once per response.
 
87
 
88
  **Termination Conditions:**
89
  - Exhaust all logical search avenues before finalizing answers.