Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 4
PeterJinGo/SearchR1-nq_hotpotqa_train-llama3.2-3b-em-ppo
Updated
PeterJinGo/SearchR1-nq_hotpotqa_train-llama3.2-3b-em-grpo
Updated • 2
PeterJinGo/SearchR1-nq_hotpotqa_train-llama3.2-3b-it-em-ppo
Updated • 1
Bowen
PeterJinGo
Recent Activity
upvoted a paper about 3 hours ago
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
updated a collection about 16 hours ago
Search-R1
updated a dataset about 16 hours ago
PeterJinGo/nq_hotpotqa_train
Collections 1
Models 11
PeterJinGo/SearchR1-nq_hotpotqa_train-llama3.2-3b-it-em-ppo
Updated • 1
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-3b-it-em-grpo
Updated • 1
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-7b-em-ppo
Updated • 4
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-3b-em-ppo
Updated • 1
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-7b-it-em-ppo
Updated • 2
PeterJinGo/SearchR1-nq_hotpotqa_train-llama3.2-3b-it-em-grpo
Updated • 1
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-3b-it-em-ppo
Updated • 1
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-3b-em-grpo
Updated
PeterJinGo/SearchR1-nq_hotpotqa_train-llama3.2-3b-em-ppo
Updated
PeterJinGo/SearchR1-nq_hotpotqa_train-llama3.2-3b-em-grpo
Updated • 2
Datasets 11
PeterJinGo/nq_hotpotqa_train
Viewer • Updated • 221k • 6
PeterJinGo/wiki-18-e5-index
Updated • 724
PeterJinGo/wiki-18-corpus
Updated • 350
PeterJinGo/ultrafeedback_first_5000
Viewer • Updated • 5k • 13
PeterJinGo/gsm8k-chat
Viewer • Updated • 7.47k • 57
PeterJinGo/math-zeroshot-chat
Viewer • Updated • 7.5k • 56
PeterJinGo/math-zeroshot
Viewer • Updated • 7.5k • 55
PeterJinGo/math2
Viewer • Updated • 7.5k • 50
PeterJinGo/math
Viewer • Updated • 7.5k • 50
PeterJinGo/gsm8k
Viewer • Updated • 7.47k • 63