Hugging Face
Models (1,271), sorted by Trending
Active filter: arxiv:2402.03300
byteXWJ/Qwen2.5-1.5B-Open-R1-GRPO • Text Generation • Updated Feb 10 • 10
lukaspetersson/textworld-agent-0.5B-3-eureka • Updated Feb 9
ununtrium/Qwen2.5-1.5B-Instruct-Open-R1-GRPO-gsm8k2 • Text Generation • Updated Feb 9 • 11
saswatach/Qwen2-1.5B-GRPO_1024 • Updated Feb 10
hwang595/Qwen2.5-1.5B-Open-R1-GRPO • Text Generation • Updated Feb 10 • 10
skzxjus/Qwen-2.5-7B-Simple-RL • Text Generation • Updated Feb 11 • 12
bruel/Qwen2.5-1.5B-Open-R1-GRPO • Text Generation • Updated Feb 12 • 22
bobzy/Qwen2-0.5B-GRPO-test • Updated Feb 10
Kira-wang/qwen-2.5-3b-r1-math-7k-v2-boxed-base • Text Generation • Updated Feb 10 • 8
JeffP111/Qwen-2.5-3B-Simple-RL • Text Generation • Updated Feb 11 • 13
nzy123/Qwen2.5-1.5B-Open-R1-GRPO • Text Generation • Updated Feb 20 • 7
dddfffvfff/Qwen2.5-1.5B-Open-R1-GRPO • Text Generation • Updated Feb 14 • 6
AlistairPullen/llama-3.2-3b-grpo • Text Generation • Updated Feb 11 • 6
Thomas-Chou/Qwen2.5-1.5B-Open-R1-GRPO • Text Generation • Updated Feb 10 • 6
yanivnaor/qwen-2.5-3b-r1-countdown • Text Generation • Updated Feb 12 • 7
princepride/Qwen-2.5-7B-Simple-RL-Test • Text Generation • Updated Feb 12 • 41
Ousso1117/GRPO-meta-Llama-3.2-1B-meta-Llama-3.2-1B-mrd3-sum • Updated Feb 19
Ousso1117/GRPO-meta-Llama-3.2-3B-meta-Llama-3.2-3B-mrd3-sum • Updated Feb 19
Ousso1117/GRPO-SFT-meta-Llama-3.2-1B-meta-Llama-3.2-1B-mrd3-sum • Updated Feb 19
Ousso1117/GRPO-meta-Llama-2-7B-meta-Llama-2-7B-mrd3-sum • Updated Feb 12
Ousso1117/GRPO-meta-Llama-3.1-8B-meta-Llama-3.1-8B-mrd3-sum • Updated Feb 19
Ousso1117/GRPO-SFT-meta-Llama-2-7B-meta-Llama-2-7B-mrd3-sum • Updated Feb 19
Ousso1117/GRPO-SFT-meta-Llama-3.1-8B-meta-Llama-3.1-8B-mrd3-sum • Updated Feb 19
Dongwei/Qwen-2.5-7B_Base_Math_smalllr_longer • Text Generation • Updated Feb 11 • 12
Ousso1117/GRPO-SFT-meta-Llama-3.2-3B-meta-Llama-3.2-3B-mrd3-sum • Updated Feb 19
yolay/Qwen2.5-1.5B-Open-R1-GRPO • Text Generation • Updated Feb 12 • 19
jlchen-c/Qwen-2.5-7B-Simple-RL • Text Generation • Updated 12 days ago • 19
GISwarm/Qwen2-0.5B-GRPO-test • Updated Feb 11
ibndias/Qwen2.5-1.5B-Open-R1-GRPO • Text Generation • Updated Feb 12 • 8
allendou/Qwen2.5-0.5B-Open-R1-GRPO • Text Generation • Updated Feb 21 • 12