Models: 743

Active filters: 2402.03300

yooneo/qwen-1.5b-r1-aha

spinech/qwen2.5-3b-r1-rearc-stage1

Text Generation • Updated Feb 1 • 46

hyunw3/qwen-2.5-0.5b-r1-countdown

Text Generation • Updated Feb 1 • 36

hyunw3/qwen-2.5-0.5b-r1-countdown_lr1.0e-6

Text Generation • Updated Feb 1 • 6

mgaimm/qwen-2.5-3b-r1-countdown

Text Generation • Updated Feb 1 • 23

tuyentx/qwen-2.5-3b-r1-countdown

Text Generation • Updated Feb 2 • 4

pablo-chocobar/qwen-2.5-3b-r1-countdown

Text Generation • Updated Feb 3 • 7

Julian-Sheeper/Qwen2.5-1.5B-Open-R1-GRPO

Text Generation • Updated Feb 2 • 6

pullpull/qwen-2.5-3b-r1-countdown

Text Generation • Updated Feb 2 • 6

justinj92/Qwen2.5-1.5B-Thinking

Text Generation • Updated Feb 4 • 129 • 4

spinech/qwen2.5-3b-r1-arc-train

Text Generation • Updated Feb 3 • 46

howardzhou/Qwen2.5-3B-Open-R1-GRPO

Text Generation • Updated Feb 5 • 20

jainamit/qwen-2.5-3b-r1-countdown

Text Generation • Updated Feb 6 • 52

GitBag/Qwen2.5-1.5B-Open-R1-GRPO

Text Generation • Updated Feb 4 • 7

Dongwei/Qwen-2.5-7B

Text Generation • Updated Feb 3 • 7

spinech/qwen2.5-3b-r1-arc-train-synthetic

Text Generation • Updated Feb 4 • 17

laolaorkk/Qwen2.5-1.5B-R1-GRPO-debug

Text Generation • Updated Feb 6 • 31

Dongwei/DeepSeek-R1-Distill-Qwen-7B-GRPO_Math

Text Generation • Updated Feb 4 • 22

Dongwei/Qwen-2.5-7B_Math

Text Generation • Updated Feb 4 • 9

Dongwei/Qwen2.5-1.5B-Open-R1-GRPO_Math

Text Generation • Updated Feb 3 • 134

Dongwei/DeepSeek-R1-Distill-Qwen-1.5B-GRPO_Math

Text Generation • Updated Feb 3 • 79

skzxjus/Qwen2.5-7B-Open-R1-GRPO

Text Generation • Updated Feb 8 • 56

AndreasX1206/Qwen2-0.5B-countdown

Text Generation • Updated Feb 4 • 9

alicogniai/Qwen2.5-1.5B-Open-R1-GRPO

Text Generation • Updated 30 days ago • 5

ununtrium/Qwen2.5-1.5B-Open-R1-GRPO

Text Generation • Updated Feb 11 • 8

yuta0x89/llmjp13b-numinacot-epoch2-GRPO

Text Generation • Updated Feb 11 • 83

yeshsurya/Qwen2.5-7B-Math-with_50stepGRPO

Text Generation • Updated Feb 12 • 6

hyunw3/qwen-2.5-0.5b-r1-countdown_lr5e-6

Text Generation • Updated Feb 5 • 235

AlistairPullen/Llama-3.1-8b-Instruct-GRPO-fine-tuned-lora

Dongwei/DeepSeek-R1-Distill-Qwen-7B-GRPO_Math_lowlr

Text Generation • Updated Feb 4 • 14