Trained with Unsloth on GSM8K for just 250 steps (due to resource constraints) to add reasoning abilities to Qwen2.5-3B (a smaller model, chosen for the same reason).
Sarthak Thakur
sarthak247
Recent Activity
New activity, 3 days ago: sarthak247/Wan2.1-T2V-1.3B-nf4: Question
Upvoted a paper, 3 days ago: R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
Collections: 1
Models: 5

sarthak247/Wan2.1-T2V-1.3B-nf4 • Text-to-Video • Updated • 240 • 3
sarthak247/qwen2.5-grpo-gsm8k-250steps-gguf • Updated • 99
sarthak247/qwen2.5-grpo-gsm8k-250steps-lora-adapters • Updated
sarthak247/qwen2.5-grpo-gsm8k-250steps-fp16 • Text Generation • Updated • 13
sarthak247/codellama-7b-humaneval-java-fim • Updated • 9