rl-rag/qwen3-8B-sft-mix-v20250921-plus-v20251001-onpolicy-rs-longform_0921 • Text Generation • 8B • Updated 3 days ago • 144
rl-rag/qwen3-8b-base-combined-sft-training-data-v20250824_MiroSystemPrompt • Text Generation • 8B • Updated Sep 2 • 4
rl-rag/qwen3-8b-combined-sft-training-data-v20250824_MiroSystemPrompt • Text Generation • 8B • Updated Sep 2 • 8
rl-rag/qwen3-4b-it-combined-sft-training-data-v20250824_MiroSystemPrompt • Text Generation • 4B • Updated Sep 2 • 6
rl-rag/qwen2.5-7b-combined-sft-training-data-v20250824_MiroSystemPrompt • Text Generation • 8B • Updated Sep 2 • 4