Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training Paper • 2509.03403 • Published Sep 3, 2025
RAG-RL: Advancing Retrieval-Augmented Generation via RL and Curriculum Learning Paper • 2503.12759 • Published Mar 17, 2025
Self-rewarding correction for mathematical reasoning Paper • 2502.19613 • Published Feb 26, 2025