Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning
Paper • 2510.03259 • Published • 57
Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense
Paper • 2510.07242 • Published • 30
First Try Matters: Revisiting the Role of Reflection in Reasoning Models
Paper • 2510.08308 • Published • 24
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
Paper • 2510.03222 • Published • 75
Latent Refinement Decoding: Enhancing Diffusion-Based Language Models by Refining Belief States
Paper • 2510.11052 • Published • 51
RLFR: Extending Reinforcement Learning for LLMs with Flow Environment
Paper • 2510.10201 • Published • 35
Making Mathematical Reasoning Adaptive
Paper • 2510.04617 • Published • 22
Demystifying Reinforcement Learning in Agentic Reasoning
Paper • 2510.11701 • Published • 31
Are Large Reasoning Models Interruptible?
Paper • 2510.11713 • Published • 4
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
Paper • 2510.11696 • Published • 176
Deep Self-Evolving Reasoning
Paper • 2510.17498 • Published • 11
Continuous Autoregressive Language Models
Paper • 2510.27688 • Published • 70
Higher-order Linear Attention
Paper • 2510.27258 • Published • 14
Limits of Generalization in RLVR: Two Case Studies in Mathematical Reasoning
Paper • 2510.27044 • Published • 5
Why Language Models Hallucinate
Paper • 2509.04664 • Published • 194
Reverse-Engineered Reasoning for Open-Ended Generation
Paper • 2509.06160 • Published • 149
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
Paper • 2509.22186 • Published • 139
FlowRL: Matching Reward Distributions for LLM Reasoning
Paper • 2509.15207 • Published • 114
Towards a Unified View of Large Language Model Post-Training
Paper • 2509.04419 • Published • 75
Variational Reasoning for Language Models
Paper • 2509.22637 • Published • 69
Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models
Paper • 2509.06949 • Published • 55
Beyond the Exploration-Exploitation Trade-off: A Hidden State Approach for LLM Reasoning in RLVR
Paper • 2509.23808 • Published • 47
Sequential Diffusion Language Models
Paper • 2509.24007 • Published • 45
Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models
Paper • 2511.23319 • Published • 22