Dyve: Thinking Fast and Slow for Dynamic Process Verification
Abstract
We present Dyve, a dynamic process verifier that enhances reasoning error detection in large language models by integrating fast and slow thinking, inspired by Kahneman's Systems Theory. Dyve adaptively applies immediate token-level confirmation (System 1) for straightforward steps and comprehensive analysis (System 2) for complex ones. Leveraging a novel step-wise consensus-filtered process supervision technique that combines Monte Carlo estimation with LLM-based evaluation, Dyve curates high-quality supervision signals from noisy data. Experimental results on ProcessBench and the MATH dataset confirm that Dyve significantly outperforms existing process-based verifiers and boosts performance in Best-of-N settings.
Community
Large language models (LLMs) excel at complex reasoning but struggle with reliable step-by-step verification. Current methods face a trade-off: rapid "System 1" binary checks lack depth, while thorough "System 2" analyses are slow and computationally costly. To bridge this gap, we present Dyve, a dynamic verifier that adaptively combines fast token-level confirmation (System 1) for simple steps and deep analysis (System 2) for complex ones, inspired by Kahneman’s dual-process theory.
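Below is a minimal sketch of what this adaptive routing could look like in practice. It is illustrative only: the `generate` placeholder, the prompts, and the "+"/"-" convention are assumptions standing in for the actual verifier interface and prompting used in the paper.

```python
# Hypothetical sketch of Dyve-style adaptive step verification.
# `generate(prompt, max_tokens)` is a placeholder for any LLM completion call;
# prompts and the escalation rule below are illustrative, not the paper's exact ones.

def generate(prompt: str, max_tokens: int = 512) -> str:
    """Placeholder for a call to the verifier LLM."""
    raise NotImplementedError("wire up your own LLM client here")

def verify_step(problem: str, prior_steps: list[str], step: str) -> bool:
    """Return True if the candidate step is judged correct."""
    context = "\n".join(prior_steps)
    # System 1: a single-token check for routine steps.
    fast = generate(
        f"Problem: {problem}\nSteps so far:\n{context}\n"
        f"Next step: {step}\nIs this step correct? Answer + or -: ",
        max_tokens=1,
    ).strip()
    if fast == "+":  # verifier is confident the step is fine
        return True
    # System 2: escalate to a full chain-of-thought analysis of the same step.
    slow = generate(
        f"Problem: {problem}\nSteps so far:\n{context}\n"
        f"Carefully analyse the next step:\n{step}\n"
        "Explain your reasoning, then end with 'VERDICT: correct' or 'VERDICT: incorrect'.",
        max_tokens=1024,
    )
    return "VERDICT: correct" in slow
```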
Dyve leverages a novel step-wise consensus-filtered supervision technique to train high-quality verifiers from noisy data. By generating diverse reasoning traces via Monte Carlo rollouts, filtering them with an LLM-as-a-Judge, and performing granular step-level error detection, Dyve distills 117K high-quality examples from 1.2M noisy samples. This approach ensures robust supervision while maintaining efficiency.
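The consensus-filtering idea can be summarized as: label each step by Monte Carlo rollouts, label it again with an LLM judge, and keep the example only when the two agree. The sketch below assumes hypothetical callbacks (`rollout_fn`, `check_fn`, `judge_fn`) and an illustrative agreement threshold; it is not the paper's exact pipeline.

```python
def monte_carlo_step_label(problem, prefix_steps, step, rollout_fn, check_fn, k=8):
    """Estimate step quality: from this prefix plus the candidate step,
    complete the solution k times and count how often the final answer is right."""
    hits = 0
    for _ in range(k):
        completion = rollout_fn(problem, prefix_steps + [step])  # sample a continuation
        hits += int(check_fn(problem, completion))               # compare with the gold answer
    return hits / k  # fraction of rollouts that reach the correct answer

def consensus_filter(candidates, judge_fn, threshold=0.5):
    """Keep only examples where the noisy Monte Carlo label and an
    independent LLM-as-a-Judge verdict agree."""
    kept = []
    for ex in candidates:
        mc_label = ex["mc_score"] >= threshold  # label from Monte Carlo estimation
        judge_label = judge_fn(ex)              # label from the LLM judge
        if mc_label == judge_label:
            kept.append({**ex, "label": mc_label})
    return kept
```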
Evaluated on ProcessBench and MATH, Dyve outperforms existing verifiers, achieving state-of-the-art F1 scores (e.g., 68.5 on GSM8K and 58.3 on MATH) and generalizing effectively to Olympiad-level problems. When integrated with proposer LLMs in Best-of-N settings, Dyve boosts accuracy to 95.5% (N=8), demonstrating strong synergy between dynamic verification and reasoning generation. Dyve also balances speed and precision, offering significantly faster inference than pure System 2 models.
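For context, Best-of-N selection with a verifier reduces to scoring each sampled solution and keeping the best one. This is a generic sketch under assumed callbacks (`propose_fn` for the proposer LLM, `score_fn` for the verifier, e.g., the fraction of steps Dyve accepts), not the paper's exact implementation.

```python
def best_of_n(problem, propose_fn, score_fn, n=8):
    """Sample n candidate solutions from a proposer LLM and return the one
    the verifier scores highest."""
    candidates = [propose_fn(problem) for _ in range(n)]
    return max(candidates, key=lambda solution: score_fn(problem, solution))
```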
Our work advances reliable AI reasoning by ensuring systematic, step-wise validation. Code, data, and models are open-sourced to support further research in trustworthy LLM development.
paper: https://arxiv.org/pdf/2502.11157
code: https://github.com/staymylove/Dyve
model: Jianyuan1/deepseek-r1-14b-cot-math-reasoning-full
data: Jianyuan1/cot-data
Related papers recommended by the Semantic Scholar API:
- LLM2: Let Large Language Models Harness System 2 Reasoning (2024)
- The Lessons of Developing Process Reward Models in Mathematical Reasoning (2025)
- DeepRAG: Thinking to Retrieval Step by Step for Large Language Models (2025)
- Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback (2025)
- Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models (2025)
- Zero-Shot Verification-guided Chain of Thoughts (2025)
- Virgo: A Preliminary Exploration on Reproducing o1-like MLLM (2025)