Update README.md
We introduce Goedel-Prover-V2, an open-source language model series that sets a new state-of-the-art in automated formal proof generation. Built on the standard expert iteration and reinforcement learning pipeline, our approach incorporates three key innovations: (1) <strong>Scaffolded data synthesis</strong>: We generate synthetic proof tasks of increasing difficulty to progressively train the model, enabling it to master increasingly complex theorems; (2) <strong>Verifier-guided self-correction</strong>: The model learns to iteratively revise its own proofs by leveraging feedback from Lean’s compiler, closely mimicking how humans refine their work; (3) <strong>Model averaging</strong>: We combine multiple model checkpoints to improve robustness and overall performance.
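The model-averaging idea above can be illustrated with a minimal sketch: each checkpoint is treated as a dictionary mapping parameter names to weight vectors (plain Python lists here for clarity; in practice these would be framework tensors), and the merged model is their element-wise mean. The function name and data layout are illustrative, not the released training code.

```python
def average_checkpoints(checkpoints):
    """Element-wise uniform average of several checkpoints that share
    the same parameter names and shapes (illustrative sketch)."""
    n = len(checkpoints)
    return {
        name: [sum(vals) / n for vals in zip(*(ck[name] for ck in checkpoints))]
        for name in checkpoints[0]
    }

# Two toy "checkpoints", each with a single weight vector.
ck_a = {"w": [1.0, 2.0]}
ck_b = {"w": [3.0, 4.0]}
merged = average_checkpoints([ck_a, ck_b])  # {"w": [2.0, 3.0]}
```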
         
         
     | 
| 30 | 
         | 
| 
         | 
|
| 24 | 
         | 
| 25 | 
         
             
            We introduce Goedel-Prover-V2, an open-source language model series that sets a new state-of-the-art in automated formal proof generation. Built on the standard expert iteration and reinforcement learning pipeline, our approach incorporates three key innovations: (1) <strong>Scaffolded data synthesis</strong>: We generate synthetic proof tasks of increasing difficulty to progressively train the model, enabling it to master increasingly complex theorems; (2) <strong>Verifier-guided self-correction</strong>: The model learns to iteratively revise its own proofs by leveraging feedback from Lean’s compiler, closely mimicking how humans refine their work; (3) <strong>Model averaging</strong>: We combine multiple model checkpoints to improve robustness and overall performance.
         
     | 
| 26 | 
         | 
| 27 | 
         
            +
Our small model, Goedel-Prover-V2-8B, reaches 84.6% on the MiniF2F test set at Pass@32, matching the performance of the prior state-of-the-art DeepSeek-Prover-V2-671B while being nearly 100 times smaller. Our flagship model, Goedel-Prover-V2-32B, achieves 88.0% on MiniF2F at Pass@32 in standard mode and 90.4% in self-correction mode, outperforming the prior SOTA DeepSeek-Prover-V2-671B and the concurrent Kimina-Prover-72B by a large margin. Additionally, our flagship model with self-correction solves 64 problems on PutnamBench at Pass@64, securing 1st place on the leaderboard and surpassing DeepSeek-Prover-V2-671B's record of 47 problems at Pass@1024.
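Pass@k figures like those above are commonly computed with the standard unbiased estimator: given n sampled proof attempts of which c are accepted by the verifier, estimate the probability that at least one of k randomly chosen attempts succeeds. A minimal sketch of that formula follows (the standard estimator, not necessarily the exact evaluation script used for these numbers):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k estimate from n attempts with c verifier-accepted
    proofs: 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct attempt
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 64 attempts and at least 1 verified proof, Pass@64 is certain:
print(pass_at_k(64, 1, 64))  # 1.0
```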
         
             
## 2. Benchmark Performance
         