Model Card: Assisting Mathematical Formalization with A Learning-based Premise Retriever

Model Description

This model is designed for premise retrieval in Lean, based on the state representation of Lean. The model follows the architecture described in the paper:

Assisting Mathematical Formalization with A Learning-based Premise Retriever

The model implementation and code are available at:

GitHub Repository

Try our model

Available Models

The repository provides several models for different tasks:

Pre-trained Models

Located in the Pretrain_Model folder:

410_stable_random: Pre-trained model for retrieval fine-tuning.
410_stable_random_1024: Pre-trained model for reranking fine-tuning.

Fine-tuned Models

Retrieval Fine-tuned Models

Located in the Finetune_Model folder:

Models fine-tuned on specific datasets for retrieval tasks.

Rerank Fine-tuned Models

Located in the Rerank folder:

Models fine-tuned on specific datasets for reranking tasks.

Citation

If you use this model, please cite the following paper:

@misc{tao2025assistingmathematicalformalizationlearningbased,
      title={Assisting Mathematical Formalization with A Learning-based Premise Retriever}, 
      author={Yicheng Tao and Haotian Liu and Shanwen Wang and Hongteng Xu},
      year={2025},
      eprint={2501.13959},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.13959}, 
}