SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors Paper • 2502.11167 • Published Feb 16 • 10
Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving Paper • 2502.07640 • Published Feb 11 • 8
Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation Paper • 2411.00412 • Published Nov 1, 2024 • 10
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks Paper • 2410.10563 • Published Oct 14, 2024 • 38