Fine-tuning Large Language Models with Sequential Instructions Paper • 2403.07794 • Published Mar 12, 2024
BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models Paper • 2502.07346 • Published Feb 11, 2025
CHARM: Calibrating Reward Models With Chatbot Arena Scores Paper • 2504.10045 • Published Apr 14, 2025
LNE-Blocking: An Efficient Framework for Contamination Mitigation Evaluation on Large Language Models Paper • 2509.15218 • Published Sep 18, 2025
Chain-of-Symbol Prompting Elicits Planning in Large Language Models Paper • 2305.10276 • Published May 17, 2023
QueST: Incentivizing LLMs to Generate Difficult Problems Paper • 2510.17715 • Published Oct 20, 2025