Chain-of-Thought Reasoning is a Policy Improvement Operator Paper • 2309.08589 • Published Sep 15, 2023 • 1
Q-Probe: A Lightweight Approach to Reward Maximization for Language Models Paper • 2402.14688 • Published Feb 22, 2024
NATURAL PLAN: Benchmarking LLMs on Natural Language Planning Paper • 2406.04520 • Published Jun 6, 2024 • 14
LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet Paper • 2408.15221 • Published Aug 27, 2024
A Careful Examination of Large Language Model Performance on Grade School Arithmetic Paper • 2405.00332 • Published May 1, 2024 • 32