arXiv:2507.22457

What is an "Abstract Reasoner"? Revisiting Experiments and Arguments about Large Language Models

Published on Jul 30, 2025

Abstract

LLMs perform poorly on these tasks zero-shot, but tuning even a small subset of input-encoding parameters can yield near-perfect performance; because this tuning does not transfer across datasets, the authors call for a reevaluation of what constitutes abstract reasoning.

AI-generated summary

Recent work has argued that large language models (LLMs) are not "abstract reasoners", citing their poor zero-shot performance on a variety of challenging tasks as evidence. We revisit these experiments in order to add nuance to the claim. First, we show that while LLMs indeed perform poorly in a zero-shot setting, even tuning a small subset of parameters for input encoding can enable near-perfect performance. However, we also show that this finetuning does not necessarily transfer across datasets. We take this collection of empirical results as an invitation to (re-)open the discussion of what it means to be an "abstract reasoner", and why it matters whether LLMs fit the bill.
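
The central empirical lever here is parameter-efficient tuning of the input encoding. As a rough illustration only (not the paper's exact recipe), the sketch below freezes a pretrained causal LM and leaves just the input embedding table trainable; the model name `gpt2`, the learning rate, and the use of Hugging Face `transformers` are placeholder assumptions, not details taken from the paper.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; the paper evaluates LLMs, not this specific checkpoint.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Freeze every parameter in the network...
for param in model.parameters():
    param.requires_grad = False

# ...then unfreeze only the input embedding table, i.e. the
# "small subset of parameters for input encoding".
# Note: GPT-2 ties input and output embeddings, so with this model
# updating the embedding table also changes the LM head.
for param in model.get_input_embeddings().parameters():
    param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")

# Optimize only the unfrozen parameters (learning rate is an assumption).
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```

Even for a small model like GPT-2, the embedding table is only a few percent of the total parameter count, which is what makes the near-perfect tuned performance reported in the abstract a claim about a small, input-side subset of the model rather than full finetuning.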
