arxiv:2502.18858

Evaluating Intelligence via Trial and Error

Published on Feb 26
· Submitted by jingtao on Mar 12
Abstract

Intelligence is a crucial trait for species to find solutions within a limited number of trial-and-error attempts. Building on this idea, we introduce the Survival Game as a framework to evaluate intelligence based on the number of failed attempts in a trial-and-error process: fewer failures indicate higher intelligence. When the expectation and variance of the failure count are both finite, it signals the ability to consistently find solutions to new challenges, which we define as the Autonomous Level of intelligence. Using the Survival Game, we comprehensively evaluate existing AI systems. Our results show that while AI systems achieve the Autonomous Level in simple tasks, they are still far from it in more complex tasks such as vision, search, recommendation, and language. While scaling current AI technologies might help, it would come at an astronomical cost: projections suggest that achieving the Autonomous Level for general tasks would require 10^{26} parameters. To put this into perspective, loading such a massive model would require so many H100 GPUs that their total value would be 10^{7} times Apple Inc.'s market value. Even with Moore's Law, supporting such a parameter scale would take 70 years. This staggering cost highlights the complexity of human tasks and the inadequacy of current AI technologies. To investigate this phenomenon further, we conduct a theoretical analysis of the Survival Game and its experimental results. Our findings suggest that human tasks possess a criticality property; as a result, the Autonomous Level requires a deep understanding of a task's underlying mechanisms. Current AI systems, however, do not fully grasp these mechanisms and instead rely on superficial mimicry, making it difficult for them to reach the Autonomous Level. We believe the Survival Game can not only guide the future development of AI but also offer profound insights into human intelligence.

Community

Paper author Paper submitter

A new evaluation framework, Survival Game, measures machine intelligence by counting the errors an AI makes while finding solutions through trial and error. Fewer errors -> more intelligence.

Three Levels of Intelligence, based on the convergence of the error count:
⚡ Limited – the expected error count diverges
⚡ Capable – the expectation converges, but the variance diverges
⚡ Autonomous – both expectation and variance converge.

The Autonomous Level means the AI consistently finds the answer within a finite number of errors – it can act autonomously, with no human oversight needed.
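The three levels can be sketched as a classification on the tail of the failure-count distribution. A toy illustration (our own, not the paper's exact protocol), assuming failure counts K follow a power-law tail P(K = k) ~ k^(-alpha), so E[K] is finite iff alpha > 2 and Var[K] is finite iff alpha > 3:

```python
# Toy sketch: map the tail exponent of a power-law failure-count
# distribution P(K = k) ~ k^(-alpha) to the paper's three levels.
# E[K] = sum k^(1 - alpha) converges iff alpha > 2;
# E[K^2] = sum k^(2 - alpha) converges iff alpha > 3.

def classify_level(alpha: float) -> str:
    """Classify an intelligence level from the tail exponent alpha."""
    if alpha <= 2:          # expectation diverges
        return "Limited"
    if alpha <= 3:          # expectation finite, variance diverges
        return "Capable"
    return "Autonomous"     # both moments finite

for a in (1.5, 2.5, 3.5):
    print(a, classify_level(a))
```

The `alpha` thresholds here follow directly from when the first and second moments of a power-law distribution converge; the exponent itself would have to be estimated from observed error counts.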

Results:

  • Most existing AI systems are at the Limited Level.
  • Larger models perform better, following a log-linear trend.
  • Achieving AGI would require an unimaginable 10^26 parameters. Even with Moore's Law, supporting such a parameter scale would take 70 years. 🤯
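The 70-year figure can be sanity-checked with back-of-the-envelope arithmetic. The baseline scale and doubling period below are our assumptions, not necessarily the paper's:

```python
import math

# Back-of-the-envelope check of the Moore's Law projection.
# Assumptions (ours): capacity doubles every 2 years, and current
# systems support on the order of 1e15 parameters.
target = 1e26    # projected parameter count for the Autonomous Level
baseline = 1e15  # assumed current supported scale
doubling_period_years = 2

doublings = math.log2(target / baseline)
years = doublings * doubling_period_years
print(f"{years:.0f} years")  # -> 73 years, in line with the ~70-year figure
```

Under these assumptions the answer lands near the paper's estimate; a different baseline or doubling period shifts it proportionally.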

Underlying Reason – self-organized criticality (SOC) theory:

  • Complexity of human tasks: human tasks exhibit a "criticality" property.
  • Inadequacy of today's AI: current systems do not understand the tasks' real underlying mechanisms; they rely on superficial mimicry.
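The consequence of criticality can be shown numerically. With an illustrative power-law failure distribution P(K = k) proportional to k^(-2.5) (our example exponent, corresponding to the Capable Level), the partial sums defining E[K] settle down while those defining E[K^2] keep growing, so the variance never converges:

```python
# Sketch: why a heavy (critical) tail blocks the Autonomous Level.
# For P(K = k) ~ k^(-2.5): E[K] partial sums = sum k^(-1.5) converge
# (toward zeta(1.5) ~ 2.61), while E[K^2] partial sums = sum k^(-0.5)
# grow without bound, roughly like 2 * sqrt(n).
alpha = 2.5

def partial_moment(power: int, n_terms: int) -> float:
    """Unnormalized partial sum of sum_k k^power * k^(-alpha)."""
    return sum(k ** power * k ** (-alpha) for k in range(1, n_terms + 1))

for n in (10**3, 10**5):
    print(n, partial_moment(1, n), partial_moment(2, n))
```

Extending the sum by two orders of magnitude barely moves the first moment but keeps inflating the second, which is the divergent-variance signature of the Capable Level.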


That is an incredible definition of intelligence.

No more subjectivity, we now have an objective definition of intelligence.

This is seriously impressive

