The most famous non-definition in legal history is that of US Supreme Court Justice Potter Stewart, in 1964, who wrote: “I shall not today attempt further to define the kinds of material I understand to be embraced within that shorthand description, and perhaps I could never succeed in intelligibly doing so. But I know it when I see it.” (italics mine, and also everyone-else-who-ever-quoted-it’s.)
This leaps to mind when considering the looming dread / hype / apotheosis called Artificial General Intelligence. There is a wide consensus that if/when we reach AGI, it will be a watershed (and/or catastrophic) moment for not just modern civilization but the entire history of humanity. More prosaically, hundreds of billions of dollars may ride on it; AGI is, or at least was, the contractual breakpoint at which Microsoft loses access to OpenAI’s models.
But is there a similarly wide consensus about what we’re actually talking about? Do the formal prescriptive definitions match the colloquial descriptive ones? Is everyone just excitedly talking past everyone else?
Everyone says … oh wait, these are quite different actually
Some formal definitions, and/or prominent non-definitions, of AGI:
“AI with a human level of cognitive function, including the ability to self-teach.” (Investopedia)
“A machine that possesses the ability to understand or learn any intellectual task that a human being can.” (Google)
“Software with human-like intelligence and the ability to self-teach.” (Amazon)
“A form of AI that possesses the ability to understand, learn and apply knowledge across a wide range of tasks and domains.” (Gartner)
“Highly autonomous systems that outperform humans at most economically valuable work.” (OpenAI charter)
“AI systems that can generate at least $100 billion in profits.” (OpenAI contract)
“an imprecise term that has gathered a lot of sci-fi baggage and hype. I prefer ‘powerful AI’ or ‘Expert-Level Science and Engineering’ which get at what I mean without the hype” (Anthropic, or at least Dario Amodei)
“a single unified software system that can … reliably pass a 2-hour, adversarial Turing test, assemble a 1:8 scale model of a Ferrari, and pass [a bunch of Q&A tests and interview problems] with a 90% average.” (Metaculus)
“We introduce six principles for a clear, operationalizable definition of AGI … with these principles in mind, we introduce our Levels of AGI ontology … Our ‘Competent AGI’ level is probably the best catch-all for many existing definitions of AGI … at least 50th percentile of skilled adults across a wide range of non-physical tasks, including metacognitive tasks like learning new skills.” (DeepMind)
It’s reasonable to look at this list and have these reactions, in order:
“oh, those are all pretty similar”
“wait, they’re actually quite different”
“wow, DeepMind wrote an entire academic paper to get to pretty much the same definition Investopedia-of-all-places tossed off in fifteen words.”
Riders of the Jagged Frontier
Some of the discontinuities are obvious. Gartner wants the AI to understand its knowledge. OpenAI, and only OpenAI, measures AI by economically valuable work (which has the huge advantage of being relatively tangible/measurable) while uniquely requiring it to be highly autonomous. DeepMind says 50th percentile of human ability while Amodei says Expert-Level. To us, at least, that’s a big difference!
But more concerning yet are all the carefully vague phrases: human-like, skilled adults, wide range. How human exactly? How skilled exactly? How wide exactly?
It’s well accepted that AI advances along a ‘jagged frontier,’ i.e. it gets very good at some things we associate with intelligence (e.g. LLMs writing code) with astonishing rapidity, while remaining hilariously bad at others (e.g. LLMs guessing how many ‘r’s are in the word ‘strawberry’) with remarkable persistence.
I’ve argued before that we are “prone to a Fundamental Extrapolation Error, in that we extrapolate assuming modern AI is on a path to intelligence like ours.” In other words, people look at the jagged frontier and conclude its valleys mean AI is ‘really’ dumb/fake/fraudulent, or, conversely, its peaks mean the things it’s bad at will be fixed Any Day Now. Terms like human-like or skilled adults only exacerbate this tendency.
What if our g is not their g?
It may be true that in humans most cognitive abilities are fairly well correlated, and when we talk about intelligence we are ultimately usually talking about a single-dimensional variable, often called g. But even if we stipulate for the sake of argument that this is true for human beings … it does not mean it’s true for all beings.
Suppose g was very well correlated with throwing a ball with great precision. Makes some intuitive sense, right? Hand-eye coordination is mostly in the mind! That’s a lot of complex calculus in a split second, especially for e.g. a curve ball! If that were the case, Einstein would also have been a superstar baseball pitcher, and we’d talk about how throwing precision is obviously a member of the “intelligence cluster” of human traits which also includes memory, pattern-matching, ‘advanced’ mathematics and language and music theory, etc. But we don’t. Is it foreordained as a law of the universe that we don’t?
It’s true that Moravec’s Paradox might suggest so … but that so-called paradox is itself a far more jagged frontier than we may appreciate, in an age when full-size humanoid robots can do perfect backflips.
Ultimately, if you partition human intelligence into N separate ‘peaks’ — the Jagged Range that AI researchers are attempting to scale — then on some K of those peaks, modern AI (especially LLMs) can or will soon exceed human capacity, and on N-K they can’t or will not soon. Must K equal N for us to achieve AGI? Seems unlikely. So the question “what is AGI?” if measured purely by comparison to human performance across a wide range of tasks, is simply: what fraction of N does K need to be?
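To make that bookkeeping concrete, here is a toy sketch in Python; the task categories and scores are entirely invented, and the required fraction is exactly the open parameter the question leaves dangling.

```python
# Toy illustration of the K-out-of-N framing. Every task and score below is
# invented; a "peak" counts toward K if the AI matches or beats the human baseline.
peaks = {
    "writing code":     {"ai": 0.90, "human": 0.75},
    "counting letters": {"ai": 0.40, "human": 0.99},
    "legal drafting":   {"ai": 0.70, "human": 0.80},
    "pattern matching": {"ai": 0.95, "human": 0.85},
}

N = len(peaks)
K = sum(1 for s in peaks.values() if s["ai"] >= s["human"])

print(f"K/N = {K}/{N} = {K / N:.2f}")
# The unresolved question: how large must K/N be before we call it AGI? 0.5? 0.9? 1.0?
```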
That … doesn’t sound very satisfying or useful, does it?
AGC, or, if you’re so smart, why ain’t you doing something?
This dissatisfaction is, in part, because AGI—descriptively, in common parlance, when we talk about it casually—is only partly about intelligence. It is also largely about capabilities. AGI is in large part really AGC: artificial general capability.
Intelligence is important to any given capability, but rarely either necessary or sufficient. For sufficiency, consider the many child prodigies who go nowhere; consider the need for I/O and/or physical limbs; consider a person who is 10x smarter but 1000x slower at a given cognitive task. For necessity, remember that many skills were deemed cognitive ones, requiring intelligence—such as mathematical calculations, or rapid sorting—until traditional computers existed.
A great deal of ‘AGI’ confusion stems from vagueness about whether we are talking about process or outcome. This is the appeal of the OpenAI definitions and the Amodei non-definition. They focus on capabilities and outcomes, not intelligence and process. They do not require AI’s particular set of cognitive strengths and weaknesses to closely match our own in order to be considered sufficiently ‘general.’
Never taxonomize when you can simply ride piggyback
What’s more, plenty of taxonomies of human outputs — or at least occupations — already exist. On Wikipedia; at the Bureau of Labor Statistics; on www.gov.uk and elsewhere. Now, it’s true that comparing an individual human job to an AI capability is apples to pomegranates. Individual jobs are implicitly human-shaped, and assume baseline human limitations and capabilities, whereas the territory of AI requirements, limitations, and possibilities is enormously different.
But you can, in principle, measure the ROI of both. Consider one of the most handwavy of all jobs: that of an architect. We cannot simply imagine an architect’s job shared between a human and an AI on alternate days. Humans are proactive and self-guided, go into offices and to conferences and socialize with fellow architects, work forty hours a week for a fixed salary, etc. The entire concept of an architect’s job has to be broken down, restructured, and reframed to even consider having an AI do even part of it.
…But we don’t actually have to care about that to define AGI. We merely have to measure the inputs and outputs in strict economic terms. If you pay a human architect $100,000 a year, and you also construct an AI-architect system and pour $100,000 of compute into it over a year, how comparable are the outputs?
That is, again at least in principle, an answerable question. How do you quantify an architect’s output? Conveniently, economists try to quantify such things all the time; it is a cornerstone of the profession. If we can calculate the return—call it the “return on salary-equivalent tokens,” for LLMs—for architecture, we can certainly calculate it for more concrete fields like software.
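As a sketch of what that single-occupation comparison might look like (every dollar figure below is invented purely for illustration; the real estimates would come from economists, not from me):

```python
# Hypothetical "return on salary-equivalent tokens" for one occupation.
# Every dollar figure here is made up for illustration only.
human_salary = 100_000        # annual cost of the human architect
human_output_value = 140_000  # estimated economic value of the human's output

ai_compute_spend = 100_000    # a year of inference compute for the AI-architect system
ai_output_value = 115_000     # estimated economic value of the AI system's output

human_return = human_output_value / human_salary  # return per dollar of salary
ai_return = ai_output_value / ai_compute_spend    # return per dollar of inference

print(f"human return on salary:  {human_return:.2f}")
print(f"AI return on compute:    {ai_return:.2f}")
```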
What I talk about when I talk about AGI
And that, at long last, brings us to my definition of AGI. It is boring. It is quantitative. It focuses on outcomes and capabilities, not processes and intelligence. And that is as it should be, because it has become very clear that if you give human beings an excuse to think in a woolly, fuzzy, aspirational way about artificial intelligence, they will latch onto it with both hands and both feet as well. To wit:
AGI is when 2/3 of a representative sample of professional economists agree that the return on AI inference investment is within 25% of the average return on the equivalent average human salary for at least 50% of occupations, as taxonomized by the US Bureau of Labor Statistics, for which interaction with the physical world is not essential.
(We can drop that last clause when we get AI-driven humanoid robots…)
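For concreteness, here is a toy operationalization of that definition in Python. Every number in it is invented, and it loosely treats the 2/3 economist agreement as a separate check alongside the 50%-of-occupations parity share:

```python
# Toy operationalization of the definition above. All data is invented;
# in practice the returns would be economists' estimates and the agreement
# figure would come from an actual survey of a representative panel.
def within_25_percent(ai_return: float, human_return: float) -> bool:
    """Is the AI's return on inference within 25% of the human return?"""
    return abs(ai_return - human_return) <= 0.25 * human_return

# (AI return, human return) per non-physical occupation -- made-up numbers.
occupations = {
    "architect":          (1.15, 1.40),
    "software developer": (1.60, 1.30),
    "paralegal":          (1.10, 1.20),
    "actuary":            (0.70, 1.50),
}

parity_share = sum(
    within_25_percent(ai, human) for ai, human in occupations.values()
) / len(occupations)

economists_agreeing = 0.70  # fraction of the surveyed panel, also invented

is_agi = parity_share >= 0.5 and economists_agreeing >= 2 / 3
print(f"parity share {parity_share:.2f}, AGI by this definition: {is_agi}")
```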
Do I expect the world to adopt this definition? I sure do not. But I do think it is a useful lens through which to view other definitions of AGI, as the hype / scorn / dismay / excitement spirals up to ever greater and more dizzying heights.