Dec 09 2024
Have We Achieved General AI?
As I predicted, the controversy over whether or not we have achieved general AI will likely persist for a long time before there is a consensus that we have. The latest round of this controversy comes from Vahid Kazemi from OpenAI. He posted on X:
“In my opinion we have already achieved AGI and it’s even more clear with O1. We have not achieved “better than any human at any task” but what we have is “better than most humans at most tasks”. Some say LLMs only know how to follow a recipe. Firstly, no one can really explain what a trillion parameter deep neural net can learn. But even if you believe that, the whole scientific method can be summarized as a recipe: observe, hypothesize, and verify. Good scientists can produce better hypothesis based on their intuition, but that intuition itself was built by many trial and errors. There’s nothing that can’t be learned with examples.”
I will set aside the possibility that this is all publicity for OpenAI’s newest O1 platform. Taken at face value, what is the claim being made here? I actually am not sure (part of the problem of short-form venues like X). In order to say whether or not OpenAI’s O1 platform qualifies as an artificial general intelligence (AGI), we need to operationally define what an AGI is. Right away, we get deep into the weeds, but here is a basic definition: “Artificial general intelligence (AGI) is a type of artificial intelligence (AI) that matches or surpasses human cognitive capabilities across a wide range of cognitive tasks. This contrasts with narrow AI, which is limited to specific tasks.”
That may seem straightforward, but it is highly problematic for many reasons. Scientific American has a good discussion of the issues here. But at its core, two features pop up regularly in various definitions of general AI: the AI has to have wide-ranging abilities, and it has to equal or surpass human-level cognitive function. There is also debate about whether how the AI achieves its ends matters, or should matter. Does it matter if the AI is truly thinking or understanding? Does it matter if the AI is self-aware or sentient? Does the output have to represent true originality or creativity?
Kazemi puts his nickel down on how he operationally defines general AI: “better than most humans at most tasks”. As is often the case, one has to frame such claims as “If you define X this way, then this is X.” So, if you define AGI as being better than most humans at most tasks, then Kazemi’s claims are somewhat reasonable. There is still a lot to debate, but at least we have some clear parameters. This definition also eliminates the thorny questions of understanding and awareness.
But not everyone agrees with this definition. There are still many experts who contend that modern LLMs are just really good autocompletes. They are language prediction algorithms that simulate thought by simulating language, but are not capable of true thought, understanding, or creativity. What they are great at is sifting through massive amounts of data, finding patterns, and then regenerating those patterns.
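To make the “really good autocomplete” framing concrete, here is a deliberately tiny, hypothetical sketch: a bigram model, nothing like the scale or architecture of a real trillion-parameter network, that does nothing but predict the next word from patterns in its training text.

```python
from collections import defaultdict, Counter

# Toy training text: the model's entire "experience" of language.
corpus = "the cat sat on the mat the dog sat on the rug".split()

# Count which word follows which word -- the model's only "knowledge".
follows = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    follows[current_word][next_word] += 1

def predict_next(word):
    """Return the most frequently seen next word, or None if the word is unknown."""
    candidates = follows.get(word)
    if not candidates:
        return None  # nothing was learned about this word
    return candidates.most_common(1)[0][0]

# "Autocomplete" a sentence by repeatedly predicting the next word.
word, output = "the", ["the"]
for _ in range(5):
    word = predict_next(word)
    if word is None:
        break
    output.append(word)

print(" ".join(output))  # e.g. "the cat sat on the cat" -- pure pattern replay
```

The toy model can only reassemble sequences it has already seen; whether scaling this basic idea up to a trillion parameters produces genuine understanding is exactly the point in dispute.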
This is not a mere discussion of “how” LLMs function but gets to the core of whether or not they are “better” than humans at what they do. I think the primary argument against LLMs being better than humans is that they function by using the output of humans. Kazemi essentially says this is just how they learn; they are following a recipe, like people do. But I think that dodges the key question.
Let’s take art as an example. Humans create art, and some artists are truly creative and can bring into existence new and unique works. There are always influences and context, but there is also true creativity. AI art does not do this. It sifts through the work of humans, learns the patterns, and then generates imitations from those patterns. Since AI does not experience existence, it cannot draw upon experience or emotions or the feeling of what it is to be a human in order to manifest artistic creativity. It just regurgitates the work of humans. So how can we say that AI is better than humans at art when it is completely dependent on humans for what it does? The same is true for everything LLMs do, but it is just more obvious when it comes to art.
I am not denigrating LLMs or other modern AI; they are extremely useful tools. They are powerful and fast, and can accomplish many impressive tasks. They are accelerating the rate of scientific research in many areas. They can improve the practice of medicine. They can help us control the tsunami of data that we are drowning ourselves in. And yes, they can do a lot of different tasks.
Perhaps it is easier to define what is not AGI. A chess-playing computer is not AGI, as it is programmed to do one task. In fact, the term AGI was developed by programmers to distinguish this effort from the crop of narrow AI applications that were popping up, like chess and Go players. But is everything that is not a very narrow AI an AGI? It seems like we need more specific terms.
OpenAI’s O1 and other LLMs are more than just the narrow AIs of old. But they are not thinking machines, nor do they have human-level intelligence. They are also certainly not self-aware. I think Kazemi’s point about a trillion-parameter deep neural net misses the point. Sure, we don’t know exactly what it is doing, but we know what it is not doing, and we can infer from its output, and from how it is programmed, the general way it accomplishes its outcome. There is also the fact that LLMs are still “brittle”, a term that refers to the fact that narrow AIs can be easily “broken” when they are pushed beyond their parameters. It’s not hard to throw an LLM off its game and push the limits of its ability. It has no true thinking or understanding, and this makes it brittle.
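As an illustration of what “brittle” means here, consider this contrived sketch: a keyword-based sentiment classifier, vastly simpler than any LLM and purely illustrative, that works only while inputs resemble its learned patterns and fails, or is confidently wrong, as soon as they don’t.

```python
import string

# A contrived "narrow AI": sentiment classification by keyword matching.
POSITIVE_WORDS = {"great", "excellent", "love"}
NEGATIVE_WORDS = {"terrible", "awful", "hate"}

def classify_sentiment(text: str) -> str:
    words = {w.strip(string.punctuation) for w in text.lower().split()}
    if words & POSITIVE_WORDS:
        return "positive"
    if words & NEGATIVE_WORDS:
        return "negative"
    return "unknown"  # no learned pattern applies

print(classify_sentiment("I love this movie"))             # "positive"
print(classify_sentiment("This film was awful"))           # "negative"
print(classify_sentiment("Oh great, another flat tire."))  # "positive" -- sarcasm breaks it
print(classify_sentiment("Not exactly a masterpiece."))    # "unknown" -- outside its patterns
```

Real LLMs fail in far subtler ways, but the underlying issue, being pushed beyond learned patterns, is the same.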
For that reason I don’t think that LLMs have achieved AGI. But I could be wrong, and even if we are not there yet we may be really close. Regardless, I think we need to go back to the drawing board, look at what we currently have in terms of AI, and have experts come up with new, more specific operational definitions. We do this in medicine all the time: as our knowledge evolves, sometimes experts need to get together, revamp diagnostic definitions, and create new diagnoses to reflect that knowledge. Perhaps ANI and AGI are not enough.
To me, LLMs seem like a multi-purpose ANI, and perhaps that is a good definition. Either “AGI” needs to be reserved for an AI that can truly derive new knowledge from a general understanding of the world, or we “downgrade” the term “AGI” to refer to what LLMs currently are (multi-purpose but otherwise narrow) and come up with a new term for true human-level thinking and understanding.
What’s exciting (and for some scary) is that AIs are advancing quickly enough to force a reconsideration of our definitions of what AIs actually are.