The Future of AI
Building the future
In the first few posts of this series, I presented some basic arguments for why artificial intelligence could be a big deal and why it could arrive very soon.
Even if that is the case, it is not obvious what we should do about it. It’s genuinely not a simple question, and I won’t pretend otherwise. I will, however, try to explain the path forward that I see as most meaningful.
Let’s step away from artificial intelligence for a moment and consider another technology that had the potential to be great or terrible and ended up being both: nuclear fission. I won’t dwell on the horrors enabled by the atomic bomb, though it’s worth noting that things could have been much worse. At the same time, nuclear energy has become a safe source of power that supplies about 10% of the world’s electricity.
Learning from history
More relevant for our purposes is that the idea of extracting enormous amounts of energy from atoms entered public consciousness and discourse several decades before it was realized. In the same way, superhuman artificial intelligence has been part of our discourse for decades now.
A very early example in the nuclear case is H.G. Wells’ 1914 novel “The World Set Free,” in which he wrote about the potentially catastrophic consequences of creating atomic bombs.
I will highlight two possible responses to that “nuclear situation,” fully aware that these two extremes do not cover every way in which people actually reacted to it.
The first approach is perhaps best illustrated by a 1933 remark from Lord Rutherford, one of the most esteemed physicists of the time: “Energy produced by the breakdown of the atom is a very poor sort of thing. Anyone who expects a source of power from the transformation of these atoms is talking moonshine.” Some have questioned whether Rutherford truly believed this, but it does not seem to have been an unpopular view. In 1940, Scientific American published an article about the atomic bomb titled “Rest easy – it can’t happen here.”
The second approach is best represented by Leo Szilard. In the early 1930s, Szilard, already a successful physicist, was planning to switch to biology. In 1931, however, he read H.G. Wells’ “The World Set Free.” Szilard was so moved by Wells’ description of a devastated world that he postponed his move to biology (as it turned out, by about 15 years) and instead began working on nuclear physics, aiming to ensure that the technology would be used for good.
The full story of Szilard’s role in nuclear history is long and complicated. Among other things, he played a significant role in initiating the Manhattan Project. That was not an unambiguously positive act, but throughout that complicated journey he tried to do good, fully aware that the fate of the world was at stake.
So what would the “Szilard approach” look like today, applied to AI? It is hard to know in advance, before history unfolds, which steps will lead to good outcomes and which to bad ones. Still, some steps seem more promising than others. On a personal level, deepening our understanding of modern artificial intelligence and its methods seems clearly worthwhile. On a global level, what I consider most important is research into ways to “steer” AI systems.
The Problem of AI Alignment
The problem of AI alignment essentially boils down to a simple question: how do we make AI want what we want? In other words, how do we specify goals for artificial intelligence systems, and become confident enough in that process to stake our civilization on it not failing? So far, this has proven to be a very high bar to clear.
A very relevant example is large language models such as GPT-3. These models have many capabilities, but uncovering those capabilities and getting the system to “want” to use them takes a great deal of ingenuity.
Reinforcement Learning from Human Feedback (RLHF)
ChatGPT’s considerable success rests partly on advances in steering OpenAI’s system. The technique used, “reinforcement learning from human feedback” (RLHF), works roughly like this: the model generates many responses, humans judge which responses are good and which are bad, and the model is then “reinforced” toward the good ones. (In practice, the human judgments are typically used to train a separate reward model, whose scores then guide the reinforcement learning.) OpenAI researchers applied this technique to a pre-trained “base” language model to make it controllable enough to serve as a chatbot.
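To make the “reinforcement” step a little more concrete, here is a minimal Python sketch of the reward-modeling stage that RLHF pipelines commonly include, under heavy simplifying assumptions. Each response is reduced to a hypothetical three-number feature vector, the reward model is a linear function, and it is trained with a pairwise (Bradley-Terry style) loss on human comparisons of the form “response A is better than response B.” The last step, picking the highest-scoring candidate (best-of-n selection), is a stand-in for the actual reinforcement-learning update (for example, PPO) that fine-tunes the language model itself. All function names, features, and data are illustrative, not OpenAI’s implementation.

```python
import math

# Hypothetical toy setup: each response is summarized by a feature vector
# [helpfulness, verbosity, factual_errors]. Real reward models are neural
# networks that read the full text of the response.

def reward(weights, features):
    """Linear reward model: one scalar score per response."""
    return sum(w * x for w, x in zip(weights, features))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_reward_model(comparisons, dim, lr=0.1, epochs=200):
    """Fit weights so that human-preferred responses score higher.

    Each comparison is (chosen, rejected): a human judged `chosen` better.
    We minimize the pairwise loss  -log sigmoid(r(chosen) - r(rejected))
    with plain gradient descent.
    """
    weights = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in comparisons:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # d(loss)/d(margin) = -(1 - sigmoid(margin)), so descending the
            # loss pushes the weights toward (chosen - rejected).
            scale = 1.0 - sigmoid(margin)
            for i in range(dim):
                weights[i] += lr * scale * (chosen[i] - rejected[i])
    return weights

def best_of_n(weights, candidates):
    """Return the (label, features) candidate the reward model scores highest.

    Simplification: RLHF proper uses these scores to update the language
    model's parameters, not just to rerank finished answers.
    """
    return max(candidates, key=lambda c: reward(weights, c[1]))

# Illustrative human judgments (chosen, rejected):
comparisons = [
    ([1.0, 0.2, 0.0], [0.3, 0.9, 0.0]),  # concise and helpful beat rambling
    ([0.8, 0.3, 0.0], [0.9, 0.3, 1.0]),  # no factual error beat an error
    ([0.9, 0.1, 0.0], [0.2, 0.1, 0.0]),  # helpful beat unhelpful
]
weights = train_reward_model(comparisons, dim=3)

candidates = [
    ("long, rambling answer", [0.4, 0.9, 0.0]),
    ("short, correct answer", [0.9, 0.2, 0.0]),
    ("confident but wrong answer", [0.9, 0.2, 1.0]),
]
print(best_of_n(weights, candidates)[0])  # short, correct answer
```

The point of the design is that the raters never write down a reward function; they only state which of two responses they prefer, and the weights are inferred from those comparisons.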
However, RLHF is not perfect and leaves various shortcomings in place. For example, ChatGPT still frequently hallucinates “facts”: although language models “(mostly) know what they know,” that knowledge often goes unused.
Similarly, with some “prompt engineering,” ChatGPT can be coaxed into doing things it normally wouldn’t, such as discussing controversial topics (like its own potential awareness) or laying out its plans for “taking over the world.”
OpenAI released an update to ChatGPT on January 30th. Among other changes, they modified its prompt, the text the model sees before it responds to questions. Previously, the prompt merely told ChatGPT that it was a language model trained by OpenAI and gave both the cutoff date of its knowledge and the current date. On January 30th, the following text was added:
“Answer in as few words as possible (e.g., don’t provide an elaborate response). It’s important to be concise in your answers, so remember this. If you generate a list, don’t have too many items. Keep the number of items short.”
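What “asking the model” means mechanically can be shown with a small Python sketch: the instructions are ordinary text placed in front of the user’s question, and the model writes its reply as a continuation of that text. The layout, the function name `build_prompt`, and the dates below are hypothetical and illustrative; this is not OpenAI’s actual internal format. The conciseness instruction is a shortened version of the text quoted above.

```python
from datetime import date

def build_prompt(cutoff, today, conversation, extra_instructions=""):
    """Assemble the text a chat model sees before it starts its reply.

    Hypothetical layout, for illustration only: the instructions are just
    ordinary text placed in front of the user's messages.
    """
    header = (
        "You are a large language model trained by OpenAI.\n"
        f"Knowledge cutoff: {cutoff.isoformat()}\n"
        f"Current date: {today.isoformat()}"
    )
    if extra_instructions:
        header += "\n" + extra_instructions
    turns = [f"{role}: {text}" for role, text in conversation]
    # The model generates its answer as a continuation of this final line.
    return "\n\n".join([header, *turns, "Assistant:"])

CONCISENESS = (
    "Answer in as few words as possible. "
    "If you generate a list, keep the number of items short."
)

prompt = build_prompt(
    cutoff=date(2021, 9, 1),   # illustrative value
    today=date(2023, 1, 30),   # illustrative value
    conversation=[("User", "What is nuclear fission?")],
    extra_instructions=CONCISENESS,
)
print(prompt)
```

Nothing in this text is enforced: the model is simply conditioned on it, which is why changing a few sentences of the prompt can change its behavior.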
It is both amusing and interesting that in 2023, one of our best ways to get models to do what we want seems to be simply asking them politely. However, it seems unrealistic to expect this approach, or RLHF itself, to keep working indefinitely as models reach ever-higher levels of capability.
If the trends of the past few years continue, we are on the cusp of major advances in what our systems can do, without correspondingly powerful techniques for aligning them. Let’s work on developing those techniques!