
The new era of AI

Lovre Pešut

Perhaps one of the few things about AI that everyone can agree on is that it is in a state of rapid change and expansion. Though academic efforts stretch back to the middle of the last century, there is a tangible sense in which not much happened, at least in the sense of actually useful systems existing, until the last few years. AI couldn’t reliably tell dogs from cats until some point in the last ten years. AI couldn’t flexibly converse in natural language on an essentially unlimited range of topics until the last few years. Neither AI nor any human scientist, or group of scientists, could crack the protein folding problem, until AI cracked it in the last couple of years.

Here’s a striking example of this rapid progress. This is an image that the best text-to-image model of 2015 (an AI model which takes a piece of text and generates an image based on it) could generate if you asked it for a toilet sitting in a field of grass.

Mansimov, Elman, Emilio Parisotto, Jimmy Ba and Ruslan Salakhutdinov. “Generating Images from Captions with Attention.” CoRR abs/1511.02793 (2015)

And this is what the best model of 2022 can do:

AI Art with Midjourney

So what precipitated this?

If we aimed for maximum conciseness, we could certainly get away with just “deep learning”. Here are a few other factors, though, to put this advance in context:

  1. Since 2012, the compute used to train the most powerful models has risen a billion-fold, with a doubling time of 3.4 months (a quick sanity check of these figures follows this list).
  2. Algorithms have also been getting exponentially better: for example, it took 44 times less compute to train the most efficient 2020 model to the level of the best 2012 model, with the required compute halving roughly every 16 months.
  3. Compute has also been getting exponentially cheaper: FLOP/s per dollar (in GPUs, which are the staple of modern AI training) has been doubling every 2.95 years.
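
To get a feel for what these doubling times imply, here is a quick back-of-the-envelope check in Python; the only inputs are the figures quoted in the list above, the rest is plain arithmetic.

```python
import math

# How long does a billion-fold increase in training compute take,
# if compute doubles every 3.4 months (the figure quoted above)?
doubling_time_months = 3.4
doublings_needed = math.log2(1e9)                  # ~29.9 doublings
months = doublings_needed * doubling_time_months   # ~101.6 months
print(f"billion-fold compute growth: ~{months / 12:.1f} years")  # ~8.5 years

# How long does a 44x gain in algorithmic efficiency take,
# if the required compute halves every 16 months?
halvings_needed = math.log2(44)                    # ~5.5 halvings
months = halvings_needed * 16
print(f"44x efficiency gain: ~{months / 12:.1f} years")           # ~7.3 years
```

Both answers land comfortably inside the 2012–2020 window the list refers to, which is exactly the point: these are not gentle trends.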

The ability of exponential growth to fool human beings has been baked into one of the quintessential stories of human folly — the story of chess and rice, in which a king offers the inventor of chess any reward that he wants, and the inventor asks for a single grain of rice to be placed on the first square of the chessboard, two on the second, four on the third, eight on the fourth, et cetera. After hearing this, the king, thinking this a small price to pay for such a game, agrees immediately — resulting, of course, in the inventor owning all the rice. Given, then, that we have known this lesson for so long, it would perhaps be prudent not to ignore exponentials again, thinking that the tranquility of our present state is sure to stretch to the indefinite future.
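
For a sense of just how badly the king misjudged, here is the tally in Python (the doubling rule and the 64 squares come from the story; the rest is arithmetic):

```python
# One grain on the first square, then doubling on each of the 64 squares.
total_grains = sum(2**square for square in range(64))  # equivalently 2**64 - 1
print(f"{total_grains:,} grains")  # 18,446,744,073,709,551,615 -- about 1.8 * 10**19
```

About eighteen quintillion grains, which is why the inventor ends up owning all the rice.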

Fruits of scale

It is important to note that our advances in, say, image generation models were, for the most part, not the result of some deep insight about the essence of image generation that we gained in the last few years and then put into our models. Yes, there were some new models which worked a bit better than old models, but for the most part these advances consisted of feeding deep learning models gargantuan amounts of data (e.g. pairs of text and images) from which the model had to deduce, “on its own”, the essence of the visual makeup of the world.

Other domains follow similar stories. The great advances in natural language processing, say the success of the “GPT” series of models, did not come from thinking really deeply about the nature of language and then putting that nature into the machine. They came from shoving large amounts of text into the machine, hoping that the machine (by which we mean a deep learning model paired with something like stochastic gradient descent) would develop its own intuition for language. And it does!
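
To make that recipe concrete, here is a minimal sketch of the idea: a model that sees raw text and is trained, by stochastic gradient descent, to predict the next token. Everything below (the toy corpus, the tiny bigram-style model, the hyperparameters) is an illustrative assumption, not the actual GPT architecture or training code.

```python
import torch
import torch.nn as nn

# A toy corpus; real systems use terabytes of text instead.
corpus = "the cat sat on the mat . the dog sat on the rug ."
tokens = corpus.split()
vocab = sorted(set(tokens))
stoi = {w: i for i, w in enumerate(vocab)}
ids = torch.tensor([stoi[w] for w in tokens])

class TinyLM(nn.Module):
    """Predicts the next token from the current one (a bigram-style model)."""
    def __init__(self, vocab_size, dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, x):
        return self.out(self.embed(x))

model = TinyLM(len(vocab))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# The whole training signal: inputs are tokens 0..n-2, targets are tokens 1..n-1.
inputs, targets = ids[:-1], ids[1:]
for step in range(200):
    opt.zero_grad()
    logits = model(inputs)
    loss = loss_fn(logits, targets)  # how surprised is the model by the real next token?
    loss.backward()
    opt.step()

print(f"final loss: {loss.item():.3f}")
```

No knowledge of language is coded in anywhere; whatever “intuition for language” the model ends up with has to come out of this one repeated guessing game, just at an enormously larger scale.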

This trend suggests that the development of AI might not be bounded by our “understanding of intelligence” or any such intangible. It might rather be that with increasing compute budgets we’re going to get increasing intelligence, and that’s going to be it.

Models are also getting better and better at things we regard as rather difficult intellectual tasks: they are getting better at coding and, slowly, at mathematics. DeepMind’s AlphaCode placed, on average, in the top 54% of human competitors in competitive programming challenges. Minerva, a language model fine-tuned on mathematical problems, could solve 50.3% of the MATH dataset, which consists mostly of high-school-competition-level problems.

Sure, there are still plenty of intellectual tasks that these models are not able to do at this exact moment in time. But with such rapid advancement in the last 10 years, who can be confident about what they won’t be able to do in another 10?