The real quandary of AI isn’t what people think

Do you think the leading large language model, GPT-4, could suggest a solution to Wordle after having four previous guesses described to it? Could it compose a biography-in-verse of Alan Turing, while also replacing “Turing” with “Church”? (Turing’s PhD supervisor was Alonzo Church, and the Church-Turing thesis is well known. That might befuddle the computer, no?) Shown a partially complete game of tic-tac-toe, could GPT-4 find the obvious best move?

All these questions, and more, are presented as an addictive quiz on the website of Nicholas Carlini, a researcher at Google DeepMind. It’s worth a few minutes of your time as an illustration of the astonishing capabilities and equally surprising incapabilities of GPT-4. For example, despite the fact that GPT-4 cannot count and often stumbles over basic maths, it can integrate the function x sin(x) — something I long ago forgot how to do. It is famously clever at wordplay yet flubs the Wordle challenge.
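
As a reminder of what that involves (a sketch of the standard calculation, not a transcript of GPT-4’s output), one application of integration by parts gives:

\int x \sin(x)\, dx = -x\cos(x) + \int \cos(x)\, dx = -x\cos(x) + \sin(x) + C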

Most staggering of all, although GPT-4 cannot find the winning move at tic-tac-toe, it can “write a full javascript webpage to play tic-tac-toe against the computer” in which “the computer should play perfectly and so never lose” within seconds.

One comes away from Carlini’s test with three insights. First, not only can GPT-4 solve many problems that would stretch a human expert, it can do so a hundred times more quickly. Second, there are many other tasks at which GPT-4 makes mistakes that would embarrass a 10-year-old. Third, it is very hard to figure out which tasks fall into which category. With experience, one starts to get a feel for the weaknesses and the hidden superpowers of the large language model, but even experienced users will be surprised.

Carlini’s test illustrates a point that has been explored in a more realistic context by a team of researchers working with Boston Consulting Group (BCG). Their study focuses on why the strengths and weaknesses of generative AI are often unexpected. Fittingly, it is titled ‘Navigating the Jagged Technological Frontier’.

At BCG, consultants armed with GPT-4 dramatically outperformed those without the tool. They were given a range of realistic tasks such as brainstorming product ideas, performing a market segmentation analysis and writing a press release. Those with GPT-4 did more work, more quickly and of much higher quality. GPT-4, it seems, is a terrific assistant to any management consultant, especially those with less skill or experience.

The researchers also included a task that seemed as though the AI should find it easy, but which was carefully designed to confound it. This was to make strategy recommendations to a client based on financial data and transcripts of interviews with staff. The trick was that the financial data was likely to be misleading unless viewed in the light of the interviews.

This task wasn’t beyond a capable consultant, but it did fool the AI, which tended to give extremely bad strategic advice. The consultants were, of course, free to ignore the AI’s output, or even to cut the AI out entirely, but they rarely did. This was the one task at which the unaided consultants performed better than those equipped with GPT-4.

This is the “jagged frontier” of generative AI performance. Sometimes the AI is better than you, and sometimes you are better than the AI. Good luck guessing which is which.

This column is the third in a series about generative AI in which I have been scrambling to find technological precedents for the unprecedented. Still, even an imperfect analogy can be instructive. Looking at assistive fly-by-wire systems alerts us to the risk of complacency and deskilling; the sudden rise of the digital spreadsheet shows us how a technology can destroy what seem to be the foundations of an industry, yet end up expanding the number and range of new jobs in that industry.

This week, I’d like to suggest a final precursor: the iPhone. When Steve Jobs launched the genre-defining iPhone in 2007, few people imagined just how ubiquitous smartphones would become. At first they were little more than an expensive toy. The killer app was the ability to make them crackle and buzz like lightsabres. Yet soon enough, we were spending more time with our smartphones than with our loved ones, using them to replace the TV, radio, camera, laptop, satnav, Walkman, credit card — and above all, as an endless source of distraction.

Why suggest the iPhone might teach us something about generative AI? The technologies are different, true. But we might want to reflect on how quickly we became dependent on smartphones and how quickly we started to turn to them out of habit, rather than as a deliberate choice. We want company, but instead of meeting a friend we fire off a tweet. We want something to read, but rather than picking up a book, we doomscroll. Instead of a good movie, TikTok. Email and WhatsApp become a substitute for doing real work.

There will be a time and a place for generative AI, just as there is a time and a place to consult the supercomputer in your pocket. But it may not be easy to figure out when it will help us and when it will get in our way. Unlike with generative AI, anybody with a pen, paper and three minutes to spare can write a list of what they do better with a smartphone in hand, and what they do better when the smartphone is out of sight. The challenge is to remember that list and act accordingly.

The smartphone is a powerful tool that most of us unthinkingly misuse many times a day, despite the fact that it is far less mysterious than a large language model like GPT-4. Will we really do a better job with the AI tools to come?

Tim Harford’s new book for children, ‘The Truth Detective’ (Wren & Rook), is now available
