On June 1 2009, Air France Flight 447 vanished on a routine transatlantic flight. The circumstances were mysterious until the black box flight recorder was recovered nearly two years later, and the awful truth became apparent: three highly trained pilots had crashed a fully functional aircraft into the ocean, killing all 228 people on board, because they had become confused by what their Airbus A330’s automated systems had been telling them.
I’ve recently found myself returning to the final moments of Flight 447, vividly described by articles in Popular Mechanics and Vanity Fair. I cannot shake the feeling that the accident has something important to teach us about both the risks and the enormous rewards of artificial intelligence.
The latest generative AI can produce poetry and art, while decision-making AI systems have the power to find useful patterns in a confusing mess of data. These new technologies have no obvious precursors, but they do have parallels. Not for nothing is Microsoft’s suite of AI tools now branded “Copilot”. “Autopilot” might be more accurate, but either way, it is an analogy worth examining.
Back to Flight 447. The A330 is renowned for being smooth and easy to fly, thanks to a sophisticated flight automation system called assistive fly-by-wire. Traditionally the pilot has direct control of the aircraft’s control surfaces, but an assistive fly-by-wire system translates the pilot’s jerky movements into smooth instructions. This makes it hard to crash an A330, and the plane had a superb safety record before the Air France tragedy. But, paradoxically, there is a risk to building a plane that protects pilots so assiduously from error. It means that when a challenge does occur, the pilots will have very little experience to draw on as they try to meet that challenge.
In the case of Flight 447, the challenge was a storm that blocked the airspeed instruments with ice. The system correctly concluded it was flying on unreliable data and, as programmed, handed full control to the pilot. Alas, the young pilot was not used to flying in thin, turbulent air without the computer’s supervision and began to make mistakes. As the plane wobbled alarmingly, he climbed out of instinct and stalled the plane — something that would have been impossible if the assistive fly-by-wire had been operating normally. The other pilots became so confused and distrustful of the plane’s instruments that they were unable to diagnose the easily remedied problem until it was too late.
This problem is sometimes termed “the paradox of automation”. An automated system can assist humans or even replace human judgment. But this means that humans may forget their skills or simply stop paying attention. When the computer needs human intervention, the humans may no longer be up to the job. Better automated systems mean these cases become rarer and stranger, leaving humans even less likely to cope with them.
There is plenty of anecdotal evidence of this happening with the latest AI systems. Consider the hapless lawyers who turned to ChatGPT for help in formulating a case, only to find that it had fabricated citations. They were fined $5,000 and ordered to write letters of explanation to several judges.
The point is not that ChatGPT is useless, any more than assistive fly-by-wire is useless. They are both technological miracles. But they have limits, and if their human users do not understand those limits, disaster may ensue.
Evidence of this risk comes from Fabrizio Dell’Acqua of Harvard Business School, who recently ran an experiment in which recruiters were assisted by algorithms, some excellent and some less so, in their efforts to decide which applicants to invite to interview. (This is not generative AI, but it is a major real-world application of AI.)
Dell’Acqua discovered, counter-intuitively, that mediocre algorithms that were about 75 per cent accurate delivered better results than good ones that had an accuracy of about 85 per cent. The simple reason is that when recruiters were offered guidance from an algorithm that was known to be patchy, they stayed focused and added their own judgment and expertise. When recruiters were offered guidance from an algorithm they knew to be excellent, they sat back and let the computer make the decisions.
Maybe they saved so much time that the mistakes were worth it. But there certainly were mistakes. A low-grade algorithm and a switched-on human make better decisions together than a top-notch algorithm with a zoned-out human. And when the algorithm is top-notch, a zoned-out human turns out to be what you get.
I heard about Dell’Acqua’s research from Ethan Mollick, author of the forthcoming book ‘Co-Intelligence’. But when I mentioned to Mollick the idea that the autopilot was an instructive analogy to generative AI, he warned me against looking for parallels that were “narrow and somewhat comforting”. That’s fair. There is no single technological precedent that does justice to the rapid advancement and the bewildering scope of generative AI systems. But rather than dismiss all such precedents, it’s worth looking for different analogies that illuminate different parts of what might lie ahead. I have two more in mind for future exploration.
And there is one lesson from the autopilot I am convinced applies to generative AI: rather than thinking of the machine as a replacement for the human, the most interesting questions focus on the sometimes-fraught collaboration between the two. Even the best autopilot sometimes needs human judgment. Will we be ready?
The new generative AI systems are often bewildering. But we have the luxury of time to experiment with them, a luxury denied to poor Pierre-Cédric Bonin, the young pilot who flew a perfectly operational aircraft into the Atlantic Ocean. His final words: “But what’s happening?”
Tim Harford’s new book for children, ‘The Truth Detective’ (Wren & Rook), is now available