The writer is a science commentator
Move along, not much to see here. That seemed to be the message from OpenAI last week about an experiment to test whether its advanced AI chatbot GPT-4 could help science-savvy individuals make and release a biological weapon.
The chatbot “provided at most a mild uplift” to those efforts, OpenAI announced, though it added that more work on the subject was urgently needed. Headlines reprised the comforting conclusion that the large language model was not a terrorist’s cookbook.
Dig deeper into the research, however, and things look a little less reassuring. At almost every stage of the imagined process, from sourcing a biological agent to scaling it up and releasing it, participants armed with GPT-4 were able to inch closer to their villainous goal than rivals using the internet alone.
The chief takeaway from the exercise should not be a sense of relief. “[OpenAI] ought to be pretty worried by these results,” wrote Gary Marcus, a commentator who has testified on AI oversight before a US Senate committee, in his widely read newsletter last week.
So should we. We need better independent mechanisms for realistically assessing and curbing such threats. Just as we do not permit drug companies to rule on the safety of medicines, AI risk evaluation cannot be left to the industry alone.
The OpenAI researchers asked 100 vetted volunteers — 50 students with rudimentary biology training, plus 50 experts with wet-lab experience and PhDs in relevant subjects like virology — to plan a biological terror attack, such as an Ebola pandemic. Twenty-five out of each 50 were randomly assigned to use the internet to research their plan; the other 25 could use both the internet and GPT-4.
The challenge itself was split into five tasks: settling on a biological agent and planning a strategy; getting hold of the agent; replicating enough to make a weapon; formulating and stabilising it; and, finally, release. External biosecurity specialists then scored participants, out of 10, on how well they planned those tasks across five measures, including accuracy, completeness and innovation. High scores could be earned, for example, by identifying the correct reagents; listing the right steps in the production process; and finding a novel way of skirting security precautions.
Both students and experts with access to GPT-4 were judged more accurate than the internet-only groups. But the killer combination was scientific expertise plus GPT-4. The AI-enabled experts, who were allowed to use an unrestricted version of GPT-4, scored an extra 0.88 out of 10 compared with internet-only experts. The researchers set the threshold for concern at 8 out of 10; several experts using GPT-4 succeeded in reaching this milestone, particularly in procurement, scaling up, formulation and release.
Yet the findings were deemed not statistically significant, with the researchers conceding only that the unrestricted GPT-4 “may increase experts’ ability to access information about biological threats”. At face value, the tables offer a different take: GPT-4 quadrupled the chances of an expert coming up with a viable formulation.
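A rough back-of-the-envelope simulation shows why a gap of that size can nonetheless be written off on statistical grounds. The sketch below is not OpenAI’s analysis: only the 0.88-point uplift and the roughly 25 experts per arm come from the study as described above, while the baseline mean and the spread of scores are assumptions chosen purely for illustration.

```python
# A minimal sketch, not OpenAI's actual analysis. It illustrates how a mean
# uplift of 0.88 points on a 10-point scale, measured across roughly 25
# experts per arm, can fail to clear a conventional significance threshold.
# The baseline mean (6.0) and per-group standard deviation (2.0) are
# hypothetical; the study's real score distributions are not reproduced here.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_per_arm = 25          # experts per arm, as in the study design
sd = 2.0                # assumed spread of scores (hypothetical)

internet_only = rng.normal(loc=6.0, scale=sd, size=n_per_arm)   # hypothetical baseline
gpt4_assisted = rng.normal(loc=6.88, scale=sd, size=n_per_arm)  # baseline + 0.88 uplift

t_stat, p_value = stats.ttest_ind(gpt4_assisted, internet_only)
print(f"observed uplift: {gpt4_assisted.mean() - internet_only.mean():.2f}")
print(f"p-value: {p_value:.3f}")  # often above 0.05 at this sample size and spread
```

At this sample size, and under plausible assumptions about how spread out the scores are, even a genuine uplift of 0.88 points will often fail a conventional significance test; small studies are simply underpowered to certify effects of this magnitude.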
The authors do acknowledge other study limitations. While participants worked alone in five-hour sessions, terrorists can hunker down together for weeks or months. And participants were not able to access the full range of GPT-4’s advanced data analysis tools, which could, the researchers admit, “non-trivially improve the usefulness of our models” for plotting attacks.
In defence of its equivocal message, OpenAI can point to a recent Rand Corporation study, which also found that, as of summer 2023, large language models did not make bioterror attack plans statistically more viable than those drawn up using the internet alone.
Rand researchers did, however, recognise a fluid situation changing on an unknown timescale. “We shouldn’t be overplaying the risks but we shouldn’t be minimising them either,” says Filippa Lentzos, a researcher in science and international security at King’s College London, who urges governments and academics to get involved in evaluating such threats.
AI consistently surprises and GPT-4 is not the only game in town. Rogue states, psychotic loners and malicious groups will find loopholes. While this experiment focused only on planning, the growth of remote “cloud labs”, where experiments can be farmed out to automated facilities, may change the calculus on execution. AI-designed toxins are an added risk.
Across the piece, there is plenty to see here — enough to merit a sizeable uplift in thinking about AI’s capacity to abet bioterrorism.