ChatGPT users in the UK have been flummoxed by the chatbot responding in Welsh to English-language queries, in the latest example of how the race to create artificial intelligence systems can throw up unexpected bugs.
When English-speaking users talk to the bot through ChatGPT’s new voice interface, some have been surprised to find that it translates their question into almost perfect Welsh and then responds in the Celtic language.
Several users have reported encountering the glitch with ChatGPT — which was also experienced first-hand by the Financial Times this week — even though they do not understand Welsh, or live in or near Wales.
It is not the first time that users of OpenAI’s breakthrough chatbot have run into linguistic issues. In February, users also complained of a bug where the bot would answer text questions in a combination of Spanish and English.
The Welsh language bug puts a new twist on the common problem of large language models “hallucinating” — making up fictional or nonsensical answers — which has continued to afflict generative AI systems after years of development and billions of dollars of investment.
Microsoft-backed OpenAI, which was valued at $86bn earlier this year in an employee share sale, is locked in a contest with Google and Meta, as well as start-ups including Anthropic, Elon Musk’s xAI and Cohere, to advance its AI capabilities.
ChatGPT now supports dozens of languages, including Icelandic, Georgian and Macedonian. The Welsh government announced a data partnership with OpenAI in June to improve how AI technologies work in the language.
OpenAI has admitted in a research paper that ChatGPT offers “much worse than expected performance” in Welsh, after discovering that most of its training data for the translation was “actually English audio” that had been misidentified by the system.
Sarah Coward, a Cambridgeshire-based entrepreneur, said she came across the Welsh bug when trying out the new voice feature in ChatGPT’s GPT-4o model, which was introduced earlier this year.
“I had no idea what language it was because it completely took me by surprise,” she said. When Coward asked the chatbot why it started speaking in Welsh, ChatGPT replied that it thought Coward would be “more comfortable in that language”.
OpenAI said the problem is a limitation with ChatGPT’s voice transcription system, Whisper.
The company told the Financial Times that the model sometimes gets confused and transcribes audio in a different language, in this case Welsh. It recommended that users experiencing the Welsh glitch set their “Speech” setting to English rather than “Auto-detect”, though it could not guarantee that this would fix the problem.
“Everybody knows that ChatGPT and some large language model applications create hallucinations or inaccuracies in responses,” said Coward, whose company In The Room offers conversational AI experiences to brands.
“This is a demonstration of, to a certain extent, legitimate concern that companies should have in employing these types of technologies right now in any consumer-facing area,” she added. “It could be quite damaging in terms of customer experience and . . . trust.”
In an OpenAI paper about its own speech recognition system, the company said: “Welsh is an outlier with much worse than expected performance . . . despite supposedly having 9,000 hours of translation data.”
On inspection, the company found that “the majority of supposedly Welsh translation data is actually English audio” which was “misclassified as Welsh by the language identification [AI] system”.
When the Financial Times encountered the glitch, ChatGPT interpreted a query about cities in the UK, US and Asia as being about Wales.
“It seems I misunderstood the language of your question and responded in Welsh by mistake,” ChatGPT said when challenged about its error. “I’ll be more careful in the future.”