Hallucinating machines


You may have noticed, but generative AI is a hot topic at the moment. And the investment industry — which has never seen a wildly hyped fad it didn’t like — is on it.

The reality is that quantitative money managers and high-frequency traders have been using “AI” for quite some time now. Machine learning, for example, is useful for predicting trading patterns, while using natural language processing to analyse corporate and central bank waffle is table stakes.

But Man Group’s AHL quant unit has now published an interesting overview of the generative AI landscape that explores how it might affect the asset management industry. Unsurprisingly, the potential is greatest in language analysis:

Within asset management, early evidence suggests that LLMs can conduct better sentiment analysis compared to more traditional BERT- and dictionary-based methods. Some preliminary findings indicate that ChatGPT exhibits improved financial language understanding compared to models like BERT, for tasks such as interpreting Fedspeak. BloombergGPT, a GPT-based LLM trained partially on Bloomberg’s extensive financial data, also shows how a finance-specific training dataset can further improve financial sentiment analysis abilities. Another way of using LLMs is to extract their word embeddings of input text (i.e., the numerical vector of how the model represents the text) and use these as features in an econometric model to predict sentiment (or even expected returns).

. . . Other language-based applications in asset management include producing summaries and extracting information from large text documents, as well as thematic investing. A recent study has shown that ChatGPT is able to effectively summarise the contents of corporate disclosures, and even exhibit the ability to produce targeted summaries that are specific to a particular topic. Another use case for LLMs is their ability to identify links between conceptual themes and company descriptions to create thematic baskets of stocks.
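The embeddings-as-features idea is simple enough to sketch. Everything below is illustrative rather than anything from the paper: the get_embedding helper is a stand-in for whatever embedding model you actually have access to (here it just returns hash-seeded random vectors), and the labelled headlines are invented.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def get_embedding(text: str, dim: int = 64) -> np.ndarray:
    """Stand-in for a real embedding model: returns a hash-seeded random vector."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.normal(size=dim)

# Toy labelled data: 1 = positive sentiment, 0 = negative (invented for illustration).
headlines = [
    "Company beats earnings expectations and raises guidance",
    "Regulator opens probe into accounting irregularities",
    "Central bank signals rate cuts as inflation cools",
    "Profit warning sends shares tumbling",
]
labels = np.array([1, 0, 1, 0])

# Use the embedding vectors as features in a simple classification model.
X = np.vstack([get_embedding(h) for h in headlines])
model = LogisticRegression(max_iter=1000).fit(X, labels)

new_headline = "Firm announces record quarterly revenue"
prob_positive = model.predict_proba(get_embedding(new_headline).reshape(1, -1))[0, 1]
print(f"P(positive sentiment) = {prob_positive:.2f}")
```

Swap the stand-in for a real embedding model and the logistic regression for whatever econometric specification you prefer and you have roughly the pipeline the excerpt describes; the heavy lifting is all in the quality of the embeddings.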

Intriguingly, the author — AHL quant researcher Martin Luk — also suggests that it could be handy in generating synthetic, hypothetical financial data. This may seem a bit weird, but it’s potentially very useful.

One of the biggest problems in quantland is that you only have one dataset to work with: what has actually happened in markets. In the actual legit sciences, by contrast, you can run multiple experiments to generate a host of data.

There are some obvious pitfalls in using simulated financial data (generated, for example, by tweaking some economic parameters or simply injecting some randomness), but Alphaville knows that some of the more sophisticated quant hedge funds are now actively training some of their models on it. Here’s Luk:

Outside of LLMs, GANs (Generative Adversarial Networks) are another kind of Generative AI model that can create synthetic financial timeseries data. Studies have shown that GANs are capable of producing price data that exhibit certain stylised facts that mirror empirical data (such as fat-tailed distributions and volatility clustering), with state-of-the-art applications including using GANs in estimating tail risk (by generating realistic synthetic tail scenarios). GANs have also been used in portfolio construction and strategy hyperparameter tuning, as well as in creating synthetic order book data.
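For the curious, here is roughly what that set-up looks like in code: a deliberately tiny GAN in PyTorch, where Student-t draws stand in for historical returns and both networks are far smaller than anything used in practice. It is a sketch of the adversarial training loop, not a recipe.

```python
import torch
import torch.nn as nn

SEQ_LEN, NOISE_DIM, BATCH = 32, 16, 128

# Generator maps random noise to a fake return sequence; the discriminator
# scores how likely a sequence is to be real.
generator = nn.Sequential(
    nn.Linear(NOISE_DIM, 64), nn.ReLU(),
    nn.Linear(64, SEQ_LEN),
)
discriminator = nn.Sequential(
    nn.Linear(SEQ_LEN, 64), nn.LeakyReLU(0.2),
    nn.Linear(64, 1), nn.Sigmoid(),
)

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCELoss()

def sample_real_returns(n: int) -> torch.Tensor:
    # Stand-in for historical data: fat-tailed "returns" from a Student-t distribution.
    return torch.distributions.StudentT(df=3.0).sample((n, SEQ_LEN)) * 0.01

for step in range(2000):
    real = sample_real_returns(BATCH)
    fake = generator(torch.randn(BATCH, NOISE_DIM))

    # Discriminator step: separate real sequences from generated ones.
    d_loss = bce(discriminator(real), torch.ones(BATCH, 1)) + \
             bce(discriminator(fake.detach()), torch.zeros(BATCH, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: try to fool the discriminator.
    g_loss = bce(discriminator(fake), torch.ones(BATCH, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

# Once trained, the generator turns noise into as many synthetic return paths as you like.
synthetic_paths = generator(torch.randn(4, NOISE_DIM)).detach()
```

That last line is the appeal: a working generator is, in effect, a machine for producing extra datasets, which is exactly the scarcity problem described above.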

However, the most interesting point is around the tendency of generative AI to “hallucinate” — when large language models basically just make up plausible-sounding but incorrect stuff (like inventing legal cases).

At their heart, LLMs hallucinate because they are simply trained to predict a “statistically plausible” continuation of the input (hence why their outputs superficially sound quite convincing). But what is most statistically plausible at a linguistic level is not necessarily factually correct, especially if it involves computation or logical reasoning of some sort. Arguably, despite the demonstrated capabilities of LLMs, it is difficult to justify that they “understand what it means” when they give a response to a question: ChatGPT (using GPT3.5) notoriously struggles when asked to write five sentences that end in a particular letter or word. Indeed, supposing that there is only one “correct” continuation of a statement, even if the correct continuation is the most likely, it is common for LLMs to randomly sample over the distribution of continuations and hence have a chance of picking an “incorrect” continuation. Moreover, there is no guarantee that the most likely continuation is the factually correct one, especially if there are multiple ways to complete it correctly.
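The sampling point is easy to demonstrate with a toy example. The numbers below are invented rather than taken from any real model, but they show how sampling over a distribution of continuations produces a wrong answer some fraction of the time even when the correct one is the single most likely.

```python
import numpy as np

# Invented next-token distribution for the prompt "The capital of Australia is":
# the correct continuation is the most likely, but distractors still carry mass.
continuations = ["Canberra", "Sydney", "Melbourne"]
probs = np.array([0.55, 0.30, 0.15])

rng = np.random.default_rng(0)
samples = rng.choice(continuations, size=10_000, p=probs)

# Random sampling still picks an incorrect continuation a large chunk of the time.
wrong_rate = np.mean(samples != "Canberra")
print(f"Sampled an incorrect continuation {wrong_rate:.1%} of the time")
```

Greedy decoding (always take the most likely token) avoids this particular failure, though as the Pluto example below shows, committing to the locally most likely token has problems of its own.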

Here the paper includes a maths problem that ChatGPT flubbed, which should make any quant wary of using it.

The hallucination problem is compounded by the fact that most LLMs generate text one word at a time, and that they are largely trained on the internet hivemind, which sometimes has a weak grasp of reality.

Since it “commits” to each token generated, it can often start generating a sentence that it does not know how to complete. For instance, it may start a statement with “Pluto is the” followed by the word “smallest” as a plausible continuation (since Pluto used to be the smallest planet). However, once the phrase “Pluto is the smallest” is generated, it is difficult to complete this correctly, and GPT-4 completes it incorrectly as “Pluto is the smallest dwarf planet in our solar system” (where in fact, Pluto is the second largest dwarf planet in our solar system after Eris).

The notion of Pluto being the smallest planet also raises the distinction between faithfulness and factuality. Faithfulness is simply staying consistent and truthful to the provided source, whereas factuality is considered “factual correctness” or “world knowledge”. The LLM may well have come across sources during its training that incorrectly stated that Pluto was the smallest planet, either because they were outdated, or because they came from inaccurate/fictional sources. Given that the training materials of LLMs are generally large portions of the Internet, LLM responses, especially on very specialised topics, can be incorrect because the type of text where the “correct” knowledge is stored (e.g., textbooks, academic papers) may not necessarily be present in publicly available Internet corpora.
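The committing problem in the Pluto example can be caricatured in a few lines. The next-token table below is invented to mirror that example: greedy decoding locks in the locally most plausible word, after which no factually correct completion is left.

```python
# Invented next-token table, built to mirror the Pluto example; real models score
# over tens of thousands of tokens, not three strings.
table = {
    "Pluto is the": {"smallest": 0.6, "second-largest": 0.4},
    "Pluto is the smallest": {"dwarf planet in our solar system.": 1.0},
    "Pluto is the second-largest": {"dwarf planet in our solar system, after Eris.": 1.0},
}

prompt = "Pluto is the"
while prompt in table:
    # Greedy decoding: commit to the single most likely continuation at each step.
    next_piece = max(table[prompt], key=table[prompt].get)
    prompt = f"{prompt} {next_piece}"

print(prompt)  # "Pluto is the smallest dwarf planet in our solar system." (wrong)
```

Once “smallest” is committed, the only continuations left in the table are false; the model never gets to revisit the choice.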

Somehow, we don’t think these LLMs will land a job at Jane Street.

Further reading
— Generative AI will be great for generative AI consultants
— AI is the new ESG
