Good morning. French markets fell a bit after President Emmanuel Macron called snap parliamentary elections, but the response was more a resigned Gallic shrug than any sort of panic. More evidence of the Unhedged view that in the short run politics don’t matter much to markets (except in extreme cases). Email me: [email protected].
The robots are here
Everyone who works in an information industry — a category that includes journalists, software coders and stock pickers — should be thinking about whether, or perhaps when, a computer is going to take their job.
A large language model, trained on the writing done for the Financial Times, could write newsletters that sounded a lot like me. Maybe the letters would not be quite convincing today, but it likely won’t be long before they are. Perhaps people don’t want to read newsletters written by LLMs, in which case my trip to the knacker’s yard is not quite booked. But the threat is clear.
Unhedged readers may be less interested in the future of journalism than in that of analysts and portfolio managers. Which brings me to a recent paper by three scholars at the University of Chicago's Booth School of Business, Alex Kim, Maximilian Muhn and Valeri Nikolaev (I'll call them KMN). The paper, "Financial Statement Analysis with Large Language Models", puts ChatGPT to work on financial statements. With some fairly light prompting, the LLM turned those statements into earnings predictions that were more accurate than analysts', and the predictions formed the basis for model portfolios which, in back tests, generated meaty excess returns.
“We provide evidence consistent with large language models having human-like capabilities in the financial domain,” the authors concluded. “Our findings indicate the potential for LLMs to democratise financial information processing.”
KMN fed ChatGPT with thousands and thousands of balance sheets and income statements, stripped of dates and company names, from a database spanning from 1968 to 2021 and covering more than 15,000 companies. Each balance sheet and accompanying income statement contained the standard two years of data, but was an individual input; the model was not “told” about the longer-term history of the company. KMN then prompted the model to perform quite standard financial analyses (“What has changed in the accounts from last year?”, “Calculate the liquidity ratio”, “What is the gross margin?”).
Next — and this turned out to be crucial — KMN prompted the model to write economic narratives that explained the outputs of the financial analysis. Finally, they asked the model to predict whether each company’s earnings in the next year would be up or down; whether the change would be small, medium-sized, or large; and how sure it was of this prediction.
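The chain of prompts described above can be sketched roughly as follows. This is a toy illustration only: the prompt wording, the `build_conversation` helper and the overall structure are my own assumptions, not KMN's actual code or prompts.

```python
# Illustrative sketch of a KMN-style prompt chain. All prompt wording
# here is invented; only the three analysis questions are quoted from
# the article's description of the paper.

ANALYSIS_PROMPTS = [
    "What has changed in the accounts from last year?",
    "Calculate the liquidity ratio.",
    "What is the gross margin?",
]

# The "economic narrative" step, which KMN found was key to accuracy.
NARRATIVE_PROMPT = (
    "Write an economic narrative explaining what the analysis above "
    "implies about the company's prospects."
)

PREDICTION_PROMPT = (
    "Will earnings be up or down next year? Is the change small, "
    "medium-sized or large? How confident are you?"
)

def build_conversation(statements: str) -> list[str]:
    """Assemble the full chain of messages for one anonymised
    balance sheet plus income statement (dates and names stripped)."""
    return [statements, *ANALYSIS_PROMPTS, NARRATIVE_PROMPT, PREDICTION_PROMPT]
```

The point of the structure is that the model only sees the prediction question after it has already produced the ratio analysis and the explanatory narrative, mimicking how a human analyst reasons before forecasting.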
Predicting the direction of earnings, even in a binary way, turns out not to be particularly easy, for either human or machine. To simplify significantly: human analysts' predictions (drawn from the same historical database) were accurate about 57 per cent of the time, measured halfway through the previous year. That is better than ChatGPT managed with a simple, unstructured prompt. With the full chain of prompts, however, the model's accuracy rose to 60 per cent. "This implies that GPT comfortably dominates the performance of a median financial analyst," in predicting the direction of earnings, KMN wrote.
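The accuracy comparison is just the hit rate on binary up/down calls. A minimal sketch, using invented example data rather than the paper's:

```python
def directional_accuracy(predicted: list[str], actual: list[str]) -> float:
    """Share of up/down calls that matched the realised direction."""
    assert len(predicted) == len(actual)
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(predicted)

# Invented example: ten realised directions and two sets of calls.
actual  = ["up", "up", "down", "up", "down", "down", "up", "up", "down", "up"]
analyst = ["up", "down", "down", "up", "up", "down", "down", "up", "down", "up"]
model   = ["up", "up", "down", "up", "up", "down", "down", "up", "down", "up"]

print(directional_accuracy(analyst, actual))  # 0.7
print(directional_accuracy(model, actual))    # 0.8
```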
Finally, KMN built long and short model portfolios based on the companies for which the model foresaw significant changes in earnings with the highest confidence. In back tests, these portfolios outperformed the broad stock market by 37 basis points a month on a capitalisation-weighted basis and 84bp a month on an equal-weighted basis (suggesting the model adds more value with its predictions of small stocks’ earnings). This is a lot of alpha.
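A stylised version of that portfolio construction, with invented tickers, confidences and returns (KMN's actual methodology is more involved, and this ignores weighting schemes, transaction costs and rebalancing):

```python
# Go long the highest-confidence predicted-up names and short the
# highest-confidence predicted-down names, equal-weighted within each leg.
# All data below is invented for illustration.

predictions = [
    # (ticker, predicted direction, confidence, realised monthly return)
    ("AAA", "up",   0.9,  0.04),
    ("BBB", "up",   0.6,  0.01),
    ("CCC", "down", 0.8, -0.03),
    ("DDD", "down", 0.5,  0.02),
    ("EEE", "up",   0.7,  0.02),
]

def long_short_return(preds, top_n=1):
    """Equal-weighted long/short return using the top-confidence
    predicted-up and predicted-down names."""
    ups = sorted((p for p in preds if p[1] == "up"), key=lambda p: -p[2])[:top_n]
    downs = sorted((p for p in preds if p[1] == "down"), key=lambda p: -p[2])[:top_n]
    long_ret = sum(p[3] for p in ups) / len(ups)
    short_ret = sum(p[3] for p in downs) / len(downs)
    return long_ret - short_ret

print(round(long_short_return(predictions), 4))  # long 0.04 minus short -0.03
```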
I spoke to Alex Kim yesterday, and he was quick to emphasise the preliminary nature of the findings. This is proof of concept, rather than proof that KMN have invented a better stockpicking mouse trap. Kim was equally keen to emphasise KMN’s finding that asking the model to compose a narrative to explain the implications of the financial statements appeared to be the key to unlocking greater forecast accuracy. That is the “human-like” aspect.
The study raises loads of issues, especially for a person like me who has not spent much time thinking about artificial intelligence. In no particular order:
- The KMN result does not strike me as surprising overall. There has been plenty of evidence over the years that earlier computer models, or even plain old linear regressions, could outperform the average analyst. The most obvious explanation is that the models or regressions simply find or follow rules. They are therefore immune to the biases that the richer information human beings have access to (corporate reports, executive blather and so on) tends only to encourage or confirm.
- What is perhaps a bit more surprising is that an out-of-the-box LLM was able to outperform humans pretty significantly with quite basic prompts (the model also outperformed basic statistical regression and performed about as well as specialised "neural net" programs trained specifically to forecast earnings).
- All the usual qualifications that apply to any study in the social sciences apply here, obviously. Lots of studies are done; few are published. Sometimes results don't hold up.
- Some of the best stock pickers specifically eschew Wall Street's obsession with what earnings are going to do in the near term. Instead, they focus on the structural advantages of businesses, and on the ways the world is changing that will advantage some businesses over others. Can ChatGPT make "big calls" like this as effectively as it can make short-term earnings forecasts?
- What is the job of a financial analyst? If the LLM can predict earnings better than its human competitors most of the time, what value does the analyst provide? Is she there to explain the details of a business to the portfolio manager who makes the "big calls"? Is she an information conduit connecting the company and the market? Will she still have value when human buy and sell calls are a thing of the past?
- Perhaps AI's ability to outperform the median analyst or stock picker will not change anything at all. As Joshua Gans of the University of Toronto pointed out to me, the low value of the median stock picker was demonstrated years ago by the artificial intelligence technology known as the low-fee Vanguard index fund. What matters will be LLMs' ability to compete with, or support, the very smartest people in the market, many of whom are already using gobs of computer power to do their jobs.
I am keen to hear from readers on this topic.
One good read
More on Elon Musk’s pay.