LLMs and the Myths of the Supercomputer Sports Predictor


Euro 2024 is just around the corner, and that means you are all but guaranteed to see a splash in one of the tabloids explaining how a supercomputer has predicted the outcome of the tournament. In fact, those articles have already arrived. Of course, most of us see these reports as a bit of fluff, not to be taken too seriously. Indeed, most of us would put such reports on a par with the predictions of Paul the Octopus, the cephalopod ‘star’ of the 2010 World Cup.

That said, sports prediction models are big business, and that trend is set to grow now that we are embarking on the AI boom. Companies like IBM were using machine learning to make sports predictions long before any of us had heard of OpenAI and ChatGPT, yet the current crop of large language models (LLMs) makes things a little more interesting. AI models will continue to improve on sports predictions, but there is arguably a limit to what they can do. Part of that limit comes from the nature of sport, which can never be fully predictable. Other limits, however, are imposed by the structure of LLMs and how they work.

LLMs are next-word predictors

To explain, we need to look at what an LLM does. An LLM is the engine room for the AI application you interact with, such as ChatGPT. That definition is a bit simplistic, but it will serve its purpose. Effectively, the LLM works as a next-word (or image or sound) predictor. It does not think through its answers in the same manner as a human might. It simply predicts what is most likely to come next: you tell the LLM “dog”, and it works out that the next word is likely to be “cat” or “food” or “bone” or “Lassie”. The real wizardry is that it is able to put that together into natural language.
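To make the idea of next-word prediction concrete, here is a deliberately tiny sketch in Python. It is not how a real LLM works internally: real models use neural networks trained on enormous amounts of text, whereas this toy simply counts which word follows which in a made-up corpus. The corpus and the function names are assumptions invented for illustration.

```python
from collections import Counter, defaultdict

# Tiny, made-up corpus used purely for illustration.
CORPUS = (
    "the dog chased the cat . the dog ate food . "
    "the dog buried the bone . the dog chased the ball ."
)


def build_bigram_counts(text):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    tokens = text.split()
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts


def next_word_probabilities(counts, word):
    """Turn the raw follower counts for `word` into probabilities."""
    followers = counts.get(word, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}  # the word never appeared with a follower in the corpus
    return {candidate: n / total for candidate, n in followers.items()}


counts = build_bigram_counts(CORPUS)
probs = next_word_probabilities(counts, "dog")

# Print the candidates for the word after "dog", most likely first.
for candidate, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"dog -> {candidate}: {p:.2f}")
```

The point of the toy is only that the "answer" is a ranked list of likely continuations drawn from patterns in the training text, not a reasoned conclusion.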

For sports, then, the AI will be able to detect patterns and trends from data sets. We might look at the form of Harry Kane before the Euros, and we might also consult statistics, but the AI can do this in millions of different ways, some of which might not be apparent to us mere humans. Of course, some of this is not unique to AI, and some would argue that basic algorithms can do the same. But there are certain advantages to the way LLMs work, as they are less rigid than traditional rule-based algorithms.

AI can model with new information

To give you an example of where AI could be useful, consider the 2024 Belmont Stakes, which will take place on the 8th of June. The race, which is one of the most important in the US calendar, is always highly scrutinised, and you can be sure that bookmakers and punters will pore over the data to make their determinations. Yet there is something unprecedented in 2024 that will impact the Belmont Stakes betting odds: it won’t take place at Belmont Park. It is being run at Saratoga, which means a shorter course, different track surface, different weather conditions, and so on. As a result, many of the basic models for predicting the race go out the window. AI would arguably be more adept than humans or rigid algorithms at modelling the outcome.

Yet, as we have said, there are limits, and some of that comes from the difference between two types of data, structured and unstructured. The former refers to the crunchable numbers that you can feed into a machine, and sports prediction models are fed on vast amounts of it. In sports, we are talking about stuff like form, goals scored, xG, performance in the rain, and so on. Unstructured data is more a case of the unquantifiable elements that do not show up in the stats. For AI, that is much more difficult, sometimes impossible, to model. For humans, it comes naturally.

Human intuition can deal with unstructured data

In other fields, the question of structured versus unstructured data can be broadened, but for our purposes here, we can loosely describe the latter as opinion, even intuition. Have you ever watched a player whose statistics don’t match up with what you can see on the pitch? Or, to put it another way, how do you model the mindset of a player going through a divorce? Nor can you easily model, for example, what would happen should the England fans get on Gareth Southgate’s back early at Euro 2024 if the manager doesn’t deliver the kind of football the fans crave. As humans, we can react to that better than an AI can.

And that, perhaps, is the overall point. AI supercomputers can model the perfect outcome based on historical, structured data. They can even use unstructured data to get an insight into what experts believe will happen. But the result is an idealised prediction: what would happen if no variables were thrown into the mix. Sport is all about those variables, such as a red card, an injury, or a tabloid press putting too much pressure on young players. AIs might make the perfect case for who should win each big sporting event, but that’s very different from who will win.
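The gap between "should win" and "will win" can be illustrated with a small simulation. This is a sketch under loudly stated assumptions, not a real forecasting model: it imagines a hypothetical favourite with a 65% chance of winning any single knockout match (a figure made up for illustration) and asks how often that team wins four knockout rounds in a row, from the round of 16 to the final.

```python
import random


def simulate_title_rate(win_prob, rounds, trials, seed=2024):
    """Estimate how often a team wins `rounds` consecutive knockout matches.

    `win_prob` is the assumed chance of winning any single match.
    The fixed seed only makes the run repeatable.
    """
    rng = random.Random(seed)
    titles = 0
    for _ in range(trials):
        # The team lifts the trophy only if it wins every round.
        if all(rng.random() < win_prob for _ in range(rounds)):
            titles += 1
    return titles / trials


WIN_PROB = 0.65  # assumed per-match win probability (illustrative only)
ROUNDS = 4       # round of 16, quarter-final, semi-final, final

estimate = simulate_title_rate(WIN_PROB, ROUNDS, trials=50_000)
exact = WIN_PROB ** ROUNDS  # analytic answer if matches are independent

print(f"Simulated title rate: {estimate:.3f}")
print(f"Analytic title rate:  {exact:.3f}")
```

Under these assumptions, a side that is clearly favoured in every individual match still wins the whole thing only around 18% of the time (0.65 to the power of 4), and that is before any red cards, injuries or tabloid pressure are taken into account.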