Probability

Probability is the strength of your beliefs. Few people understand probability and what this entails. Our scientific understanding of probability has been developed by Professor De Finetti. Independently of Frank Ramsey, but around the same time, he came up with the idea that our probability estimates reflect how strongly or weakly we believe that something will happen.

What is the chance that Barcelona will be the next Champions League winner? The answer is a number between 0% and 100%. There is a discussion on whether 0% and 100% are still probabilities, or whether 0% means that something is impossible and 100% that it is absolutely certain. I follow Professor De Finetti: 0% means that something is extremely unlikely, in fact so unlikely that we have no smaller number for it, but that it is still not impossible. And 100% means that it is extremely likely, in fact so likely that we don’t have a bigger number for it, but still not absolutely certain. For mathematical reasons, in almost all cases, it is best to think of probability as a number between 1% and 99%. That way you never get into the above discussion.

For most people, looking at probabilities this way is a completely new and revolutionary idea, because it makes all probabilities subjective. That is indeed the case, because the strength of your beliefs will often differ from the strength of my beliefs. Less so if we are dealing with physics, for instance gravity, but more so when it comes to football. So for football professionals it is extremely important to get a grip on what probability means and entails.

Most people think that chance is not about the strength of your beliefs, but about the frequency with which something happens. This is called Frequentism, whereas De Finetti uses Bayesian statistics. To put it bluntly: Frequentism is wrong. There are many reasons for this, but to me as a philosopher the biggest problem is the Frequentist definition of chance, which is:

Chance is the frequency of something happening that has the same chance of happening.

As you can see, Frequentists use “chance” in the definition of chance. This makes this definition circular and meaningless. If you think of chance in terms of frequency, you misunderstand probability.

The correct way is to understand that chance, or probability, is about the strength of your belief. Often you use knowledge of frequencies to strengthen or weaken your belief. But there is a big gap between a frequency and the strength of a belief.
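To make the gap between a frequency and the strength of a belief concrete, here is a minimal sketch in Python. It assumes a simple Beta-Binomial update, which is one common Bayesian way to let observed frequencies strengthen or weaken a belief. The function name and all numbers are hypothetical and chosen only for illustration.

```python
def update_belief(prior_successes, prior_failures, observed_successes, observed_failures):
    """Posterior mean of a Beta belief after adding newly observed outcomes.

    The prior is expressed as pseudo-observations: the more of them,
    the more evidence it takes to move the belief.
    """
    a = prior_successes + observed_successes
    b = prior_failures + observed_failures
    return a / (a + b)


# Hypothetical numbers: a prior belief of 50%, held with the weight of
# 10 earlier observations, and a striker who then converts 3 of 4 chances.
prior = update_belief(5, 5, 0, 0)       # belief before the new data: 0.5
posterior = update_belief(5, 5, 3, 1)   # belief after the new data: 8 / 14
frequency = 3 / 4                       # the raw observed frequency

print(f"prior={prior:.2f} posterior={posterior:.2f} frequency={frequency:.2f}")
```

The observed frequency is 75%, but the updated belief sits between the prior and that frequency. The frequency moves the belief; it does not replace it.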

Why this is so important for football

The data revolution is happening in football. While it is still in its infancy, many clubs nowadays decide on player acquisition based on data analysis, alongside many other sources of information such as scouting. There are many different ways of doing data analysis in football, but they all come down to a set of X different numbers that people look at. For instance, as can be seen here:

As you can see, these are frequencies tied to a conclusion that it is a man-of-the-match performance. That conclusion would be the belief. What is missing is an exact number for how strong this belief is. And what is missing is how one gets from these frequencies to the most rational strength of one’s belief. Professor De Finetti has mathematically proven that the best way to turn data into a rational belief is Bayesian statistics.

The issue becomes even more pressing when you ask the next question that clubs will ask, namely: is this player going to strengthen our team? This is the same question Mark Cuban asks about players of his Dallas Mavericks:

Football is more complex than basketball, so Points Per Possession (PPP) is not the right aim for football. The crucial part of the quote is: “The probability that a player is able to contribute”. That is what I mean by the question: what is the probability that a player strengthens the team?

Of course the answer to that question depends on the team. For many teams Pablo Fornals is highly likely to be a valuable player. But most of these teams can’t afford to have him play for the team. Then there is a smaller group of teams where he would not be good enough, even though they could easily afford him. 

But say, for argument’s sake, that we run a club that could afford Pablo Fornals. It then turns out to be very hard to estimate how likely it is that Pablo would strengthen the team if you base yourself on traditional football data like we see above, especially when you want to express it as a hard number between 1% and 99%. The situation becomes even more problematic when the club is also considering maybe five other players and wants to order these six players, so that we know which player has the highest chance of being a success and which player has the lowest.

There are so many uncertainties when it comes to this one question: is a player going to strengthen the team? The data the club uses might be wrong, incomplete or even tainted with false positives. There are many people in the club who have a strong opinion (i.e. a belief) on these players. Often, one has to work in a situation where there are still many known unknowns, and there are always unknown unknowns. In situations like these it has been mathematically proven (with the Dutch Book argument) that a Bayesian network can calculate what the most rational probability assignment is.
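The core of the Dutch Book argument can be sketched in a few lines of Python. The sketch assumes the textbook setup: you are willing to buy or sell a bet that pays out a fixed stake at a price equal to your stated probability. If your probabilities for an event and for its opposite do not add up to 100%, a bookmaker can combine bets so that you lose money whatever happens. The function name and the numbers are hypothetical.

```python
def sure_loss(p_event, p_not_event, stake=1.0):
    """Guaranteed loss for someone who trades bets at their stated probabilities.

    If the two prices sum to more than 1, the bookmaker sells both bets:
    the buyer pays more than the single stake that can ever be paid out.
    If they sum to less than 1, the bookmaker buys both bets instead.
    Either way the loss is the distance of the sum from 1, times the stake.
    """
    return abs(p_event + p_not_event - 1.0) * stake


# Hypothetical incoherent beliefs: 60% that the player succeeds
# and also 60% that he does not succeed.
print(sure_loss(0.60, 0.60))   # a loss of about 0.2 per unit stake, whatever happens

# Coherent beliefs leave no room for a guaranteed loss.
print(sure_loss(0.60, 0.40))
```

This only shows the simplest case of two complementary bets. The full argument extends the same reasoning to all the rules of probability.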

I have developed a prototype of a Bayesian network where every scout of the club, every data analyst and every other staff member can enter how likely he thinks it is that a player is going to strengthen the team. Based on historical figures (for instance the frequency of success each scout, analyst or staff member has had), one can give different weights to each part of the network. Once the strengths of the beliefs of all relevant people are entered in the network, the network itself will automatically calculate the overall probability and rank all the players accordingly. It has been proven that following probabilities calculated this way is the most rational way to make decisions.
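The following Python sketch gives a rough idea of how weighted beliefs can be combined into one probability per player and then ranked. It is not the prototype described above and it is not a full Bayesian network. As a simple stand-in it assumes a weighted average of log-odds (a logarithmic opinion pool). The player names, probabilities and weights are all hypothetical.

```python
import math


def clamp(p, low=0.01, high=0.99):
    """Keep a probability between 1% and 99%."""
    return min(max(p, low), high)


def pool(estimates):
    """Combine (probability, weight) pairs into one probability.

    Each probability is turned into log-odds, the log-odds are averaged
    using the weights, and the result is turned back into a probability.
    """
    total_weight = sum(weight for _, weight in estimates)
    pooled = sum(
        weight * math.log(clamp(p) / (1 - clamp(p))) for p, weight in estimates
    ) / total_weight
    return clamp(1 / (1 + math.exp(-pooled)))


def rank_players(opinions):
    """Return (player, pooled probability) pairs, highest probability first."""
    scored = {player: pool(estimates) for player, estimates in opinions.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Hypothetical input: per player, the belief of a scout (weight 0.7)
# and of a data analyst (weight 0.5). The weights stand in for each
# person's historical frequency of success.
opinions = {
    "Player A": [(0.80, 0.7), (0.60, 0.5)],
    "Player B": [(0.55, 0.7), (0.50, 0.5)],
    "Player C": [(0.30, 0.7), (0.90, 0.5)],
}

ranking = rank_players(opinions)
for player, probability in ranking:
    print(f"{player}: {probability:.0%}")
```

Averaging log-odds rather than raw percentages is a design choice: it lets a very confident opinion count for more than a lukewarm one, while the weights still control how much each person is trusted.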

Primary versus secondary statistics

Once you agree with me that within football it is all about the probability that a player is able to contribute to the team, you can distinguish between the primary statistic, which is the probability that a player is able to contribute to the team, and secondary statistics, which are all the other statistics that you use to come to a judgement of how probable it is that a player is going to contribute to the team.

For instance, if you want to know how probable it is that a certain striker is going to be able to contribute to the team, you might look at secondary statistics from the previous season like goals scored, expected goals (xG) or shots on target. These are just examples. The point is that you then use these secondary statistics to derive the primary statistic.

This derivation can be either formal or informal. With a formal derivation you actually have a statistical model where the secondary statistics are the input and the primary statistic is the output. This output is in the form of a value between 1% and 99%.
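A formal derivation of this kind could look like the Python sketch below. It assumes a logistic model, which is only one of many possible statistical models. The coefficients are hypothetical and chosen purely for illustration; in practice they would have to be fitted to historical data. The input names match the examples of secondary statistics mentioned above.

```python
import math

# Hypothetical coefficients, for illustration only.
# A real model would estimate these from historical data.
INTERCEPT = -2.0
WEIGHTS = {"goals": 0.08, "xg": 0.10, "shots_on_target": 0.02}


def primary_statistic(secondary):
    """Turn secondary statistics into a probability between 1% and 99%."""
    score = INTERCEPT + sum(WEIGHTS[name] * secondary[name] for name in WEIGHTS)
    probability = 1 / (1 + math.exp(-score))   # logistic function
    return min(max(probability, 0.01), 0.99)   # never report 0% or 100%


# Hypothetical striker: last season's secondary statistics.
striker = {"goals": 15, "xg": 12.5, "shots_on_target": 40}
p = primary_statistic(striker)
print(f"Probability that the striker contributes: {p:.0%}")
```

The final clamping step mirrors the advice to think of probability as a number between 1% and 99%: however extreme the input, the model never claims impossibility or certainty.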

The same happens when a manager makes a decision based only on the secondary statistics. In that case he uses his brain to create an informal derivation of the primary statistic from the input of the secondary statistics. His brain is a Bayesian brain, a Bayesian biocomputer, able to calculate these probabilities. The reason why a club is eager to hire a great manager is in part that the decision makers at the club, unconsciously, expect the brain of the manager to be able to calculate the primary statistic based on all the secondary input they are going to give him. Of course, the manager’s brain is much more occupied with short-term gains than with the long-term well-being of the club. For that reason it is often much better to have a separate head of recruitment who also has a great brain that is able to compute the primary statistic, but whose brain is focused more on the long term.

In any case, whether you formally calculate the primary statistic or whether you have enough trust in the brains of the decision makers at the club, in the end clubs work with the primary statistic: the probability that a player is able to contribute to the team.

For more on probability, see the article about what is possible.