What is analysis?

Many people analyse football matches, and they do so for many different reasons. If you write a football analysis for entertainment, then anything you write goes. But if you write to explain football or, better yet, a professional football club uses your analyses for match preparation or for training programmes to improve players, then it matters what you write. For one, your analysis needs to be correct. An incorrect analysis can’t explain how football works. Even worse, an incorrect analysis can lead to the unnecessary loss of matches, or it can mean that the training staff give players the wrong suggestions for improvement. Another point is that a description is not an analysis. For that reason it is good to distinguish between match reports like those in the Guardian, tactical match reports like those of Between the Posts and match analyses like those of Spielverlagerung.

A good analysis also explains why it is unlikely that the outcome is due to happenstance. For every draw and every match with a goal difference of 1, it is hard to explain why the match ended in this result rather than another.

Here is an example of how to argue beyond happenstance:

“As isolated incidents, these can be forgiven as no team is 100% perfect defensively and all the defensive organisation in the world cannot always negate the attacking side. But for Borussia Dortmund these aren’t isolated incidents. Of their five goals conceded so far this season, three now have come via corners. Last season BVB conceded 19 goals from set-pieces- six from corners and six from free-kicks.”


So it pays to understand what an analysis entails. A philosophical analysis isn’t the same as a football analysis, yet the lessons philosophers have learned about analysis apply equally to football. For instance, most people think that analysis means breaking something down into its components. This is how the term is used within football: football analysis breaks a match down into components like formation, half-spaces, tactics and so on. Within philosophy this is called decompositional analysis. Besides this form there are two others, so in total there are three forms of analysis:

  1. Decompositional analysis where you break things down into components.
  2. Regressive analysis where you work back to first principles.
  3. Transformative or interpretive analysis where you translate what you analyse first into something else, preferably logic.

Let me give you my regressive and transformative analysis of the game of football itself, as I published it in my book “Voetbalstatistiek: filosofie voor voetballers”. Football has rules. Philosophers love rules as they provide a syntax. A syntax makes it easy to find the first principles. Here are the first principles of football as an example of regressive analysis:

  • Principle 1: the purpose of football is to win.
  • Principle 2: you win football by scoring more goals than your opponent.
  • Principle 3: the closer you get to the opponent’s goal, the bigger the chance of scoring a goal becomes.

From these three basic principles you can then derive rules like these:

  • Rule 1: scoring goals is good.
  • Rule 2: if scoring is good, then assisting is good.
  • Rule 3: if assisting is good then increasing the chance for an assist is also good.
  • Rule 4: if assisting is good then decreasing the chance for an assist is bad.
  • Rule 5: if scoring is good, then failing to score is bad.
  • Rule 6: conceding a goal is bad.
  • Rule 7: if conceding a goal is bad, then preventing a goal is good.
  • Rule 8: if conceding a goal is bad, then increasing the chance that the opponent scores is bad.

Now these principles and rules might come across as very obvious statements. But that is exactly the point. You want to build your analysis on such simplicity that nobody would argue against it. Furthermore, through the process of synthesis, which is the opposite of analysis, you can combine these basic building blocks into more complex statements. Or, with transformative or interpretive analysis, you can translate these rules into actual football analysis.

So let’s look at an example:

What is the better strategy for football: having a lot of small chances to score or only a few big chances? In the end the only thing that matters is whether a team scored more goals than the opposing team. If the team managed to do that by shooting a lot at goal, then that turned out to be the best strategy for them at that time. Here you can see the reasoning behind the saying that the winning manager is always right. Nevertheless, you can also look at multiple matches, and then you see that if you fail to score a lot (rule 5), even though you win now and then, over the long run you are worse off. That is the reason why teams are using crosses less and less: statistics show that crosses lower the chance of scoring (rule 4). So in general it is better to have a few big chances than many small chances, as rules 4 and 5 bite less in that case.

Of course you don’t have to agree with my line of reasoning. That is not the point. The point is to show how analysis is done. This is important because many times an analysis consists of nothing more than describing what has happened on the pitch. A description is not an analysis. Stating “Team A played in a 4-3-3 formation” is not an analysis. It is a description. It would turn into an analysis if one stated: “Team A played in a 4-3-3 formation because they played with 4 defenders, 3 midfielders and 3 strikers”. Then you are breaking the 4-3-3 formation down into its components. Yet most people would not feel that this is a very good analysis, as it is too obvious. An even better analysis would use data, for instance heat maps, to prove that the team indeed played with three lines of players, that the line closest to their own goal had four players and that the other two lines had three players each.

If you use your analysis to identify causes, then it is important to know which criticisms can be levelled against talking about cause and effect. The safer option is to talk about the probability that a certain pattern in a match, or a certain sequence of actions by a player, will repeat in the future.

Patterns of weak or empty analysis

Thanks to the writing guide for analyses of the University of Michigan, we have a nice list of seven patterns of weak or empty analysis:

  1. Offers a new fact or piece of evidence in place of analysis. This is done a lot in football analyses. They tend to be a long list of more and more data, with very little analysis of the previously presented data. As the University of Michigan writes: “Telling the reader what happens next or another new fact is not analysis.”
  2. Analysis is biased. There are a lot of implicit biases. For one, statistics is counterintuitive: people draw conclusions from statistics that are wrong. Then there is confirmation bias: information that supports your point of view reaches your conscious mind much more easily than information that disproves it. Finally, we have survivor bias. Although there is a lot of confirmation bias in football, survivor bias is even more rampant. With survivor bias you only look at the data of the football players who made it, without checking how many failed football players had the same data but did not make it.
  3. Analysis restates claim. This happens a lot in Twitter discussions where someone does not agree with what is being said. The original author then gives arguments why he is right or why the opposing view is wrong. Finally, rather than engaging with the arguments, the critic simply restates his point of view. Also, don’t use a tautology in an analysis.
  4. Dismisses the relevance of the evidence. Here is an example of an actual dismissal of the relevance of the evidence: “The Union have quickly become the team in MLS that other teams should emulate. They’ve got the second best expected goal difference (xGD) per game. Though, in fairness, the gap between them and LAFC in first is the same as the gap between them and the Columbus Crew, who currently sit 17th in MLS. But, in further fairness, their budget isn’t in the same stratosphere as LAFC, to say nothing of the teams they rank above like Atlanta, Toronto, NYCFC, and the LA Galaxy.” (Source: StatsBomb.)
  5. Strains logic. People often mistake a logical error for a difference of opinion. Such is not the case. While there is no absolute truth in our empirical world, where there is only probability, there is absolute truth in logic and mathematics. If an analyst, for instance, prefers to use xG because, even if the correlation of xG with future goals is low (say 27%), it is the best correlation we have, he violates the laws of mathematics, which show how little information there is in a 27% correlation. That is not a difference of opinion, but a logical error. The same goes for faulty statistics. Although our empirical world is ruled by probability, probability itself is ruled by the logic of uncertainty, in the same way as mathematics is ruled by the logic of certainty.
  6. Generalization to arrive at the argument. There are many unfounded generalizations in football that are used to arrive at a certain conclusion. That is why a good analysis goes into great detail on the relevant data without adding more data to it, as that would weaken the analysis as per our first point. An analysis of team data that does not delve into the underlying player data, for instance, uses generalization to arrive at a conclusion. For example: in a player report I read, the consultancy firm praised a central defender for being in the top 5% of defenders at supporting the attack, because his goals per minute figure was very high. Then I looked at the underlying data. What was the case? This defender, playing in the Premier League, scored twice in the 16/17 season while playing little more than 3000 minutes. Then in the 17/18 season he scored three times! And in just 2000 minutes played! That was a big boost for his goals per minute, but of course the underlying data demonstrated (a) that the difference between scoring two times and three times is pretty much a matter of happenstance and (b) that his playing only 2000 minutes instead of 3000 indicated that the manager did not appreciate him the way the consultancy firm concluded. This report was used to try to get this defender a move to a bigger club for a better salary and a hefty transfer fee. But the generalizations were only used to reach a conclusion rather than to support an analysis. The transfer did not happen, and in the 18/19 season this defender played only 1000 minutes and did not score at all.
  7. Offers advice or a solution without first providing analysis. This is not meant to criticize StatsBomb, as they do many wonderful things; it just happened that their analysis was the first that I saw in my Twitter feed. Nevertheless, the opening sentence of the piece quoted earlier is an excellent example of offering advice without first providing the analysis: “The Union have quickly become the team in MLS that other teams should emulate.”
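
The arithmetic behind point 5 can be made concrete. What follows is only a minimal sketch, and it rests on an assumption the text does not settle: that the 27% is a Pearson correlation coefficient r. Under that assumption, the share of the variation in future goals that a linear relationship with xG accounts for is r squared.

```python
# Assumption: the "27%" from point 5 is a Pearson correlation coefficient r.
r = 0.27

# r squared is the share of variance accounted for by a linear relationship.
r_squared = r ** 2
unexplained = 1 - r_squared

print(f"r = {r:.2f}")
print(f"share of variance accounted for: {r_squared:.1%}")
print(f"share of variance left over:     {unexplained:.1%}")
```

If the 27% were already a squared correlation, the numbers would of course look different, so it pays to ask which measure is being quoted.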

Associative learning

Associative learning is one of the three ways the brain learns. The other two are imprinting and instrumental learning. Associative learning was discovered and made famous by Pavlov, which is why it is also called Pavlovian learning. With associative learning the brain creates a probabilistic relationship between two sense impressions.

Associative learning is very important in football as it is the underlying learning principle of game intelligence. Players with a lot of game intelligence have learned many associations between certain positions of their teammates and opposing players. A lot of team training involves building up the right set of associations. Associations steer our behavior unconsciously. That’s why most players play at their best when they are in the flow, when they don’t think about it and just act. It is also why you see players make the wrong decisions when they suddenly have more time to act than usual: instead of relying on the associations built up in the unconscious, they overthink the situation and make mistakes.

Youth development works best when young players are trained to build as many correct associations as possible. Associative learning explains why it is so important to have the right trainers and to develop in the right team, because the brain learns negative associations that hinder the achievement of your goals just as easily as it learns positive ones. So for youth development it is good to think really hard about which trainer and which team will build the best associations and hence improve the game intelligence of the player the most. One reason why the football academy of Ajax produces so many great football players is that its programme teaches the brain to build up associations that prove to be very valuable in their careers.

Of course, the scouts of Ajax also have a great eye for talent. Yet associative learning also explains why some scouts are better than others, and why different scouts see different things when they watch the same player. The reason why top scouts are so valuable is that their network of associations has been proven to have a high correlation with future success.

Associative learning is also the reason why it is best to have multiple scouts watch a player, and why besides watching a video of a player it is also important to go and watch a live match. Watching videos triggers different associations than watching a player live. Top video scouts and top live scouts both have a network of valuable associations. But because these are different networks and different associations, both inputs are especially valuable, as they are also independent. A player who is liked by both the video scout and the live scout has a bigger probability of future success than a player about whom the live scout and the video scout differ in their opinion.

Drills with and without context

Due to associative learning, drills with context are much better than drills without context. Football is a thinking sport. That means that pattern recognition by the unconscious mind is crucial for success, especially as this builds game intelligence. Game intelligence and pattern recognition are created mostly through associative learning. So if you do a drill without context, for instance indiscriminate dribbles or passing for passing’s sake, the brain will build dribbling associations without the correct patterns to recognize. That means that the player will learn the dribbling technique, but his brain won’t know when to dribble and when not to, because he lacks the game intelligence to recognize the correct pattern.

When you do drills with context, that enables the brain to learn the right patterns. Learning how to pass while exploiting space at the same time not only teaches the passing technique, but also teaches how and when to pass according to the patterns as they develop on the pitch. So drilling with context builds the right associations, while drilling without context teaches incorrect associations. To be clear: it is not that without context the brain doesn’t learn new associations. All sense impressions lead to new or updated associations. That is the reason why drilling without context actually teaches players the wrong associations and thus decreases their game intelligence.

Changing associations

Fortunately, associations can easily be changed. All sense impressions build up associations. So the more new experiences you have, the faster your associations change. Repetition is key here. Creating the right associations is one of the reasons, besides fitness and technique, why players train as much as they do. A team with a lot of new players often performs less well than a team that has played together for a long time. Again, the reason is that the new players haven’t yet built up the network of associations required for the strategy and tactics that the manager wants to deploy. A no-look pass only works when the right associations are in the brains of both players involved in that pass.

Pavlov made the interesting observation that, besides new experiences, hypnosis might also be a great way of changing negative associations. More and more sports teams are working with hypnotists to improve their play. Hypnosis sounds really out of the ordinary, but that is because most people think of hypnosis in a very narrow sense. The famous hypnotist Derren Brown has defined hypnosis as the ability to get people to play along with your story. Influential talk always consists of hypnotic language patterns. So actually talking to players in the right way, as great coaches and managers do, also changes the associations players have.

Unfortunately, associations also explain why a manager sometimes can’t seem to reach his players anymore. Associations build an expectation. And what we experience is much more influenced by what we expect to experience than by the data that actually reaches our brain through our senses. If players have the same manager year after year, their associated expectation of that manager becomes stronger and stronger by the day. That is perfectly fine as long as everything works out great, because then the players only need to hear half a word from the manager to understand what he wants them to do on the pitch. But as soon as the old way of playing starts to fail and the manager decides to change tack, the old associations suddenly block the players from understanding, and even literally hearing, what the manager has to say. Bringing in a new manager can then in fact work, for the reason that the players don’t have an associated expectation of the new manager. It then becomes easier for their brains to actually listen to what the new manager has to say, instead of filling in his words based on previous associations.


Averages are used a lot in football. Yet, an average lacks a lot of context. A player who scores an average of one goal per match, could have gotten that average by scoring ten goals in the first match (against an easy opponent) and then not scored at all for the next nine matches. So a weighted average might already be better than a simple average, although some people feel that you then introduce more subjectivity. This is not the case. An average is as subjective as a weighted average. The only difference is that often there is less convergence of opinion on a weighted average than on a simple average. By introducing weights you introduce more elements where people can disagree.
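
To make the difference between a simple and a weighted average concrete, here is a small sketch of the ten-goal example. The weights are purely hypothetical: I assume an opponent-strength score between 0 and 1 per match, with the easy first opponent at 0.2 and the other nine at 1.0. Picking those numbers is exactly the kind of choice people can disagree on.

```python
# Goals per match from the example: ten in the first match, none in the next nine.
goals = [10, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# Hypothetical weights: opponent strength on a 0-1 scale (assumed, not measured).
weights = [0.2] + [1.0] * 9

simple_average = sum(goals) / len(goals)
weighted_average = sum(g * w for g, w in zip(goals, weights)) / sum(weights)

print(f"simple average:   {simple_average:.2f} goals per match")
print(f"weighted average: {weighted_average:.2f} goals per match")
```

With these assumed weights the ten goals against the easy opponent count for much less, so the weighted average ends up well below one goal per match.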

Most often averages are taken over many data points. That also makes an average insensitive: if a player suddenly starts to do better or worse, it takes a lot of time before you see that change in the averages. Especially when preparing for the upcoming match, it is often much better to look at measures that are more sensitive to change than an average.
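
A small sketch of this insensitivity, using made-up numbers: a player who does not score for twenty matches and then scores twice in each of the next five. The average over only the last five matches is one possible example of a more sensitive measure; the window of five is an arbitrary assumption, not a recommendation.

```python
def mean(values):
    """Plain arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

# Hypothetical goals per match: 20 scoreless matches, then a sudden run of form.
goals = [0] * 20 + [2] * 5

season_average = mean(goals)       # all 25 matches
recent_average = mean(goals[-5:])  # only the last 5 matches (assumed window)

print(f"season average: {season_average:.2f} goals per match")
print(f"last 5 matches: {recent_average:.2f} goals per match")
```

The season average has barely moved, while the short window shows the change immediately. The price is that a short window is also more easily swayed by happenstance.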

For player recruitment scouts are looking for players who are above average. But here there are pitfalls too. If your club is above average, it doesn’t tell you much if you learn that player A is above average for the league. That player is probably better than a player who is below the league average, but he still might weaken the team given that the team itself is also above average. 

In fact, even if your club is below the league average and you can hire a player who is above the league average, you still don’t know whether he is going to strengthen the team. For you also have to look at the player he is going to replace in the team. If that player is one of your star players and even better than the above-average player, hiring this above-average player is still going to weaken your team. Of course, if the only other option is a player below the league average, then hiring the player above the league average is the least bad option. But simply stating that a player is above the league average is not enough to conclude that he will strengthen the team.

The same goes for stating that a player is in the top 5%, or even the top 1%. If data shows that a player is above the league average, that only means that he is roughly in the top 50% (assuming the average lies close to the median). So placing the player in the top 5% already gives you much more information. Nevertheless, if the player he needs to replace, or the team itself, is in the top 1%, then even a top 5% player can weaken the team.
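
The point can be sketched with made-up numbers. Assume a hypothetical league of 100 players with ratings from 1 to 100 on an arbitrary scale, a transfer target rated 96 and a current starter rated 100. The function percentile_rank is my own helper for this illustration, not part of any scouting tool.

```python
def percentile_rank(value, population):
    """Percentage of the population that scores strictly below `value`."""
    below = sum(1 for x in population if x < value)
    return 100 * below / len(population)

# Hypothetical league: 100 players rated 1..100 on an arbitrary scale.
league = list(range(1, 101))
league_average = sum(league) / len(league)

candidate = 96         # hypothetical transfer target
current_starter = 100  # hypothetical player he would replace

candidate_rank = percentile_rank(candidate, league)
starter_rank = percentile_rank(current_starter, league)

above_league_average = candidate > league_average
strengthens_team = candidate > current_starter

print(f"candidate beats {candidate_rank:.0f}% of the league")
print(f"starter beats   {starter_rank:.0f}% of the league")
print(f"above league average: {above_league_average}")
print(f"better than the starter: {strengthens_team}")
```

In this toy league the candidate is both above average and in the top 5%, yet he would still be a step down from the player he replaces.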

The average of multiple variables

In reality most clubs don’t work with a single variable to determine whether a player is above or below the league average. Although it can be done. You can summarize many data points or averages in an average of averages. But in most cases clubs are looking at a lot of different variables. Players can be above average for a couple of those and below average for other variables.

With multiple variables it becomes even harder to use averages to see whether a player is going to strengthen or weaken the team. Looking at playing style helps a bit, but it remains uncertain whether a player can replicate his stats in a different team. Here is where the eye of a human scout works wonders. In one case we were looking for a winger and a striker for an average club in the Dutch Eredivisie. I had found a striker who had nice stats in the FBM statistics I have developed. The very seasoned scout I work with told me he was no good as a striker, but that he was an interesting option for that club as a winger.

When the head scout at the club saw that we proposed this striker as an option, he also expressed a dislike for this player. Yet, when I explained that we weren’t proposing him as the center forward, but as a winger his face lit up. “Yes” he said, “I can see him excel as a winger indeed.” That is one of the many reasons why you always combine data scouting with video and live scouting. The human brain is still a wonderful biocomputer to find solutions where digital computers have a hard time coming up with the right solution.

When looking at multiple variables, it is important to be very skeptical of reports telling you that player X is above average or in the top 5% with regard to skill Y. For instance, in a player report I read, the consultancy firm praised a central defender for being in the top 5% of defenders at supporting the attack, because his goals per minute figure was very high. Then I looked at the underlying data. What was the case? This defender, playing in the Premier League, scored twice in the 16/17 season while playing little more than 3000 minutes. Then in the 17/18 season he scored three times! And in just 2000 minutes played! That was a big boost for his goals per minute, but of course the underlying data demonstrated (a) that the difference between scoring two times and three times is pretty much a matter of happenstance and (b) that his playing only 2000 minutes instead of 3000 indicated that the manager did not appreciate him the way the consultancy firm concluded. This report was used to try to get this defender a move to a bigger club for a better salary and a hefty transfer fee. The transfer did not happen, and in the 18/19 season this defender played only 1000 minutes and did not score at all.
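
The arithmetic of this example is easy to check. The sketch below uses the rounded figures from the story (the real minutes were “little more than” 3000, so treat the numbers as approximate) and expresses the rate as goals per 90 minutes, which is my choice of unit for readability; the report itself spoke of goals per minute.

```python
def goals_per_90(goals, minutes):
    """Scoring rate expressed per 90 minutes played."""
    return goals / minutes * 90

# Rounded figures from the example: season -> (goals, minutes played).
seasons = {
    "16/17": (2, 3000),
    "17/18": (3, 2000),
    "18/19": (0, 1000),
}

rates = {season: goals_per_90(g, m) for season, (g, m) in seasons.items()}

# Rate over all three seasons combined.
total_goals = sum(g for g, _ in seasons.values())
total_minutes = sum(m for _, m in seasons.values())
pooled_rate = goals_per_90(total_goals, total_minutes)

for season, rate in rates.items():
    print(f"{season}: {rate:.3f} goals per 90")
print(f"all seasons: {pooled_rate:.3f} goals per 90")
```

One extra goal combined with a third fewer minutes more than doubles the rate from 16/17 to 17/18, while the rate over all three seasons stays close to the 16/17 level.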

Such a presentation is not only misleading; even if the underlying data is solid, it is still risky. Our mind tends to focus on and remember outstanding stats and to overlook and forget all other stats. This is part of how confirmation bias works. Our unconscious mind then only processes the highlights of a player. Through associative learning our brain then connects good feelings to this player, feelings that our conscious mind interprets as a good intuition. For that reason it is important to really delve deep into the underlying data of an average, or risk making mistakes. Fortunately, in my experience people working for clubs really do delve deep, and they often get very annoyed and distrustful (which is a good thing) when data providers can’t explain how they arrived at a certain value of a variable or an average.