Sezer Unar
10 min read · May 18, 2022

“Football is a game of mistakes. Whoever makes the fewest mistakes wins.”

I decided to take action on a subject that I have been thinking about for a long time.

Mistakes…

Football produces far fewer goals than most other sports. It is also a game of moments: everything can be turned upside down in an instant, and a single goal can make you champion. So even a seemingly insignificant mistake can directly affect the result.

This was the reason why I started with Cruyff’s famous quote.

Mistakes may mean something different for each team. Say you play against Manchester City. Your opponent will control the ball from the first minute and look for space in your defense to score. Let’s say your attack is weak and your primary plan is to score on the counterattack. Your team makes a mistake and concedes a goal right at the beginning of the game. When the referee blows the final whistle, the scoreboard shows you 1–0 behind. For a team playing this way, that mistake is critical.

There is another way to look at it. We conceded a goal after being dispossessed, but at the end of the match, we won 4–1. We have 3 points. Maybe that mistake motivated us to play better.

Come on, let’s play a little game. Let me show you a few mistakes, and you guess their impact.

Match result: Levante 0–5 Barcelona

A poor mistake by the Levante defense gave Barça a chance with high xG. Do you think it’s a critical mistake?

Match result: Barcelona 4–0 Real Betis

Dani Alves was dispossessed. In just a few seconds, Betis was one-on-one with the goalkeeper.

Match result: Barcelona 3–4 Real Betis

Another Barca-Betis match. Ter Stegen makes an inaccurate pass. This means a new attack organization for Betis. Receiving the ball in the penalty area with the 3rd pass, Betis scored with the 4th pass.

Match result: Barcelona 4–2 Eibar

A bad pass by Eibar causes Barcelona to be one-on-one with the goalkeeper.

Match result: Cadiz 2–1 Barcelona

Poor control by Clément Lenglet is followed by a bad clearance by Ter Stegen. We are focusing on Lenglet. Cadiz punished this mistake with a goal and won the match 2–1.

I think you can guess where this blog post is going now. In short, I built an ML model to measure how critical the mistakes are. When building this model, I made use of StatsBomb’s open data.

Before I get into the subject, I would like to share an important detail with you. I will not assess how bad a mistake is in itself, but how much it affects the outcome of the match. So try not to ask yourself whether Ter Stegen’s pass was really that bad a mistake.

Moreover, this model differs from the probability of conceding a goal used in possession-value metrics: those evaluate an action with respect to a single goal, while I evaluate it with respect to the whole match.

What are the mistakes?

  • Miscontrol
  • Dispossession
  • Clearance
  • Interception
  • Inaccurate Pass: I did not count passes aimed at the attacking third as mistakes because, for example, an inaccurate cross is not really a mistake. Not every attack has to end in a positive outcome.
  • Own Goal
  • Foul leading to a penalty: I didn’t count every foul as a mistake because we cannot know the purpose behind each foul. A foul committed in your own penalty area, however, can directly change the fate of the game. Compared to pre-penalty possession values, penalty kicks are wildly overpowered.
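The list above can be sketched as a simple filter. This is only an illustration: the flat event records and the field names (`type`, `outcome`, `end_x`, `penalty`) are hypothetical simplifications, real StatsBomb events are nested JSON, and the 120-unit pitch length with the attacking third starting at two thirds of it is an assumption.

```python
# Hypothetical, flattened event records. Field names and type labels are
# assumptions for illustration, not the exact StatsBomb schema.
ALWAYS_MISTAKES = {"Miscontrol", "Dispossession", "Clearance", "Interception", "Own Goal"}

PITCH_LENGTH = 120.0                           # assumed pitch length in data units
ATTACKING_THIRD_START = PITCH_LENGTH * 2 / 3   # x >= 80 counts as the attacking third


def is_mistake(event: dict) -> bool:
    """Return True if the event counts as a potential mistake under the list above."""
    kind = event["type"]
    if kind in ALWAYS_MISTAKES:
        return True
    if kind == "Pass":
        inaccurate = event.get("outcome") == "Incomplete"
        aimed_at_attacking_third = event.get("end_x", 0.0) >= ATTACKING_THIRD_START
        # Inaccurate passes aimed at the attacking third (e.g. crosses) are not counted.
        return inaccurate and not aimed_at_attacking_third
    if kind == "Foul":
        # Only fouls that lead to a penalty are counted.
        return bool(event.get("penalty", False))
    return False


events = [
    {"type": "Pass", "outcome": "Incomplete", "end_x": 45.0},   # misplaced pass in midfield
    {"type": "Pass", "outcome": "Incomplete", "end_x": 105.0},  # failed cross
    {"type": "Foul", "penalty": True},
    {"type": "Foul", "penalty": False},
    {"type": "Miscontrol"},
]
mistakes = [e for e in events if is_mistake(e)]
print(len(mistakes))
```

Keeping the rule in one function makes it easy to change a single decision later, such as where the attacking third starts.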

What variables did I use?

  • The location of the action that we consider as a mistake
  • Mistake Type
  • Total xThreat (+ xG) from the kick-off to the mistake (Separately for own team and difference)
  • Total xThreat (+ xG) from the mistake to the end of the match. (Separately for own team and difference)

I wanted to use these variables for two purposes. First, I thought this is how I could distinguish between the dominant side and the non-dominant side in the match. Second, the “before and after xT” values also serve as a kind of match time. For example, we conceded a goal at the beginning of the game due to a mistake, but our xT from the mistake to the end of the match is very high compared to the opponent’s. So we can expect the impact of the mistake to be lower.
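The before/after totals can be computed roughly like this. It is a minimal sketch, assuming each event already carries a single `threat` value (xT plus xG) along with a `time` in seconds and a `team` name; these field names and the feature names in the result are hypothetical.

```python
def threat_split(events, mistake_time, team):
    """Sum threat before and after a mistake, for `team` and as a difference to the opponent.

    Events at exactly `mistake_time` are counted in the "after" part (an assumption).
    """
    totals = {"before_own": 0.0, "before_opp": 0.0, "after_own": 0.0, "after_opp": 0.0}
    for e in events:
        phase = "before" if e["time"] < mistake_time else "after"
        side = "own" if e["team"] == team else "opp"
        totals[f"{phase}_{side}"] += e["threat"]
    return {
        "before_own": totals["before_own"],
        "before_diff": totals["before_own"] - totals["before_opp"],
        "after_own": totals["after_own"],
        "after_diff": totals["after_own"] - totals["after_opp"],
    }


# Toy events, not real match data.
events = [
    {"time": 60.0, "team": "A", "threat": 0.5},
    {"time": 300.0, "team": "B", "threat": 0.25},
    {"time": 900.0, "team": "A", "threat": 0.125},
    {"time": 1500.0, "team": "B", "threat": 0.75},
]
features = threat_split(events, mistake_time=600.0, team="A")
print(features)
```

A positive `after_diff` means the team that made the mistake out-created its opponent for the rest of the match.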

  • Max xT (or xG) reached by the opponent in 15 seconds

When making predictions, the machine will not know whether the mistake caused a goal or not. Instead, it will know the maximum probability of scoring reached by the opponent. Thus, it will have an idea of the impact caused by the mistake.
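As a rough illustration of this feature, the sketch below takes the highest threat value reached by the opponent in the window after the mistake. The event fields (`time`, `team`, `threat`) are hypothetical, and returning 0.0 when the opponent has no action in the window is an assumption.

```python
def max_opponent_threat(events, mistake_time, team, window=15.0):
    """Highest threat (xT or xG) reached by the opponent within `window` seconds of the mistake."""
    values = [
        e["threat"]
        for e in events
        if e["team"] != team and mistake_time < e["time"] <= mistake_time + window
    ]
    # No opponent action in the window: assume the mistake created no danger.
    return max(values, default=0.0)


# Toy events, not real match data. Team "A" makes the mistake at t = 100 s.
events = [
    {"time": 105.0, "team": "B", "threat": 0.25},
    {"time": 110.0, "team": "A", "threat": 1.0},    # own team's action, ignored
    {"time": 112.0, "team": "B", "threat": 0.5},
    {"time": 120.0, "team": "B", "threat": 0.75},   # outside the 15-second window
]
print(max_opponent_threat(events, mistake_time=100.0, team="A"))
```

The `window` parameter makes it easy to try a shorter horizon than 15 seconds.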

  • Game State

The significance of the mistake may vary depending on the game state. It doesn’t matter if you make one more mistake and concede a goal in a game where you are already 5–0 behind. However, in a game you end up losing 6–0, causing the first goal conceded can be important.

So what about our output?

It’s a classification problem. There are two classes in the training set that the machine will learn from. If a team won the match, the mistake is of no consequence. Therefore, we assign the class “1” only to the mistakes of the teams that lost or drew.

Wait, it’s not that simple.

A mistake by a team that did not win belongs to class “1” only if both of the following hold:

  • The mistake leads to a goal conceded within 15 seconds.
  • The score difference is between -2 and +2 at the moment of the mistake.

Let me be more descriptive of the scenarios. In all these example cases, assume that your team didn’t win.

You are 2–0 behind. Due to a mistake, you conceded another goal. It’s a critical mistake because you caused the goal difference to widen too much.

You are 2–0 ahead. You conceded a goal due to a mistake. It’s a critical mistake because the goal difference is at an alarming level.

You are 3–0 behind. Due to a mistake, you conceded another goal. It is NOT a critical mistake because the match has already been lost.

Let’s continue with the scenario just above. Luckily you scored 3 goals. You are 4–3 behind now, but you made a mistake, and the score is 5–3. It’s a critical mistake.
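The labeling rule and the four scenarios above can be written down as a small function. This is a sketch of the rule as described in the text, not the original training code; the function and argument names are made up, and the score difference is taken from the perspective of the team making the mistake, before the goal is conceded.

```python
def critical_label(team_won: bool, conceded_within_15s: bool, score_diff_at_mistake: int) -> int:
    """Return 1 for a critical mistake, 0 otherwise.

    score_diff_at_mistake: own goals minus opponent goals at the moment of the mistake.
    """
    if team_won:
        return 0  # mistakes by the winning team are of no consequence
    if not conceded_within_15s:
        return 0  # the mistake did not lead to a goal
    return int(-2 <= score_diff_at_mistake <= 2)


# The four scenarios, assuming the team did not win and the mistake led to a goal.
print(critical_label(False, True, -2))  # 2-0 behind, then 3-0
print(critical_label(False, True, 2))   # 2-0 ahead, then 2-1
print(critical_label(False, True, -3))  # 3-0 behind, then 4-0
print(critical_label(False, True, -1))  # 4-3 behind, then 5-3
```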

In light of everything above, I built the model using various algorithms.

The probability threshold is 0.50. Raising the threshold generally increases Precision and lowers Recall.

I chose Precision, AUC and Brier score as model evaluation metrics. Based on these metrics, I went with XGBoost.
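To make the threshold trade-off concrete, here is a small pure-Python sketch of Precision, Recall and the Brier score. The labels and probabilities are invented toy values, not results from the model in this post, and AUC is left out to keep the example short.

```python
def precision_recall(y_true, y_prob, threshold=0.5):
    """Precision and recall when predicting class 1 for probabilities >= threshold."""
    y_pred = [int(p >= threshold) for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


# Invented toy data for illustration only.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.05, 0.10, 0.20, 0.40, 0.45, 0.60, 0.70, 0.90]

for threshold in (0.3, 0.5, 0.8):
    precision, recall = precision_recall(y_true, y_prob, threshold)
    print(f"threshold={threshold:.1f} precision={precision:.2f} recall={recall:.2f}")

print(f"brier={brier_score(y_true, y_prob):.4f}")
```

On this toy data, precision rises and recall falls as the threshold goes up, which is the trade-off described above.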

I want to draw attention to one point here. The Recall value is quite low. I consider it less important than Precision in this situation.

Say, for example, a mistake resulted in a 0.02 xG shot, but the opponent scored from this low chance. Although the chance ended up in a goal, this is less likely to be a critical mistake on a repeatable basis, because only about 2 out of 100 such shots are expected to be goals.

In addition, it is risky to say “you made a critical mistake” to players who did not make one. As I mentioned at the beginning, the game is low-scoring, and a mistake that the machine exaggerates can lead to unfair criticism directed at players.

Let’s see how our new model plays the guessing game we played at the beginning.

̶A̶ ̶p̶o̶o̶r̶ ̶m̶i̶s̶t̶a̶k̶e̶ ̶b̶y̶ ̶t̶h̶e̶ ̶L̶e̶v̶a̶n̶t̶e̶ ̶d̶e̶f̶e̶n̶s̶e̶ ̶g̶a̶v̶e̶ ̶B̶a̶r̶ç̶a̶ ̶a̶ ̶c̶h̶a̶n̶c̶e̶ ̶w̶i̶t̶h̶ ̶h̶i̶g̶h̶ ̶x̶G̶.̶ ̶D̶o̶ ̶y̶o̶u̶ ̶t̶h̶i̶n̶k̶ ̶i̶t̶’̶s̶ ̶a̶ ̶c̶r̶i̶t̶i̶c̶a̶l̶ ̶m̶i̶s̶t̶a̶k̶e̶?̶

The model’s prediction is 1.1%. Pretty much the result we expected.

We will examine the results with the help of the modelStudio package. According to the breakdown plot, although the max xT value reached within 15 seconds raises the model estimate to 20.9%, the fact that Levante is already four goals behind produces the result we hoped to see. It’s a non-critical mistake made in an already lost match.

̶D̶a̶n̶i̶ ̶A̶l̶v̶e̶s̶ ̶d̶i̶s̶p̶o̶s̶s̶e̶s̶s̶e̶d̶ ̶t̶h̶e̶ ̶b̶a̶l̶l̶.̶ ̶I̶n̶ ̶j̶u̶s̶t̶ ̶a̶ ̶f̶e̶w̶ ̶s̶e̶c̶o̶n̶d̶s̶,̶ ̶B̶e̶t̶i̶s̶ ̶w̶a̶s̶ ̶o̶n̶e̶-̶o̶n̶-̶o̶n̶e̶ ̶w̶i̶t̶h̶ ̶t̶h̶e̶ ̶g̶o̶a̶l̶k̶e̶e̶p̶e̶r̶.̶

The prediction is 7.7%.

From the model’s standpoint, it’s not a mistake with very important consequences. Reaching 0.40 xT adds serious weight to the mistake. However, since Barcelona is by far the more dominant side from this mistake to the end of the match (visible in the xT difference of 4.734), the mistake does not have a significant impact according to the model. Of course, in the long term, the 7% estimates that we tend to dismiss may still add up to a negative impact on the team.

̶A̶n̶o̶t̶h̶e̶r̶ ̶B̶a̶r̶c̶a̶-̶B̶e̶t̶i̶s̶ ̶m̶a̶t̶c̶h̶.̶ ̶T̶e̶r̶ ̶S̶t̶e̶g̶e̶n̶ ̶m̶a̶k̶e̶s̶ ̶a̶n̶ ̶i̶n̶a̶c̶c̶u̶r̶a̶t̶e̶ ̶p̶a̶s̶s̶.̶ ̶T̶h̶i̶s̶ ̶m̶e̶a̶n̶s̶ ̶a̶ ̶n̶e̶w̶ ̶a̶t̶t̶a̶c̶k̶ ̶o̶r̶g̶a̶n̶i̶z̶a̶t̶i̶o̶n̶ ̶f̶o̶r̶ ̶B̶e̶t̶i̶s̶.̶ ̶R̶e̶c̶e̶i̶v̶i̶n̶g̶ ̶t̶h̶e̶ ̶b̶a̶l̶l̶ ̶i̶n̶ ̶t̶h̶e̶ ̶p̶e̶n̶a̶l̶t̶y̶ ̶a̶r̶e̶a̶ ̶w̶i̶t̶h̶ ̶t̶h̶e̶ ̶3̶r̶d̶ ̶p̶a̶s̶s̶,̶ ̶B̶e̶t̶i̶s̶ ̶s̶c̶o̶r̶e̶d̶ ̶w̶i̶t̶h̶ ̶t̶h̶e̶ ̶4̶t̶h̶ ̶p̶a̶s̶s̶

The prediction is 28.7%.

This was a more difficult question. The danger caused by the mistake is quite serious, and Barcelona is also 3–2 behind. These two factors greatly increase the negative effect. However, before the action in question (from the beginning of the match), the xT difference is +2.71 in favor of Barcelona. This means that Barcelona should already have been in the lead in this match. Moreover, they produced a lot of xT after the mistake. The one result I can’t explain is why their xT before the mistake had a negative impact on the severity of the mistake.

̶A̶ ̶b̶a̶d̶ ̶p̶a̶s̶s̶ ̶b̶y̶ ̶E̶i̶b̶a̶r̶ ̶c̶a̶u̶s̶e̶s̶ ̶B̶a̶r̶c̶e̶l̶o̶n̶a̶ ̶t̶o̶ ̶b̶e̶ ̶o̶n̶e̶-̶o̶n̶-̶o̶n̶e̶ ̶w̶i̶t̶h̶ ̶t̶h̶e̶ ̶g̶o̶a̶l̶k̶e̶e̶p̶e̶r̶.̶

The prediction is 49.4%.

It’s a bad mistake this time. You might think this is a mistake made at the beginning of the match, and they are already ahead 1–0, but from this minute until the end of the match, Barcelona dominates heavily. The “After xT Diff” variable raised the model’s prediction by 0.122.

̶A̶ ̶p̶o̶o̶r̶ ̶c̶o̶n̶t̶r̶o̶l̶ ̶b̶y̶ ̶C̶l̶é̶m̶e̶n̶t̶ ̶L̶e̶n̶g̶l̶e̶t̶ ̶i̶s̶ ̶f̶o̶l̶l̶o̶w̶e̶d̶ ̶b̶y̶ ̶a̶ ̶b̶a̶d̶ ̶c̶l̶e̶a̶r̶a̶n̶c̶e̶ ̶b̶y̶ ̶T̶e̶r̶ ̶S̶t̶e̶g̶e̶n̶.̶ ̶W̶e̶ ̶a̶r̶e̶ ̶f̶o̶c̶u̶s̶i̶n̶g̶ ̶o̶n̶ ̶L̶e̶n̶g̶l̶e̶t̶.̶ ̶F̶i̶n̶a̶l̶i̶z̶i̶n̶g̶ ̶t̶h̶i̶s̶ ̶m̶i̶s̶t̶a̶k̶e̶ ̶b̶y̶ ̶s̶c̶o̶r̶i̶n̶g̶ ̶a̶ ̶g̶o̶a̶l̶,̶ ̶C̶a̶d̶i̶z̶ ̶w̶o̶n̶ ̶t̶h̶e̶ ̶m̶a̶t̶c̶h̶ ̶2̶-̶1̶.̶

The prediction is 75.2%.

It’s a very critical mistake. The fact that the error leads to a shot with 0.80 xG greatly increases the impact of the mistake. Moreover, the match was tied. So our model didn’t find the 2.82 xT produced in the remaining half-hour to be enough. The opponent reached 1.55 xT during this time.

In My Opinion…

The performance of the model satisfied me, but of course, I ran into a few confusing cases. I want to talk a little bit about them.

Since I see all actions like Interception and Clearance as potential mistakes, I classified them as mistakes. My guess was that if a clearance is successful, the ball is already with the player’s own team, and the opponent is very unlikely to generate a high xT within 15 seconds. Therefore, the machine often makes very low estimates. Although my guess was partially correct, I also came across some counterexamples. For example, at a corner, the defender cleared the ball with his head, but the ball stayed with the opponent and they had a 0.50 xG chance. As a football fan, I wouldn’t call it a mistake. However, logically, it was that action that caused the danger. As I said, this model measures the impact an action causes, not the mistake itself.

At first, I also used a “match time” variable. Interestingly, it wasn’t one of the important variables. In fact, its effect was very weak.

The output I use while training the machine could be changed, because the score difference has different meanings depending on the match time. I assigned score differences between -2 and +2 to the “1” class, but if you’re already 2–0 behind at 90+5 and you concede a goal, it might not matter much.

15 seconds may be a long time to wait for the impact. Maybe it would be more logical to reduce it to 10 seconds or think in terms of the number of actions.

What did I learn?

I’ve been working on this for a few weeks and it’s been an educational process for me. The output I chose actually created an imbalanced dataset. Dealing with this was an extra challenge. I tried various methods such as undersampling and oversampling, but the predictions were not realistic at all.
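For readers unfamiliar with undersampling, here is a minimal sketch of random undersampling in plain Python. The data is invented and the `critical` field name is hypothetical. It also shows one general side effect that may contribute to unrealistic predictions: balancing the classes changes the share of positives the model sees, so its predicted probabilities shift upward unless they are recalibrated.

```python
import random


def undersample(rows, label_key="critical", seed=0):
    """Randomly drop majority-class (0) rows until both classes have the same size."""
    rng = random.Random(seed)
    positives = [r for r in rows if r[label_key] == 1]
    negatives = [r for r in rows if r[label_key] == 0]
    kept_negatives = rng.sample(negatives, k=min(len(negatives), len(positives)))
    return positives + kept_negatives


def positive_rate(rows, label_key="critical"):
    """Share of rows labeled 1."""
    return sum(r[label_key] for r in rows) / len(rows)


# Invented, imbalanced toy data: 5 critical mistakes out of 100.
rows = [{"id": i, "critical": 1 if i < 5 else 0} for i in range(100)]
balanced = undersample(rows)

print(positive_rate(rows))      # share of positives in the original data
print(positive_rate(balanced))  # share of positives after undersampling
```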

While I continue my certificate program in Data Analytics, I also want to do projects like this, because I guess what I’ve learned won’t do me any good if I don’t practice it. Of course, not every project is going to be great.

I care about the criticisms of Tony El Habr, who helped me a lot. (I asked him so many questions that I thought he was going to block me.) In his view, this is a difficult concept to quantify, so he raised some methodological concerns. He may be right. I’m sure there will be those who find a more effective method.

Thank you for taking your time to read this blog post :)