In response to a few questions on my last post regarding my World Cup win probability (WP) model, here are some actual numbers to chew on. An anonymous commenter pointed us to actual win rates at WhoWins.com (a fun site by the way). I've graphed the actual rates below.
The actual win rates are for 708 games stretching all the way back to 1930. The theoretical WPs based on a Poisson distribution are the solid lines, and the actual rates are the little triangles and squares. Keep in mind these are the WPs for the trailing team.
That backwards result could be due to strategy effects. Trailing teams would be expected to become more aggressive, increasing their chances of winning or tying, but also risking falling further behind. On the other hand, the team ahead would be expected to hunker down and adopt a defensive strategy, largely neutralizing the trailing team's increase in aggressiveness. But the differences are too large to be explained by strategy effects. If strategy effects were stronger than team-strength effects, we'd be seeing a much more aggressive style of play than we do. Instead, we are witnessing a very safe style with low scoring rates (even for soccer).
And I think that's what's really going on. Scoring rates in World Cup play are at historically low levels. My theoretical model is based on 2006's scoring rate of 2.4 total goals per game (1.2 goals per side). According to Stephen Dubner in the link above, so far this year the rates are even lower.
The lower the overall scoring rate, the harder it is to overcome a deficit. This is easy to understand intuitively. Just compare basketball, where being 2 scores down at halftime can be easily overcome, to soccer, where being 2 scores down at halftime is almost insurmountable. The effect is the same within a single sport for varying scoring rates.
If I change my model's parameter for scoring rate from 2.4 to 3.0 total goals per game, the theoretical and actual win rates match up very well. I suspect 3 goals per game is pretty close to the typical World Cup scoring rate averaged out over the years.
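For anyone who wants to play with the scoring-rate parameter, here is a minimal sketch of the kind of calculation involved. It is not my actual model code. It assumes two evenly matched teams whose goals arrive as independent Poisson processes at a constant rate over 90 minutes, and the function and parameter names are made up for illustration.

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """Probability of exactly k goals when `mean` goals are expected."""
    return exp(-mean) * mean**k / factorial(k)


def trailing_team_probs(deficit, minutes_left, total_goals_per_game, max_goals=25):
    """Return (win, tie) probabilities for a team trailing by `deficit` goals.

    Assumption: each side scores at half the total rate, spread evenly
    over a 90-minute game, with no strategy or team-strength effects.
    """
    mean = (total_goals_per_game / 2) * minutes_left / 90
    win = tie = 0.0
    for x in range(max_goals + 1):        # goals still to come, trailing team
        for y in range(max_goals + 1):    # goals still to come, leading team
            p = poisson_pmf(x, mean) * poisson_pmf(y, mean)
            if x - y > deficit:
                win += p
            elif x - y == deficit:
                tie += p
    return win, tie


if __name__ == "__main__":
    # Down one goal at halftime, under the two scoring rates discussed above.
    for rate in (2.4, 3.0):
        win, tie = trailing_team_probs(deficit=1, minutes_left=45,
                                       total_goals_per_game=rate)
        print(f"{rate} goals/game: win {win:.3f}, tie {tie:.3f}")
```

Raising the rate from 2.4 to 3.0 lifts the trailing team's chances at every point in the game, which is the direction needed to bring the theoretical lines closer to the actual rates.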
However, the actual and theoretical rates still don't match exactly. In a game's early minutes, trailing teams' actual rates still exceed the theoretical WP. This can be understood as a result of bias in the numbers. The higher the scoring rate for a given era, the more likely it is to have a game where a team takes an early lead. And the earlier in the game a team had a lead, the higher the scoring rate of the era was likely to be. For example, in 1954 when the rate was 5.4 goals per game, there were likely far more examples of early-game 2-goal leads, and that era will dominate the data.
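To get a rough feel for the size of that bias, the sketch below compares how often a game would show a 2-goal margin at the 20-minute mark under a 2.4 goal-per-game rate versus a 5.4 rate. It uses the same simplifying assumptions as the theoretical model (evenly matched teams, independent Poisson scoring at a constant rate), looks only at a snapshot of the score at that minute, and uses a hypothetical function name.

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """Probability of exactly k goals when `mean` goals are expected."""
    return exp(-mean) * mean**k / factorial(k)


def prob_two_goal_lead(minute, total_goals_per_game, max_goals=25):
    """Chance that either team is ahead by 2 or more goals at `minute`.

    Assumption: each side scores at half the total rate, spread evenly
    over a 90-minute game.
    """
    mean = (total_goals_per_game / 2) * minute / 90
    p_lead = 0.0
    for x in range(max_goals + 1):
        for y in range(max_goals + 1):
            if abs(x - y) >= 2:
                p_lead += poisson_pmf(x, mean) * poisson_pmf(y, mean)
    return p_lead


if __name__ == "__main__":
    for rate in (2.4, 5.4):
        p = prob_two_goal_lead(minute=20, total_goals_per_game=rate)
        print(f"{rate} goals/game: 2-goal margin at minute 20 in {p:.1%} of games")
```

Under these assumptions the high-scoring era produces early 2-goal margins several times as often, so it is over-represented among the early-deficit data points.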
Ultimately, I suspect the WP numbers estimated with the 2.4 goal-per-game parameter are reasonably close to what we can expect this year.
Lastly, I suspect the low scoring rates may also explain the unusually high number of upsets (and "upset-ties") so far. The lower the overall scoring rate, the more likely upsets are. If an underdog team scores 50% less often than its favored opponent, it would almost never win in a game of basketball. But in a sport where there is typically a total of 1 score all game long, it will win about one third of the time, ties notwithstanding.
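The upset arithmetic can be checked with a short sketch. It assumes both teams' scores are independent Poisson counts, with the underdog scoring at half the favorite's rate (so it accounts for one third of the expected scores). The function name, the truncation limit, and the choice of 100 total scores as a stand-in for a basketball-like game are all illustrative assumptions.

```python
from math import exp, log, lgamma


def poisson_pmf(k, mean):
    """Poisson probability computed in log space to avoid overflow for large means."""
    return exp(-mean + k * log(mean) - lgamma(k + 1))


def underdog_win_prob(total_scores, underdog_share=1 / 3):
    """Return (underdog win, tie) probabilities.

    `total_scores` is the expected number of scores by both teams combined.
    An underdog_share of 1/3 means the underdog scores half as often as
    the favorite.
    """
    max_scores = int(total_scores * 3) + 30   # generous truncation point
    u_mean = total_scores * underdog_share
    f_mean = total_scores - u_mean
    u = [poisson_pmf(k, u_mean) for k in range(max_scores + 1)]
    f = [poisson_pmf(k, f_mean) for k in range(max_scores + 1)]
    win = tie = 0.0
    f_below = 0.0                             # running P(favorite scores fewer than k)
    for k in range(max_scores + 1):
        win += u[k] * f_below
        tie += u[k] * f[k]
        f_below += f[k]
    return win, tie


if __name__ == "__main__":
    for total in (1, 2.4, 100):
        win, tie = underdog_win_prob(total)
        decided = win / (1 - tie)             # share of non-tied games the underdog wins
        print(f"{total} scores/game: underdog wins {win:.3f}, "
              f"tie {tie:.3f}, wins {decided:.3f} of decided games")
```

With one expected score per game, the underdog takes a bit under a third of the games that don't end tied, while at basketball-like scoring volumes its chances all but vanish.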
There are several theories about the low scores in South Africa this year. Dubner suggests it may be the ball, widely disliked by players. I think there are several factors, the most important being the overall historical trend. I also agree with one of the comments that in the early group play, the better teams play conservatively, thinking, "Hey, let's just make it out of the group round. Then we'll crank it up." I suppose you could consider it a "meta-" strategy effect.