Breaking Down ESPN's Complicated QB Rating System


I, for one, welcome our new statistical overlords.

ESPN has a talented new analytics team, and their first foray into football is their Total QB Rating. It seems the first thing anyone does when they get into advanced football stats is to create their own QB rating system. The QBR is a major improvement over the NFL's traditional passer rating, and there are a lot of things I like about it, but it's not perfect. I'll try to summarize my understanding of the stat, and then I'll list the things I like about it and the things I don't like so much. As we say in the fighter pilot business--the goods and the others.

According to ESPN's own explanation, the stat is based on three primary concepts--Expected Points, Win Probability, and division of credit. As I understand it, QBR begins with a QB's Expected Points Added for each play in which he was directly involved, including both pass plays and runs. It modifies each play's EPA value according to a clutch factor, which is based on Win Probability (WP). Here, I use something similar known as Leverage Index (LI). LI is the ratio of the potential swing in WP for a play compared to the average play's potential swing in WP. For example, an LI of 3 means that a play is 3 times more critical to a game's outcome than the typical NFL play. (You can find any play's LI on the interactive WP graphs here by hovering your cursor over the graph. I still consider it a 'beta' stat because I haven't settled on a final, single definition of potential success and failure for every play.)
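To make the LI arithmetic concrete, here is a minimal Python sketch. It assumes each play arrives with a precomputed potential WP swing (expressed here in percentage points), which is exactly the part whose definition is still unsettled. The names `Play`, `leverage_index`, and `clutch_weighted_epa` are hypothetical, and multiplying EPA by LI is only one plausible reading of a "clutch factor"; ESPN has not published how it actually weights plays.

```python
from dataclasses import dataclass


@dataclass
class Play:
    epa: float       # Expected Points Added on the play
    wp_swing: float  # potential WP swing (success WP minus failure WP), in percentage points


def leverage_index(wp_swing: float, avg_wp_swing: float) -> float:
    """LI: this play's potential WP swing relative to the average play's swing."""
    if avg_wp_swing <= 0:
        raise ValueError("average WP swing must be positive")
    return wp_swing / avg_wp_swing


def clutch_weighted_epa(plays: list[Play], avg_wp_swing: float) -> float:
    """Hypothetical clutch adjustment: weight each play's EPA by its LI, then sum."""
    return sum(p.epa * leverage_index(p.wp_swing, avg_wp_swing) for p in plays)


# A play with a 12-point potential swing, when the average play swings 4 points,
# has an LI of 3: three times as critical as a typical play.
print(leverage_index(12.0, 4.0))
```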

QBR also divides credit for plays according to ESPN's own analysis. For example, they divide credit for a passing play between passer and receiver according to the Yards After Catch (YAC) and other factors. This is analogous to the Air Yards (AY) concept I introduced in 2007, which long-time readers here might be familiar with. QBR appears to go beyond what AY does as it apportions credit for things like pass interference calls, dropped balls, and passes defended. I'm not sure if they are charting every pass individually and assigning credit pass by pass, or if they used their analysis to assign a split value for each class of play. For example, every screen pass might be split 10/90 for the QB and receiver, or every pass defended is split 40/60 (or whatever the actual figures are supposed to be).
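A minimal sketch of an Air Yards style split for a single completion, assuming the passer's share of the play's EPA equals the fraction of the gain that came through the air, with the receiver getting the YAC fraction. The function name and the fallback share for plays with no positive gain are hypothetical; ESPN's actual apportioning rules (including penalties, drops, and passes defended) are not public.

```python
def split_credit(epa: float, air_yards: float, yac: float,
                 fallback_qb_share: float = 0.5) -> tuple[float, float]:
    """Split one completion's EPA into (passer, receiver) credit.

    Hypothetical rule: the passer's share is air yards / total yards, clamped
    to [0, 1]; the receiver receives the remainder (the YAC share).
    """
    total_yards = air_yards + yac
    if total_yards <= 0:
        qb_share = fallback_qb_share  # no positive gain to apportion by yardage
    else:
        qb_share = min(max(air_yards / total_yards, 0.0), 1.0)
    return epa * qb_share, epa * (1.0 - qb_share)


# 15 yards in the air plus 5 YAC: the passer gets 75% of the play's EPA.
print(split_credit(2.0, air_yards=15, yac=5))   # (1.5, 0.5)
# A screen caught 2 yards behind the line and run 12 more: all credit to the receiver.
print(split_credit(2.0, air_yards=-2, yac=12))  # (0.0, 2.0)
```

A per-class split, like the 10/90 screen example above, would simply replace the computed `qb_share` with a fixed constant for that class of play.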

Lastly, QBR is normalized on a 100-point scale, where 50 is average and a Pro Bowl caliber season is between 65 and 70. An excellent individual game might be as high as 90. Ultimately, the units of QBR are....nothing? Passitons? QB-trinos? I'm teasing, but I'll explain why this is important below.
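ESPN hasn't said how raw values are mapped onto the 100-point scale. One simple way to get "average = 50" is to express a QB's raw value as a percentile of a normal distribution fitted to league values. The sketch below assumes exactly that; the function name and the made-up league values are hypothetical, and the normal fit is an assumption, not ESPN's method. It also illustrates the units problem: what comes out is a percentile-like number, not points or wins.

```python
from statistics import NormalDist, mean, stdev


def to_100_scale(raw: float, league_raw: list[float]) -> float:
    """Hypothetical rescaling: percentile of `raw` under a normal fit to league values.

    A league-average raw value maps to 50; better-than-average values map above 50.
    """
    if len(league_raw) < 2:
        raise ValueError("need at least two league values to estimate spread")
    dist = NormalDist(mu=mean(league_raw), sigma=stdev(league_raw))
    return 100.0 * dist.cdf(raw)


league = [-0.10, 0.00, 0.05, 0.10, 0.20]  # made-up raw per-play values
print(round(to_100_scale(mean(league), league)))  # 50
print(round(to_100_scale(0.20, league), 1))
```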

The Goods

1. It includes sacks, running, fumbles and all the other important things that the traditional NFL passer rating doesn't.
2. It doesn't double count anything, as the NFL passer rating does with completions.
3. It is based primarily on EPA, which accounts for down, distance, and field position.
4. It is also based on WP, which considers time and score.
5. It's a rate stat instead of a cumulative stat.

The Others

1. It is proprietary.
2. It is unit-less.
3. It is an amalgamation of other stats.

I'll explain my criticisms below:

I'm not a big fan of proprietary statistics. I understand there are good reasons to protect an intellectual investment, but I'd like to see a lot more detail about their methodology. I've published all my models in full except for the WP model. And even then I divulged as much as possible about how I created the WP model, and created a publicly available calculator. I believe openness improves the models and their credibility. I'd encourage ESPN to publish their EP model. It's not an easy thing to create, especially for 2nd and 3rd downs.

There are details I'd be curious about, such as how they count subsequent kickoffs after a score, or how they estimate the value of a 4th down in no-man's land, where you can't predict whether a team will go for it, punt, or try a long FG. How do they count the last play or drive of a half, when an offense might make halfhearted attempts to score or throw a bomb that is as likely to be intercepted as fall incomplete?

Making it purely proprietary also guarantees it won't be used by any other major outlet. Fox, CBS, NBC, and the NFL itself will probably ignore it. It won't replace the traditional passer rating, despite being a major improvement.

Units are valuable because they give stats meaning. If I told you Kurt Warner was a 63, so he should get into the Hall of Fame, you'd say 'huh?' But if I said Kurt Warner was worth 3.7 WPA per season (making an otherwise average team win 11.7 games on average), you'd say 'I get it.'
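The arithmetic behind the Warner example is simple, assuming a 16-game season in which an otherwise average team wins half its games: WPA is measured in wins, so it adds directly to that baseline. The function name is hypothetical.

```python
def projected_wins(wpa_per_season: float, games: int = 16) -> float:
    """Expected wins for an otherwise average (.500) team, plus the QB's WPA.

    Assumes a `games`-game season (16 by default); WPA is already in units of wins.
    """
    baseline_wins = games / 2  # an average team wins half its games
    return baseline_wins + wpa_per_season


print(f"{projected_wins(3.7):.1f}")  # 11.7
```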

Stats with units are much more useful in analysis than unit-less stats. I use WPA (whose units are wins) and EPA (whose units are points) for game-theory analysis of play calling and for decision analysis for 4th downs, onside kicks, etc. In the end, I'd like my analysis to be useful more than anything. Facing a 4th and 8 from the opponent's 36, in playoff-format overtime, a coach would want to know whether to run, pass, kick or punt. WPA says punt! QBR says Matt Ryan is a 62! DVOA says Ryan's a 14%! You get the idea.
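A 4th-down decision like that reduces to comparing the expected WP of each option. Here is a minimal sketch, assuming you already have, for each option, a probability of success plus the WP after success and after failure. The numbers are placeholders chosen for illustration, not real estimates for 4th and 8 at the opponent's 36 in overtime, and the function names are hypothetical.

```python
def expected_wp(p_success: float, wp_if_success: float, wp_if_fail: float) -> float:
    """Expected Win Probability of a choice with two possible outcomes."""
    return p_success * wp_if_success + (1.0 - p_success) * wp_if_fail


def best_option(options: dict[str, tuple[float, float, float]]) -> str:
    """Return the name of the option with the highest expected WP."""
    return max(options, key=lambda name: expected_wp(*options[name]))


# Placeholder inputs: (probability of success, WP if success, WP if failure).
options = {
    "go for it": (0.35, 0.70, 0.30),
    "field goal": (0.40, 0.60, 0.30),
    "punt": (1.00, 0.45, 0.45),  # treated as a single, certain outcome
}
for name, args in options.items():
    print(f"{name}: {expected_wp(*args):.3f}")
print("best:", best_option(options))
```

The answer comes out in units of win probability, which is exactly what a QB rating cannot supply.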

When devising a stat or a measure of any sort, it's important to first ask what its purpose is. In this case, QBR's purpose is solely to rank NFL QBs. Its purpose isn't to do all those other things, so it's ok that it doesn't. I only raise this point to explain why I prefer more useful numbers with meaningful units.

Also, a stat with meaningful units acts as a currency of the entire sport. WPA or EPA can compare the impact of performances of RBs, QBs, WRs, and even kickers. If ESPN creates a RB stat, which I expect they will at some point, you wouldn't be able to compare a QB 65 with a RB 65. But a QB's 25.5 EPA can be compared directly with a RB's 10.2 EPA. We could compare the impact of someone like Devin Hester to someone like Chris Johnson. Without comparisons like these, we wouldn't discover things like the fact that only the very best RBs actually provide a positive impact.

QBR is a mix of other measures. At its core, it basically combines EPA, WPA, and Air Yards. Suppose quarterback Farley and quarterback Andrews both have a QBR of 60. Say Farley got his 60 because of very high EPA but low WPA in high leverage situations (like Philip Rivers last year), and Andrews got his 60 QBR with lower EPA but high WPA in high leverage situations (like Matt Ryan last year).
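To see why an amalgamated number hides information, consider a purely hypothetical composite that blends season EPA and WPA with fixed weights. The weights and inputs below are invented for illustration (ESPN's formula is not public), but they show how two very different profiles can land on the same score.

```python
def composite_rating(epa: float, wpa: float,
                     epa_weight: float = 0.1, wpa_weight: float = 2.5,
                     base: float = 50.0) -> float:
    """Hypothetical linear blend of EPA and WPA; not ESPN's QBR formula."""
    return base + epa_weight * epa + wpa_weight * wpa


farley = composite_rating(epa=90, wpa=0.4)   # high EPA, modest WPA
andrews = composite_rating(epa=60, wpa=1.6)  # lower EPA, high WPA
print(round(farley, 1), round(andrews, 1))   # 60.0 60.0
```

Separate EPA and WPA figures would tell you which QB is which; the single composite cannot.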

I'd prefer to look at a line of various stats--such as EPA / WPA / AY / Success Rate (SR). In the example above the two QBs' season stats might look like this:

Farley     +100 EPA / +1.2 WPA / 5.5 AY / 50.1% SR
Andrews     +85 EPA / +3.2 WPA / 4.2 AY / 45.1% SR

This is far richer information. We can see who was more consistent, who was more productive overall, who relied on his receivers' YAC, and who was luckier (or 'clutch' if you prefer). We can get an idea of who is more likely to repeat his above average performance into the near future. A single number can't tell us anything like that. This is not an indictment of QBR, just my own preference.
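Building such a stat line from play-by-play data is straightforward. The sketch below assumes each QB play carries precomputed EPA and WPA values, and it treats a "success" as any play with positive EPA; that Success Rate definition is an assumption here, and Air Yards is left out because it needs per-pass yardage splits. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QBPlay:
    epa: float  # Expected Points Added
    wpa: float  # Win Probability Added (in wins)


def stat_line(plays: list[QBPlay]) -> str:
    """Format season totals as 'EPA / WPA / SR' (SR assumed to be % of plays with EPA > 0)."""
    if not plays:
        raise ValueError("need at least one play")
    epa = sum(p.epa for p in plays)
    wpa = sum(p.wpa for p in plays)
    sr = 100.0 * sum(1 for p in plays if p.epa > 0) / len(plays)
    return f"{epa:+.1f} EPA / {wpa:+.2f} WPA / {sr:.1f}% SR"


plays = [QBPlay(1.5, 0.02), QBPlay(-0.5, -0.01), QBPlay(0.5, 0.00), QBPlay(-1.0, -0.03)]
print(stat_line(plays))
```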

If I were to settle on a single number, it would be WPA/LI, as Tom Tango has suggested. One day I'll get around to settling on a good definition of typical football 'successes' and 'failures' in terms of a swing in WPA. It's worth several posts and a healthy discussion with the community. Until then, my LI stat is just a work in progress.
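In baseball, WPA/LI divides each play's WPA by that play's Leverage Index before summing, which removes the weight of game situation and leaves a context-neutral measure still in units of wins. A minimal sketch of the same calculation for football, assuming each play already has a WPA value and a positive LI (and, as noted, the football LI definition is still a work in progress). The function name and the (WPA, LI) pairs are hypothetical.

```python
def wpa_per_li(plays: list[tuple[float, float]]) -> float:
    """Sum of WPA / LI over (wpa, li) pairs: a leverage-neutral WPA, still in wins."""
    total = 0.0
    for wpa, li in plays:
        if li <= 0:
            raise ValueError("Leverage Index must be positive")
        total += wpa / li  # high-leverage plays are scaled down, low-leverage plays up
    return total


season = [(0.06, 3.0), (0.01, 1.0), (-0.01, 0.5)]  # made-up (WPA, LI) pairs
print(f"WPA: {sum(w for w, _ in season):+.2f}, WPA/LI: {wpa_per_li(season):+.2f}")
```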

Ultimately, I think QBR is interesting and is a thousand times better than the traditional passer rating. But it's not very useful outside of ranking QBs from top to bottom. It might be fun fodder for discussion on TV or Internet message boards, but it's hard to see how it's useful beyond that.

I congratulate the guys in Bristol for putting this together. It took years to develop the EP and WP models I use, and it was no small task. It looks like they have the tools for a very sound analytics program going forward. I also like that they chose the concepts that Advanced NFL Stats has championed. There are other models and approaches out there, and I take this as a vote of confidence for the tools developed here.

Full disclosure: I had several exchanges with one of the ESPN analysts over the past 12 months, answering questions and explaining various aspects of the Advanced NFL Stats models.