This one is about the numbers. No pitcher to profile, no story to tell. Instead I'm sharing the initial output of a fairly extensive project—2010 pitch classifications.
I've managed to tag and review every pitch thrown so far in 2010, including spring training. The numbers below include only regular season games and, despite my best efforts, there are errors in the pitch classifications. Given the post hoc nature of these labels (as opposed to Gameday's real-time tags) and the mix of automated, pseudo-automated and manual processes, I'm fairly confident in the utility of the data set for at least one purpose: creating a baseline for a variety of metrics that can be referred to from here on out.
In other words, when it comes down to an individual pitcher, pitch tags can be moved around but, as a group, there is enough of a sample to use the following numbers as benchmarks.
Here's your big baseline, a lump of every classified pitch. Unclassified pitches are the result of PITCHf/x glitches (rare) or mid-plate appearance pitching changes (less rare).
Type = pitch type
# = number thrown
rvERA = a rough but reasonable estimate of pitch effectiveness based on linear weights and outcomes on ball/strike counts
MPH = speed at release, 55 ft. from the back end of home plate
Swing = swing rate (swings/pitches)
Whiff = whiff rate (misses/swings), includes foul tips
Foul = foul ball rate (fouls/swings)
B:CS = umpire called ball-to-called strike ratio
IWZ = rate of pitches thrown within a "wide" strike zone
Chase = swing rate outside of the wide zone
Watch = take rate inside of the wide zone (inverse of swing rate)
nkSLG = non-K slugging, or SLGCON
GB% = rate of balls in play tagged by MLBAM stringers as grounders
LD% = line drives
FB% = outfield fly balls
PU% = infield fly balls
HR/FL% = home runs per outfield fly + line drive
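As a rough illustration of how the rate definitions in the key fit together, here is a minimal Python sketch. The `Pitch` record, its field names and the result labels are hypothetical (they are not PITCHf/x or Gameday fields), and the wide-zone flag is assumed to be computed elsewhere.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pitch:
    # Hypothetical record; field names and labels are assumptions for illustration.
    swung: bool         # batter offered at the pitch
    result: str         # "miss", "foul_tip", "foul", "in_play", "ball" or "called_strike"
    in_wide_zone: bool  # inside the "wide" strike zone (computed elsewhere)


def ratio(num: int, den: int) -> Optional[float]:
    """Return num/den, or None when the denominator is zero."""
    return num / den if den else None


def plate_discipline(pitches: List[Pitch]) -> dict:
    """Compute the swing-based rates from the key for a list of pitches."""
    swings = [p for p in pitches if p.swung]
    inside = [p for p in pitches if p.in_wide_zone]
    outside = [p for p in pitches if not p.in_wide_zone]
    return {
        "Swing": ratio(len(swings), len(pitches)),
        # Foul tips count as misses, matching the Whiff definition in the key.
        "Whiff": ratio(sum(p.result in ("miss", "foul_tip") for p in swings), len(swings)),
        "Foul": ratio(sum(p.result == "foul" for p in swings), len(swings)),
        "B:CS": ratio(sum(p.result == "ball" for p in pitches),
                      sum(p.result == "called_strike" for p in pitches)),
        "IWZ": ratio(len(inside), len(pitches)),
        "Chase": ratio(sum(p.swung for p in outside), len(outside)),
        "Watch": ratio(sum(not p.swung for p in inside), len(inside)),
    }
```

Note that Watch and the in-zone swing rate share a denominator, so they always sum to one.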
rvERA is not league adjusted, park adjusted or starter/reliever adjusted. Batted ball outcomes are regressed towards MLB average outcomes. It's a toy, maybe a fancy one, but a toy nonetheless.
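The exact regression method isn't spelled out here. One common approach, offered only as an assumed stand-in, is to blend a fixed number of league-average "phantom" chances into the observed sample. The function name and the `weight` parameter below are hypothetical.

```python
def regress_to_league(events: float, chances: float,
                      league_rate: float, weight: float) -> float:
    """Shrink an observed rate toward the league rate.

    `weight` is the number of league-average chances blended in. It is an
    assumed tuning constant, not a value taken from this article.
    """
    if chances + weight == 0:
        raise ValueError("need at least one real or phantom chance")
    return (events + weight * league_rate) / (chances + weight)
```

With few real chances the result sits close to the league rate; as the sample grows, it moves toward the observed rate.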
These numbers include only 2010, so there are some weather-related changes to come. For example, fastballs (see below) will get faster and more fly balls will leave the park.
You'll notice, despite a generous strike zone, pitchers have trouble throwing strikes, and the average ground ball rate is 44 percent. Both the GB percentage and whiff rate are up from 2009, so some decline could be coming over the next several hundred thousand pitches.
Now, for each pitch type. You'll have to pardon my sometimes confusing two-letter abbreviations—please refer to this key.
CH = Change-ups, may include some splitters that tail more than tumble
CU = Curveballs, probably some slurves
F2 = Two-seam fastball, sinkers, tailing fastballs
F4 = Four-seam fastball, generic fastballs
FC = Cutters and some slutters, can be a fuzzy group
FS = Splitters, foshes and forkballs, may include some other tumbling change-ups
KN = Knuckleballs, although some of Eddie Bonine's (et al.) are not in here
SB = Screwball, sole property of Danny Herrera
SL = Slider or slurve, even some slutters
If a pitcher has a higher than expected HR/FL rate, will it regress toward league average or toward league average by pitch? For example, a fastball/curveball pitcher could be expected to give up more home runs per fly ball-plus-line drive than a sinker/change-up pitcher.

If you get that awkward question, "where do ground balls come from?," you can answer "from sinkers and off-speed pitches." If your favorite pitcher doesn't command, or even own, a sinker, a slider or change-up can get the ground ball when needed.

You can also look at the above table and understand why a fastball that gets a whiff rate north of .3 is so darn impressive, while a slider with the same rate may not be.
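On the HR/FL question, one way to picture "league average by pitch" is a baseline weighted by the pitcher's own mix. The sketch below is only an illustration under assumptions: the function and argument names are hypothetical, and the per-type league rates would need to be filled in from the tables rather than from anything in this code.

```python
from typing import Dict


def mix_weighted_baseline(chances_by_type: Dict[str, float],
                          league_rate_by_type: Dict[str, float]) -> float:
    """League HR/FL implied by a pitcher's own pitch mix.

    chances_by_type: the pitcher's outfield flies + line drives per pitch type.
    league_rate_by_type: league HR/FL for each of those pitch types.
    """
    total = sum(chances_by_type.values())
    if total == 0:
        raise ValueError("no fly balls or line drives to weight by")
    # Expected home runs if every pitch type performed at its league rate.
    expected_hr = sum(n * league_rate_by_type[t] for t, n in chances_by_type.items())
    return expected_hr / total
```

A pitcher whose fly balls and line drives come mostly off pitch types with high league HR/FL rates gets a higher baseline than one who lives on low-rate types, which is the fastball/curveball versus sinker/change-up contrast described above.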
Now let's try some pitch types grouped together, but not in mutually exclusive groups. Cutters are in FC/SL and F4/FC—all of them. The CH/FS group is probably the most useful combination due to their similarity and overlap, followed by F4/F2 for the same reason. The rest are sketchy or totally arbitrary (KN/SB).
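To make the non-exclusive grouping concrete, here is a small Python sketch that sums per-type pitch counts into overlapping groups. Only the groupings named in this paragraph are included, the full set in the tables may differ, and the function and variable names are hypothetical.

```python
from typing import Dict, Tuple

# Groupings named in the text; a pitch type may belong to more than one.
GROUPS: Dict[str, Tuple[str, ...]] = {
    "CH/FS": ("CH", "FS"),
    "F4/F2": ("F4", "F2"),
    "FC/SL": ("FC", "SL"),
    "F4/FC": ("F4", "FC"),
    "KN/SB": ("KN", "SB"),
}


def group_counts(counts_by_type: Dict[str, int]) -> Dict[str, int]:
    """Sum per-type pitch counts into the overlapping groups above."""
    return {
        name: sum(counts_by_type.get(t, 0) for t in members)
        for name, members in GROUPS.items()
    }
```

Because cutters are counted in both FC/SL and F4/FC, the group totals will not add up to the total number of pitches.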
While I've already called most of these groupings arbitrary and sketchy, there is utility hidden in a few places. For example, the SL/CU group may be handy for "breaking pitches" of unknown variety. I'm sure creative minds can think of more uses, and more sophisticated approaches. I hope we'll see some of that in the comments. If nothing else, I hope this provides a handy reference.
References and Resources
PITCHf/x data from Sportvision and MLBAM. Pitch classifications by the author.
Harry Pavlidis admits he has a baseball problem. He also writes for Baseball Daily Digest, Beyond the Boxscore and his own blog, Cubs f/x. Feedback, questions and comments are appreciated - Email [email protected] and Twitter @harrypav