MLB Pitch-Type Statistical Analysis

by Hardball Times

This one is about the numbers. No pitcher to profile, no story to tell. Instead I'm sharing the initial output of a fairly extensive project—2010 pitch classifications.

I've managed to tag and review every pitch thrown so far in 2010, including spring training. The numbers below include only regular season games and, despite my best efforts, there are errors in the pitch classifications. Given the post hoc nature of these labels (as opposed to Gameday's real-time tags) and the mix of automated, pseudo-automated and manual processes, I'm fairly confident in the utility of the data set for at least one purpose: creating a baseline for a variety of metrics that can be referred to from here on out.

In other words, when it comes down to an individual pitcher, pitch tags can be moved around but, as a group, there is enough of a sample to use the following numbers as benchmarks.

Here's your big baseline, a lump of every classified pitch. Unclassified pitches are the result of PITCHf/x glitches (rare) or mid-plate appearance pitching changes (less rare).

Data definitions:

Type = pitch type
# = number thrown
rvERA = a rough but reasonable estimate of pitch effectiveness based on linear weights and outcomes on ball/strike counts
MPH = speed at release, 55 ft. from the back end of home plate
Swing = swing rate (swings/pitches)
Whiff = whiff rate (misses/swings), includes foul tips
Foul = foul ball rate (fouls/swings)
B:CS = umpire called ball-to-called strike ratio
IWZ = rate of pitches thrown within a "wide" strike zone
Chase = swing rate outside of the wide zone
Watch = take rate inside of the wide zone (inverse of swing rate)
nkSLG = non-K slugging, or SLGCON
GB% = rate of balls in play tagged by MLBAM stringers as grounders
LD% = line drives
FB% = outfield fly balls
PU% = infield fly balls
HR/FL% = home runs per outfield fly + line drive
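To make the rate definitions above concrete, here is a minimal sketch of how Swing, Whiff, Foul and B:CS could be computed from a list of pitch outcomes. The outcome labels (`ball`, `called_strike`, `swinging_strike`, `foul_tip`, `foul`, `in_play`) are hypothetical stand-ins, not the actual PITCHf/x descriptions, and treating foul tips as misses rather than fouls is an assumption based on the Whiff definition.

```python
from collections import Counter

# Hypothetical outcome labels; real PITCHf/x descriptions are worded differently.
SWINGS = {"swinging_strike", "foul_tip", "foul", "in_play"}
MISSES = {"swinging_strike", "foul_tip"}  # foul tips count as whiffs, per the definitions


def ratio(numerator, denominator):
    """Divide, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def pitch_rates(outcomes):
    """Return swing, whiff, foul and B:CS for a list of outcome labels."""
    counts = Counter(outcomes)
    pitches = sum(counts.values())
    swings = sum(counts[o] for o in SWINGS)
    misses = sum(counts[o] for o in MISSES)
    return {
        "swing": ratio(swings, pitches),                      # swings / pitches
        "whiff": ratio(misses, swings),                       # misses / swings
        "foul": ratio(counts["foul"], swings),                # fouls / swings
        "b_cs": ratio(counts["ball"], counts["called_strike"]),  # called balls per called strike
    }


# Made-up sample of 14 pitches, for illustration only.
sample = (
    ["ball"] * 4
    + ["called_strike"] * 2
    + ["swinging_strike", "foul_tip"]
    + ["foul"] * 3
    + ["in_play"] * 3
)
rates = pitch_rates(sample)
print(rates)
```

Note that Whiff and Foul share swings as their denominator, so they are not directly comparable to Swing, which is per pitch.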

rvERA is not league adjusted, park adjusted or starter/reliever adjusted. Batted ball outcomes are regressed towards MLB average outcomes. It's a toy, maybe a fancy one, but a toy nonetheless.

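| Type | # | rvERA | MPH | Swing | Whiff | Foul | B:CS | IWZ | Chase | Watch | nkSLG | GB% | LD% | FB% | PU% | HR/FL% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| All | 131633 | 4.34 | 88 | 0.435 | 0.211 | 0.377 | 2.1 | 0.511 | 0.261 | 0.394 | 0.516 | 44% | 20% | 28% | 7.4% | 7.5% |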

These numbers cover only 2010 to date, so some weather-related changes are still to come. For example, fastballs (see below) will get faster and more fly balls will leave the park.

You'll notice, despite a generous strike zone, pitchers have trouble throwing strikes, and the average ground ball rate is 44 percent. Both the GB percentage and whiff rate are up from 2009, so some decline could be coming over the next several hundred thousand pitches.

Now, for each pitch type. You'll have to pardon my sometimes confusing two-letter abbreviations—please refer to this key.

Pitch-type abbreviations:

CH = Change-ups, may include some splitters that tail more than tumble
CU = Curveballs, probably some slurves
F2 = Two-seam fastball, sinkers, tailing fastballs
F4 = Four-seam fastball, generic fastballs
FC = Cutters and some slutters, can be a fuzzy group
FS = Splitters, foshes and forkballs, may include some other tumbling change-ups
KN = Knuckleballs, although some of Eddie Bonine's (et al.) are not in here
SB = Screwball, sole property of Danny Herrera
SL = Slider or slurve, even some slutters

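| Type | # | rvERA | MPH | Swing | Whiff | Foul | B:CS | IWZ | Chase | Watch | nkSLG | GB% | LD% | FB% | PU% | HR/FL% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SL | 18722 | 3.82 | 84 | 0.456 | 0.327 | 0.317 | 2.4 | 0.484 | 0.304 | 0.374 | 0.505 | 45% | 18% | 29% | 8.4% | 7.8% |
| FS | 1841 | 3.87 | 84 | 0.501 | 0.345 | 0.298 | 4.0 | 0.434 | 0.340 | 0.280 | 0.424 | 48% | 19% | 25% | 7.3% | 6.8% |
| CH | 13325 | 4.16 | 83 | 0.491 | 0.307 | 0.291 | 3.7 | 0.441 | 0.325 | 0.287 | 0.452 | 50% | 18% | 25% | 7.1% | 6.9% |
| FC | 7230 | 4.22 | 87 | 0.475 | 0.212 | 0.389 | 2.3 | 0.526 | 0.275 | 0.337 | 0.494 | 44% | 21% | 26% | 9.3% | 6.1% |
| CU | 11156 | 4.47 | 77 | 0.373 | 0.261 | 0.327 | 2.2 | 0.467 | 0.254 | 0.487 | 0.512 | 49% | 19% | 27% | 4.8% | 8.3% |
| F4 | 46115 | 4.49 | 92 | 0.421 | 0.164 | 0.438 | 1.7 | 0.561 | 0.226 | 0.416 | 0.567 | 35% | 21% | 34% | 9.6% | 7.7% |
| F2 | 29551 | 4.53 | 91 | 0.430 | 0.128 | 0.391 | 1.9 | 0.543 | 0.239 | 0.400 | 0.499 | 52% | 20% | 23% | 4.5% | 7.3% |
| KN | 821 | 4.88 | 69 | 0.445 | 0.227 | 0.384 | 2.7 | 0.515 | 0.236 | 0.348 | 0.563 | 37% | 20% | 32% | 10.6% | 8.1% |
| SB | 42 | 5.26 | 66 | 0.333 | 0.071 | 0.286 | 2.5 | 0.429 | 0.250 | 0.556 | 0.111 | 44% | 33% | 11% | 11.1% | 0.0% |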

If a pitcher has a higher than expected HR/FL rate, will it regress toward league average or toward league average by pitch? For example, a fastball/curveball pitcher could be expected to give up more home runs per fly ball-plus-line drive than a sinker/change-up pitcher.

If you get that awkward question, "where do ground balls come from?," you can answer "from sinkers and off-speed pitches." If your favorite pitcher doesn't command, or even own, a sinker, then a slider or change-up can get the ground ball when needed.

You can also look at the above table and understand why a fastball that gets a whiff rate north of .3 is so darn impressive, while a slider with the same rate may not be.
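The HR/FL question can be made concrete with a rough sketch that compares the two candidate baselines. The HR/FL values come from the per-pitch table above. Everything else is an assumption for illustration: the two pitch mixes, the home run and fly-ball counts, the simple shrinkage formula, and especially the constant `k`, which is not an estimated stabilization point.

```python
# HR per (outfield fly + line drive) by pitch type, from the table above.
HR_FL = {
    "SL": 0.078, "FS": 0.068, "CH": 0.069, "FC": 0.061,
    "CU": 0.083, "F4": 0.077, "F2": 0.073, "KN": 0.081,
}
LEAGUE_HR_FL = 0.075  # the "All" row


def mix_baseline(fl_by_type):
    """Weight each pitch type's HR/FL by the flies + liners allowed on that pitch."""
    total = sum(fl_by_type.values())
    return sum(n * HR_FL[t] for t, n in fl_by_type.items()) / total


def regress(hr, fl, baseline, k=100):
    """Shrink an observed HR/FL toward a baseline using k pseudo-observations.

    k is a hypothetical placeholder, not a measured value.
    """
    return (hr + k * baseline) / (fl + k)


# Hypothetical pitchers: flies + liners allowed, split by pitch type.
fastball_curve = {"F4": 70, "CU": 30}
sinker_change = {"F2": 70, "CH": 30}

fc_base = mix_baseline(fastball_curve)
sc_base = mix_baseline(sinker_change)
print(round(fc_base, 4), round(sc_base, 4))  # 0.0788 0.0718

# A made-up fastball/curve pitcher with 12 HR on 100 flies + liners.
print(round(regress(12, 100, LEAGUE_HR_FL), 4))  # 0.0975
print(round(regress(12, 100, fc_base), 4))       # 0.0994
```

With these made-up inputs the choice of baseline moves the estimate by about two home runs per thousand flies and liners. Whether that difference holds up in real data is exactly the open question.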

Now let's try some pitch types grouped together, but not in mutually exclusive groups. Every cutter appears in both FC/SL and F4/FC. The CH/FS group is probably the most useful combination due to their similarity and overlap, followed by F4/F2 for the same reason. The rest are sketchy or totally arbitrary (KN/SB).
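For anyone building their own groups from the per-pitch rows, here is a minimal sketch of pooling two pitch types. It assumes the grouped rows are simple pools of the underlying pitches, which is not stated outright, although the result matches the published CH/FS figures (15166 pitches, .492 swing, .312 whiff). The key detail is the denominator: Swing is weighted by pitches, while Whiff is weighted by swings.

```python
def pool(a, b):
    """Pool two pitch types given dicts with n (pitches), swing and whiff rates."""
    n = a["n"] + b["n"]
    swings_a = a["n"] * a["swing"]  # estimated swings, from rounded rates
    swings_b = b["n"] * b["swing"]
    swings = swings_a + swings_b
    return {
        "n": n,
        "swing": swings / n,  # per pitch
        "whiff": (swings_a * a["whiff"] + swings_b * b["whiff"]) / swings,  # per swing
    }


# Rows taken from the per-pitch table.
CH = {"n": 13325, "swing": 0.491, "whiff": 0.307}
FS = {"n": 1841, "swing": 0.501, "whiff": 0.345}

ch_fs = pool(CH, FS)
print(ch_fs["n"], round(ch_fs["swing"], 3), round(ch_fs["whiff"], 3))  # 15166 0.492 0.312
```

Because the inputs are already rounded to three decimals, pooled values can drift by a point in the last digit for other groups.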

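| Type | # | rvERA | MPH | Swing | Whiff | Foul | B:CS | IWZ | Chase | Watch | nkSLG | GB% | LD% | FB% | PU% | HR/FL% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FC/SL | 25952 | 3.93 | 85 | 0.461 | 0.295 | 0.337 | 2.4 | 0.496 | 0.296 | 0.364 | 0.502 | 45% | 19% | 28% | 8.7% | 7.3% |
| SL/CU | 29878 | 4.06 | 81 | 0.425 | 0.302 | 0.321 | 2.3 | 0.478 | 0.285 | 0.416 | 0.508 | 46% | 18% | 28% | 7.1% | 8.0% |
| CH/FS | 15166 | 4.12 | 83 | 0.492 | 0.312 | 0.292 | 3.7 | 0.440 | 0.327 | 0.286 | 0.449 | 50% | 18% | 25% | 7.1% | 6.9% |
| F4/FC | 53345 | 4.45 | 91 | 0.428 | 0.171 | 0.431 | 1.8 | 0.556 | 0.233 | 0.405 | 0.557 | 36% | 21% | 33% | 9.6% | 7.5% |
| F4/F2 | 75666 | 4.51 | 92 | 0.425 | 0.150 | 0.420 | 1.8 | 0.554 | 0.231 | 0.410 | 0.540 | 42% | 21% | 30% | 7.6% | 7.5% |
| KN/SB | 863 | 4.90 | 69 | 0.440 | 0.219 | 0.379 | 2.7 | 0.511 | 0.237 | 0.358 | 0.541 | 37% | 21% | 31% | 10.6% | 7.7% |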

While I've already called most of these groupings arbitrary and sketchy, there is utility hidden in a few places. For example, the SL/CU group may be handy for "breaking pitches" of unknown variety. I'm sure creative minds can think of more uses, and more sophisticated approaches. I hope we'll see some of that in the comments. If nothing else, I hope this provides a handy reference.

References and Resources
PITCHf/x data from Sportvision and MLBAM. Pitch classifications by the author.

Harry Pavlidis admits he has a baseball problem. He also writes for Baseball Daily Digest, Beyond the Boxscore and his own blog, Cubs f/x. Feedback, questions and comments are appreciated - Email [email protected] and Twitter @harrypav