“Strike” is Just a Six Letter Word

Pitches can end up one of only six ways:

  1. In the zone, swung at and missed
  2. In the zone, swung at and hit
  3. In the zone, taken for a strike
  4. Out of the zone, swung at and missed
  5. Out of the zone, swung at and hit
  6. Out of the zone, taken for a ball

Using “Plate Discipline” data from FanGraphs, we can “decompose” every pitch into one of the above outcomes for both hitters and pitchers. See?
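As a rough sketch of that decomposition: the FanGraphs plate discipline columns Zone%, Z-Swing%, O-Swing%, Z-Contact% and O-Contact%, together with a total pitch count, are enough to split pitches into the six buckets. The class name, function name, and the sample numbers below are hypothetical and chosen only for illustration; “hit” here means any contact, fouls included.

```python
from dataclasses import dataclass


@dataclass
class PlateDiscipline:
    """Rates are fractions (0.45, not 45). Field names are illustrative."""
    pitches: int      # total pitches thrown
    zone: float       # Zone%: share of pitches inside the zone
    z_swing: float    # Z-Swing%: swings per in-zone pitch
    o_swing: float    # O-Swing%: swings per out-of-zone pitch
    z_contact: float  # Z-Contact%: contact per in-zone swing
    o_contact: float  # O-Contact%: contact per out-of-zone swing


def decompose(stats: PlateDiscipline) -> dict:
    """Split total pitches into the six outcomes (as estimated counts)."""
    in_zone = stats.pitches * stats.zone
    out_zone = stats.pitches - in_zone
    z_swings = in_zone * stats.z_swing
    o_swings = out_zone * stats.o_swing
    return {
        "zone_swing_miss": z_swings * (1 - stats.z_contact),
        "zone_swing_hit": z_swings * stats.z_contact,
        "zone_taken": in_zone - z_swings,
        "out_swing_miss": o_swings * (1 - stats.o_contact),
        "out_swing_hit": o_swings * stats.o_contact,
        "out_taken": out_zone - o_swings,
    }


# Made-up rates, not any real pitcher's line.
sample = PlateDiscipline(pitches=1000, zone=0.5, z_swing=0.6,
                         o_swing=0.3, z_contact=0.9, o_contact=0.6)
for outcome, count in decompose(sample).items():
    print(f"{outcome}: {count:.0f}")
```

The six counts always add back up to the total number of pitches, which is a quick sanity check on the input rates.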

The plate discipline data is compiled from Baseball Info Solutions, though Pitch F/X data is also available. I went with the BIS data because it showed first in the list, not because the guy who sells it insists it’s the best.

A discrepancy in the data

For almost every pitcher there is a discrepancy between the number of his pitches that were called balls and the number that actually were balls. For some, such as Livan Hernandez, that discrepancy is huge.

How big are these discrepancies for other pitchers, what do they mean, and what are their causes?


In the aggregate, from 2002-2011 (the only years available) umpires definitely favored pitchers.


The picture on the left illustrates the magnitude of a pitcher’s favor (further to the right indicates that a higher number of balls were called as strikes, as a percentage of total strikes) and the picture on the right gives you an idea of the distribution.

As you can tell from the picture on the left, when pitchers get helped, they get helped way more than the “hosed” pitchers get hurt. The tail is much, much taller on the right than on the left, meaning the most helped pitcher was helped far more than the most hosed pitcher was hosed.

As you can tell from the picture on the right, many more pitchers get helped than get hurt, and far, far more don’t get pushed one way or the other. (Note that the distribution is centered at 0, that is, no bias. Over 50% of pitchers fall within 1% of no bias, meaning most pitchers were unaffected by gracious and/or stingy umpires.)

And, as you can tell from both pictures, there seems to be no rhyme or reason as to where you end up. Livan was the third most “hosed” pitcher from 2002-2011 as well as the second and first most helped. What is going on here?

BIS isn’t wrong, the strike zone is moving

After a very, very cursory search (first page of Google results and BIS homepage) there seems to be no evidence that BIS has changed the way they measure strikes and balls over time. This could have explained the unusual results from above, since more balls seem to have been called as strikes over time. Had BIS changed the way they measured the zone while the zone itself was kept the same, one would expect to find this trend.

Instead, I have decided, the way umpires have called the strike zone has changed over time.


On the y-axis is the average percentage of strikes that would have needed to be called balls to erase the “strike deficit” across all pitches. Put more provocatively (and not wholly correctly) this is the percentage of strikes that were actually balls (see the technical note at the end).
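To make that quantity concrete, here is a minimal sketch of one way to compute it. It assumes the “by the rules” strike count is every in-zone pitch plus every swing at an out-of-zone pitch, and that the deficit is the recorded strike count minus that figure, divided by recorded strikes. That formula is my reading of the definition above, and the function name and inputs are hypothetical.

```python
def strike_deficit_pct(recorded_strikes: float, pitches: float,
                       zone: float, o_swing: float) -> float:
    """Share of recorded strikes not explained by the rulebook zone.

    Assumption: a pitch is a rulebook strike if it is in the zone
    (swung at or not) or if it is out of the zone and swung at.
    zone and o_swing are fractions (Zone%, O-Swing%).
    Positive result: balls were called strikes (pitcher helped).
    Negative result: strikes were called balls (pitcher "hosed").
    """
    if recorded_strikes <= 0:
        raise ValueError("recorded_strikes must be positive")
    in_zone = pitches * zone
    out_of_zone_swings = (pitches - in_zone) * o_swing
    rulebook_strikes = in_zone + out_of_zone_swings
    return (recorded_strikes - rulebook_strikes) / recorded_strikes


# Made-up inputs: 1000 pitches, half in the zone, 30% chase rate,
# and 660 strikes in the box score.
print(f"{strike_deficit_pct(660, 1000, 0.5, 0.3):.1%}")
```

Because this nets helpful calls against harmful ones, it is a lower bound on how many pitches were actually miscalled.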

It is very clear that strike deficits have been growing recently after a period of relative fairness and even some bias toward hitters. In 2011, 2.5% of all strikes were actually balls. My preliminary conclusion: the strike zone has changed.

Nagging questions

After accounting for the “ump bias”, could the remaining variation in strike deficits simply be random? Tell me, for instance, if this table makes any sense to you:

These are the pitchers in the tails of the distribution, the ones getting hurt and helped the most. (Those getting hurt are on the right.)

It is not clear there is a pattern to umpire bias based on pitching style. There are hard throwers in both tails and pitchers with good movement in both. There are pitchers who throw a lot of strikes in both. There are old and young pitchers on both sides as well.

One thing to notice, however, is that there are more outliers on the positive side and those outliers are more extreme. This should not happen with random variation as we commonly conceive of it.


Most important, however, is that some pitchers show up multiple times in the tails. If pitchers were recipients of generalized “ump bias” then who is in the tails should be random. Yet we see Clayton Kershaw and Anibal Sanchez in the “losers” tail two times apiece. Meanwhile Livan Hernandez shows up with the “winners” twice while we see Derek Lowe there four times.

A pattern in the shuffle

I noticed there was some positive correlation between the number of actual balls a pitcher threw and his propensity to get those balls called as strikes. It does not seem likely that an umpire would start calling balls strikes just because a pitcher is throwing a lot of them. Instead it seems that certain pitchers knew that their balls would be called strikes. This definitely appears to be the case with Livan and Derek Lowe.


These tables show, remarkably clearly, that when either pitcher’s strike deficit percentage was higher (more balls being called as strikes) the percentage of their pitches that were actually balls rose in kind.

Catching matters

Utterly baffled, I then came across this article from Baseball Prospectus.

From the article:

In June 1993, Baseball Digest quoted Matt Nokes with his views on how umpires call the zone.

“Predictability is the key to getting borderline calls,” says Matt Nokes of the Yankees. “If the pitcher is consistent, then the umpire knows where to be looking. But if the catcher is jerking all around the plate and the ump does not know what is coming in where, it’s going to be harder for him to focus on those close pitches and you won’t get them. If the pitcher is throwing consistently where the catcher is setting up, he doesn’t have to be so fine. But if I set up inside and the pitch is on the outside corner, even if it is a strike, we’re not likely to get that call. Even if the pitch is over the outer half of the plate, it will be called a ball, because it missed the catcher’s target so bad. That’s just the way it is.”

And this excellent picture:


From Baseball Prospectus

Livan Hernandez consistently gets the outside strike. I am guessing this is because he can consistently hit the target and that his catcher (Pudge, when he’s healthy) is smart enough to take advantage of the fact.

What about preferential treatment?

The Baseball Prospectus article also contains the suggestion that age was a factor:


From Baseball Prospectus

But not in the sense that Livan simply gets strikes because he is old. From BP:

“Dan found that the older (or more experienced) a pitcher was, the bigger the zone he got from the umpires. It also happens to be true that the older a pitcher is, the more he pitches to the outside edges.”

That is, the pitcher does not get strikes necessarily because he is old but because he can hit his target.

Other factors accounting for the strike deficit seem to be (once again, from BP) the counts the pitcher finds himself in and home field advantage. Personality is not mentioned.

“Anyone researching the performance of umpires in calling balls and strikes is strongly encouraged to consider the catcher target theory. It does not fully explain every umpire variation, but it appears to be the primary factor in many cases.”

Toward a conclusion

In yesterday’s article I argued that Livan Hernandez was the worst pitcher in the league the past two years.

But I was mistaken in assuming that the “by the rules” strike zone is what pitchers face. Pitchers face something more complicated than a static box determined by the plate and the height of the batter. If a pitcher can hit the catcher’s glove with little error and do so in or “near enough” to the strike zone, he will get a strike. If he is inconsistent he loses the “near enough” region and possibly even parts of the strike zone if he is too wild. In effect, an inconsistent pitcher faces a much smaller strike zone.

Understanding this, I now have to say Livan probably deserved to be about where xFIP had him, and possibly even where FIP did. Because Livan was able to produce such a large strike zone for himself, he consistently put batters at a disadvantage, since his strikes were less hittable than those of a pitcher whose inconsistency forced him to stay in the “by the rules” strike zone. This means hitters facing Livan likely had to swing at pitches they were less likely to make good contact with, possibly suppressing his HR/FB%. More importantly, Livan was able to sneak past more strikes than he otherwise would have.

FIP and xFIP remain important indicators since they still measure the important outcomes. But how a pitcher arrives at these outcomes is not as simple as it may appear.

FIP and xFIP should no longer be thought of as measuring solely what a pitcher can control, or as all-inclusive. Both walks and strikeouts are influenced by the catcher, and a pitcher’s consistency is not explicitly present in either equation.

Further, it seems possible that a pitcher’s ability to expand his zone should be correlated with his ability to avoid good contact or to induce more groundballs etc., based on exactly how his zone is expanded. It was already known that pitchers could somewhat control their hit distribution (GB%, FB%, etc) but this may be another piece of the puzzle.

This discussion should, I hope, emphasize how little we understand pitching. A few years ago I doubt anyone would have anticipated Cliff Lee’s huge successes, which seem to stem from his very low walk rate. Jamie Moyer should have caused us to revisit the importance of throwing hard and age. Stephen Strasburg raises the question regarding the relative importance of velocity versus the ability to stay healthy. Now we have Livan emphasizing the importance of consistency. How do these things combine? In some way more subtle and precise than either FIP or xFIP can make clear. I suspect there could be a pitching revolution in here somewhere, and am patiently waiting for it to arrive.

Technical note

I said earlier that it was not wholly accurate to treat the difference between the number of pitches called as strikes and the number of pitches that were “actually” strikes as the number of pitches that were called incorrectly. Consider this example:

A pitcher only throws two pitches. The first is in the strike zone and is called a ball. The second is out of the strike zone and is called a strike.

In the above example the pitcher would be recorded in the box score as throwing one ball and one strike, and the same would be recorded by BIS. Looking at FanGraphs, we would have no idea that in reality 100% of this pitcher’s pitches were called incorrectly; he would be seen as having no strike deficit.
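The two-pitch example can be sketched in a few lines. Each taken pitch is represented as a hypothetical (in_zone, called_strike) pair, a format invented here for illustration and not the layout of any real data feed. The net figure is what aggregate data reveals; the gross figure is what only pitch-by-pitch data could show.

```python
def net_and_gross_miscalls(taken_pitches):
    """Return (net, gross) miscalls for a list of taken pitches.

    taken_pitches: iterable of (in_zone, called_strike) booleans.
    net   = balls called strikes minus strikes called balls
    gross = total number of miscalled pitches
    """
    balls_called_strikes = 0
    strikes_called_balls = 0
    for in_zone, called_strike in taken_pitches:
        if called_strike and not in_zone:
            balls_called_strikes += 1
        elif in_zone and not called_strike:
            strikes_called_balls += 1
    net = balls_called_strikes - strikes_called_balls
    gross = balls_called_strikes + strikes_called_balls
    return net, gross


# First pitch: in the zone, called a ball.
# Second pitch: out of the zone, called a strike.
two_pitches = [(True, False), (False, True)]
print(net_and_gross_miscalls(two_pitches))  # (0, 2)
```

The net count of zero is the “no strike deficit” seen in the aggregate, while the gross count of two says both pitches were called incorrectly.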

It seems that this problem can be overcome by looking at pitch-by-pitch data from Pitch F/X. I am in no mood to go through that task and, anyway, I assume that someone else already has. At any rate, the FanGraphs data is likely a good enough approximation, as umpires probably call far more balls as strikes than they call strikes as balls.