(the following is a four part series originally published on 11/09/2009. I’ve republished it here because a) it took a lot of work, and b) it’s really meant to be read as one article, anyway.)
Special acknowledgement: This is far and away the most advanced, in-depth thing I’ve ever tried. The best comparison I can come up with is asking someone who’s taken only a high school Economics course to run the IMF; that’s basically what’s happened. As with any such endeavor, most of the actual work was done by others. With thanks to Jonathan Mayo, Will Moller, Joe Pawlikowski, Mike Axisa, Jim Johnson, Jamal Granger, Dave Cameron, Brent Nycz, Joshua Rosenberg, Dan Dilworth and Greg Fertel.
This started out as one post but quickly got so big that it would not be fair to make you sit through all of it at once. It will thus be serialized, and as I do so, I will provide links back and forth. I’d provide a better introduction, but I’ve got to go scrape brains off the wall.
In this article, which is worth your time, Rob Neyer dares us to come up with a way to measure how many Championships Mariano has been worth.
Guess who enjoys masochism?
So, as you may know, there’s a myriad of stats out there, many of which I can only understand in theory, but there’s one measure that’s been created for the regular season that is very useful. You may have heard of it, as it’s called WAR–wins above replacement player.
NOTE: There are two measures we could use here, WAR and WARP, which try to accomplish the same thing (discussed below), but use two different sets of stats/data to do so. I’m going to stick with WAR because I think it sounds cooler. ANYWAY. So to understand WAR, two concepts are crucial: replacement level and leverage.
I understand that many of you reading this will already be familiar with both of these, but since my hope is that those who don’t delve into stats very often can follow along, and for the sake of my sanity, I hope you won’t begrudge me a refresher.
Just so you know where I’m coming from, I haven’t done proper math since I was 16/17, so anything you see is going to be pretty easy to understand.
ANYWAY, again. What’s with the digression, Rebecca? So Replacement Level. The idea behind replacement level is that you take any player in any line up on any given day and replace him with someone whose level of performance is what an average team can expect when trying to replace a player at minimal cost. In English, it’s saying that if, say, Andrew McCutchen went down on the Pirates with the flu, what’s the baseline production that the Pirates could expect from John Doe, who’s the cheapest available player to fill the spot? That production is replacement-level production.
Why not just use a league-average performance as a replacement? The answer is that MLB statistics are largely skewed–MLB “regulars”, the guys putting up big enough numbers to stay in lineups every day, are a minority–while fringe players, those who struggle to stay in the big leagues, are much more common. Simply put, it’s easier to find a player who hits .250 than one who hits .330, but, like that student you wanted to kill because he got an A on that Spanish test while no one else did better than a C, the one who hits .330 destroys the curve.
So, instead, you take into consideration what a GM and manager are likely to go for in the event of a player suddenly going down for a game or two–ie, your utility infielder. Most teams–and the Yankees, of course, are not most teams–will go for whatever option is least costly–dipping into the pool of fringe Major Leaguers, the pool considered “freely available talent”. Of course, if a player is lost for a season, it’s an entirely different thing, but that gets beyond our scope.
What you end up with is this: on one end, you have your normal team–say, the 2009 Yankees–and on the other, a replacement-level team with a lineup where Wil Nieves is your best hitter and Sidney Ponson is your best pitcher. What WAR does, then, is like having Nick Swisher go up to Joe Girardi before game six and say, “Dude, I gave the Yanks, like, x number more wins this season than you would have had if Jerry Hairston had been your everyday right fielder.”
(Note: via fangraphs, Hairston’s 2009 registered a WAR of 1.0, which indicates he performed above replacement level. Actually, this is helpful to give you an idea of how poorly a team with all replacement-level players would perform over the course of a season. Replacement Level is not the bench guys on the Yankees; it’s the bench guys on the Nationals and the Pirates.)
So before we move on, let’s make sure we–okay, I–understand everything we’ve discussed:
1) The concept of Replacement Level enables us to compare performances of MLB “regulars” vs low-cost, “freely-available” replacement players.
2) WAR is designed to measure how many more wins player X will net his team over player Replacement Level (ie, our Swisher/Hairston faux metaphor).
3) The values set for what a RL-performance entails vary by position–ie, shortstops aren’t supposed to hit like right fielders, etc. Pitchers, too, have WAR. Over here you can see the rankings for pitchers, by WAR, for the 2009 season. To no one’s surprise, Zack Greinke tops the list. The type of season he had will do that to you.
Now here, what we want to do is find the WAR for only Mo’s postseason innings, and then convert that to Championships–one championship being eleven wins. A reliever’s WAR is likely to be lower than a starter’s because a reliever pitches so many fewer innings–and innings pitched/endurance is a relevant stat–ie, when you’re looking for “innings-eaters” and the like, that’s to what you are referring.
That said, a reliever’s innings–especially a closer’s–are often more high stress and involve more critical game situations. So what we need, then, is a way to account for leverage–which is one of the main components of a reliever’s WAR. You’ve seen leverage stats before–just think about those WPA graphs you see. This is the WPA graph from Game Six of the World Series:
[Figure: WPA graph, Game Six of the World Series]
Game Six doesn’t exactly have a ton of high leverage situations–the Yankees took a lead fairly early and then built on it, eventually leading 7-1, and the game was never in much doubt. Many times when a reliever comes in, that line (there’s a technical term for it, of course, that went out of my head the day the bell rang in 7th grade) is closer to the middle.
To explain further: the closer to the top or bottom of the graph that the line gets, the more in favor the outcome of a game is for a particular team. For example, in this game, we see the line go more and more towards the top of the graph–and on the side you see the top half labeled as “Yankees”. So the more this game went on, the more in favor of the Yankees it was–80% and then 90%, etc. Many times when a reliever comes in, that line is closer to the middle, not pointing decidedly towards either team, and the bars on the bottom of the graph give you an indication as to how important that particular situation in the game is–the higher the bar, the more critical the situation.
Now here’s where, depending on your outlook, things get really cool or, if you’re me, your head explodes: you can calculate WPA with “series probability added”–which would mean the probability that takes into account the current situation in a series–ie, are the Yankees up 3-1, down 1-2, tied with the Phillies or something else? Without going into specifics for the moment, this is a pretty simple concept–the deeper into a series you go, the more high leverage each at bat becomes.
Think about it this way: in 2004, before Roberts steals second, the Yankees are up 3-0 in the series and up in that game. The likelihood they’re going to win–only a few outs away from the World Series–is probably around, say 90% (this is a total guess, but you can probably find the data somewhere) for the game–and probably the series too.
Now, move forward a few days and it’s game seven, and things have changed drastically–the series is now 3-3, and thus every pitch thrown matters that much more, every at bat that much more high leverage. Of course, there was that early Damon grand slam, and whatnot, but, yet again, I digress.
What does all this matter? It comes down to this: you cannot accurately measure a reliever’s WAR, especially a postseason WAR, without taking into account the leverage situations in which they pitch.
Now, of course, the question is: how do we do this? Some suggest that we use WPA–win probability added–but there are problems with this. WPA is a team-oriented stat at its core, and the idea is to measure how much more likely a team is to win a game given a certain event, and it’s so dependent on leverage that it’s not a decent measure of raw talent. Read up for more info. I especially recommend clicking on the link to The Hardball Times that explains how WPA is more about a “feeling” than anything concrete. That said, WPA stats do exist for individual players–the link in the paragraph above puts it much more succinctly than I can while running on three hours’ sleep.
Now, here’s where it gets really cool: the Leverage Index on which WPA is based is not an abstract concept; it’s based on real numbers. I direct you to this article from The Hardball Times that explains the math (ie, goes mostly way over my head), and then, the shiny finished product, the chart of Leverage Index, which has the actual numbers.
Take that in for a minute.
We can find out, for any team, any inning and any situation–men on base and number of outs–exactly how much more or less likely that team is to win the game, and how important that particular situation is.
For individual pitchers (since we’re dealing with pitchers), we can also calculate their average leverage index based on a) a player’s LI for all game situations (pLI), b) a pitcher’s LI for when he enters the game–as in a reliever who enters the game in the seventh, etc. (gmLI), c) a pitcher’s LI based on the inning in which he enters (inLI) and d) a pitcher’s LI when he leaves the game (exLI).
If you go back to this chart, you can see Rivera’s average numbers for these situations. For comparison’s sake, here are Ryan Madson’s numbers of the same. As you can see, the leverage situations in which the pitchers enter are roughly the same–they’re both back-end relievers pitching the most or some of the most critical innings for their teams–but the WPAs are different.
Pitchers can’t control the leverage when they enter a game, but the leverage is a good indicator of the stress a pitcher might face coming in to relieve. How that pitcher responds to it is perhaps the most valuable measure of how composed he remains in a tight spot. Hence our need for the Leverage Index.
Okay, so I realize my head might explode from all this, so take a break, get a snack, do some processing and come back here when you’re good to go.
(or, you know, at noon, when I post the next part).
So what do we have so far?
1) We’re attempting to figure out Mariano’s WAR for his postseason innings and then convert that to the number of championships Mariano is worth all by his lonesome.
2) We’ve explained the theory behind WAR and replacement level (though we haven’t gotten into the nitty gritty just yet)
3) We’ve discussed why it’s still all about the leverage, how really smart people have come up with absolute numbers for every conceivable innings-baserunners-outs situation, and how WPA, while shiny and a fun toy, is not as helpful as we would like because it’s a probability stat more than an absolute number.
Fortunately for us, since Fangraphs provides us with LI numbers, it’ll save us a bit of work.
The other key component of WAR is FIP, or fielding-independent pitching. The goal of FIP is simple: figure out how well a pitcher pitches in terms of events that are not dependent on the fielders–strikeouts, walks and home runs.
A little more advanced: the theory here is that things such as singles, doubles and triples, may be affected as much by the way the fielders play as by the way the pitcher pitches. The only events a pitcher directly controls occur when a batter does not make contact with the baseball, or when he hits a home run. Basically, all or nothing.
The formula provided by Beyond the Box Score for FIP is:
This formula will give you an odd looking decimal result; generally speaking you add 3.20 to it to get the FIP.
A note of caution: I was never able to get the formula to add up to the same results given by Fangraphs; since different sources do use different formulas (some don’t account for hit-by-pitches, some don’t account for intentional walks, etc), I’m going to attribute the difference to using a different formula than Fangraphs–since my results were generally close.
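If you want to play along at home, here is a minimal sketch of the basic FIP calculation. It assumes the version most commonly cited, (13×HR + 3×BB − 2×K) / IP plus a constant of about 3.20, which may not be exactly what Beyond the Box Score or Fangraphs use (there are no hit-by-pitches or intentional walks in it). The function name and the stat line are made up for illustration; they are not Mariano’s numbers.

```python
def basic_fip(hr, bb, k, ip, constant=3.20):
    """Basic FIP, assuming the commonly cited form:
    (13*HR + 3*BB - 2*K) / IP + constant.
    Other sources adjust for hit-by-pitches or intentional walks."""
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * bb - 2 * k) / ip + constant


# Hypothetical stat line: 1 HR, 4 BB, 16 K over 16 innings.
print(round(basic_fip(hr=1, bb=4, k=16, ip=16.0), 2))  # 2.76
```

Swap in a different constant and you can see how easily two sources end up with slightly different FIPs for the same pitcher.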
What we need to do, then, is to figure out the WAR numbers.
Let me state this as simply as I can: I have no issue understanding the theory behind WAR, as hopefully I have successfully explained above; however, WAR is a very complex calculation that is way beyond the scope of my doing. Fangraphs and Beyond the Box Score give formulas as to how to calculate them for pitchers here and here, but the problem is that every time I tried doing the formula myself, I’d end up with very, very wacky numbers.
I know Mo’s the Hammer of God, and all, but even the Hammer of God isn’t worth 74 wins all by his lonesome.
To make a very long and frustrating story short, I was saved by an Angel of Mercy (who has asked their identity not be revealed–which, alas, means no revelation of formula. Don’t worry, though, I still can’t figure it out). Said AoM sent me a magical calculator thingy (okay, a spreadsheet), and, well, now it’s just a question of doing the following:
1) Calculating Mo’s WAR for each of his postseasons
2) Adding the totals together.
3) Converting them to a scale that will allow us to compare what he’s done in the postseason with what he’s done in the regular season.
Alas, this is the part that is incredibly time-consuming.
WAR involves constants in its formulas that change from year to year, and since Mo’s been pitching in postseasons since 1995, that’s a lot of constants to go back and find.
Anyway, before I go into the hard data and the results I got, here are a couple of caveats–so if you want to try this on your own (masochist!), you might choose to adjust accordingly. I’d recommend that if you try (masochist), you use the link to the Fangraphs explanation, which calculates WAR for Felix Hernandez from scratch. The theory’s easy to follow, but because there are so many components to it, just one not-so-hot constant can throw it off. This is, of course, why they pay people to do these things.
1) We need to understand park factors, but this is a simple concept. Some ballparks favor pitchers and some favor hitters; the park factor is simply a number that attempts to describe whether a ballpark favors hitters or pitchers. The park factor used for 2009 is 0.975, which would make Yankee Stadium (and here you will laugh) a slight pitcher’s park. (A neutral park that favors neither pitchers nor hitters will have a park factor of 1.) What’s that you say? That can’t be right?
Well…yes. And no.
ESPN has a handy-dandy Park Factor sheet. Now you’ll see for 2009 the New Yankee Stadium is actually middle of the pack and registers 0.965. So why increase it to 0.975? New Yankee Stadium has only been around a year, which is a very small sample size. The extra 0.010…let’s just say it’s a nod to all those home runs.
For all the years before 2009, I took an average of park factors from 2001 to 2008 and came up with 0.962. Don’t worry–2005 had a park factor of 1.4+, but this was balanced by an absurdly low 2004. Our 0.962 constant really is right there in the middle.
(Going backwards from 2008, the numbers we use are 1.040, 0.987, 0.877, 1.403, 0.694, 0.933, 0.957 and 0.805.)
2) The leverage used is a modified gmLI (see above) that gives the pitcher some credit for the leverage of his situation, but not all–basically it says that it’s not Mo’s fault that Brian Bruney left the bases chucked with no one out, but if Mo puts his own runners on base and then lets them score, he’s gotta be accountable for that, too.
3) The FIP stats are from Fangraphs.
4) As for the constants in the formulas? People get paid to figure those things out. I rely on the AoM’s Magical Calculator Thingy.
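The 0.962 average from the first caveat is easy to check. A quick sketch, using the eight park factors listed there (the variable names are mine):

```python
from statistics import mean

# Park factors for 2008 back to 2001, in the order listed in the first caveat.
park_factors = [1.040, 0.987, 0.877, 1.403, 0.694, 0.933, 0.957, 0.805]

pre_2009_park_factor = mean(park_factors)
print(round(pre_2009_park_factor, 3))  # 0.962
```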
So how does one use the Magical Calculator Thingy?
One needs five pieces of data:
1) The pitcher’s FIP.
2) The league’s RA (this is the average runs per game per team. Here we’re going to use the postseason).
3) The park factor–we’re using the 0.975 constant for 2009 and the average of 0.962 for every other year.
4) The pitcher’s innings pitched.
5) The modified gmLI.
Since numbers 1, 2, 4 and 5 will change season to season, the calculations have to be done separately for each season. Depending on the park factor you use, number 3 can also vary.
As I’ve said before, calculating FIP is itself complex, but we can cheat and just look at advanced pitching stats from Fangraphs. Like many of the advanced stats, FIP can use various formulas depending on the publication you are reading.
Anyway, the bottom line here is that we use Fangraphs’ Data because super smart people have already done this for us.
Doing the RA isn’t hard, but it IS tedious.
RA is simple–it’s just the total runs scored, divided by games played, and then divided again so you get an average runs scored per team. This takes into account all runs, not just earned runs–since earned runs are a somewhat sketchy stat. It’s easy enough to find the RAs for a particular season, since ESPN handily lists average runs at its stat page, but the numbers haven’t been done for the postseason.
What does this mean?
We’ve got to total every run scored by every team in every game in the ALDS, NLDS, ALCS, NLCS and World Series, and then divide that by the total games played in the postseason, and then divide again to get the number per team. Normally you’d just use it for the AL or the NL, but I like to consider the postseason a league in its own right.
Again, this isn’t hard, it’s just tedious.
Innings Pitched–this one’s easy. Google “mariano rivera stats” and click on any of the links. Since WAR is heavily dependent on innings pitched in terms of value, relievers’ PREWAR (postseason reliever WAR, don’t look at me like that, I just think it sounds cool) will be low–anything over 1 being positively insane.
To compare it to what a WAR would be at the same FIP over a full season, we’ll also plug in the numbers not just for postseason IP, but also for 70 IP, which is about what a closer will pitch over a full season. This will give us an idea of how valuable Mariano would be if he pitched at that same level over the course of a regular season (I’ve explained it more after we get our results).
When we get our numbers and we’re doing our aggregate totals, we’ll use both the PREWAR numbers and the converted PREWAR numbers, so if you want to try some fancy stats work on your own, you can go right ahead and do so.
So, let’s get to the nitty-gritty.
Well. In a little bit. Check back at 2 PM…
Let’s start with the 2009 postseason.
We take the Magic Calculator Thingy and input the following:
Innings pitched: 16.
Fangraphs’ FIP: 2.28
Next, we go to Baseball Reference and click on the ‘postseason’ tab. We look at every box score of every postseason game and add up every single run.
ESPN also has run totals here, per team, but they only go back to 2002 and we will need to (eventually) go all the way back to 1995, so knowing how to do it just looking at the BR box scores is of some use.
That gives us a total of 260 runs scored.
We then add up the total number of games played–13 in the division series (three 3-game sets, one four gamer), 11 in the League Championship Series (six and five) and six in the World Series, for a total of thirty games played.
We divide 260 by 30 to get 8.67 runs scored per game, and we divide that by two to get an average of 4.33 runs scored per team per game.
We’ve got our first three variables; the fourth, park factor, has been preset at 0.975, so now we just need the fifth, the leverage index.
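The arithmetic above, as a small sketch. The function name is just for illustration; the run and game totals are the 2009 numbers from the box scores.

```python
def runs_per_team_per_game(total_runs, games_played):
    """Average runs per team per game: every game has two teams,
    so divide the per-game total by two."""
    if games_played <= 0:
        raise ValueError("games played must be positive")
    return total_runs / games_played / 2


# 2009 postseason: 13 division series games, 11 LCS games, 6 World Series games.
games_2009 = 13 + 11 + 6
postseason_ra_2009 = runs_per_team_per_game(260, games_2009)
print(round(postseason_ra_2009, 2))  # 4.33
```

The same function works for any other postseason once you have added up the runs and the games.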
We’re going to use gmLI, because that’s the leverage index that gives us an average leverage number for when a pitcher enters a game, and, well, Mariano loves him some high leverage.
As we journey back to Fangraphs, we find a gmLI of 1.45, but don’t enter that in just yet. As discussed above, we need to modify it a little bit.
We do this by adding one (1.00, a neutral leverage) and then dividing the sum by two. That gives us our split leverage of 1.225.
NOTE: Postseason gmLI has only been calculated for the 2002 postseason onwards. To get Mariano’s PREWARs for the years before, the regular season gmLIs will be used, minus .44–which is the average difference between regular season gmLIs and postseason gmLIs taken from the years 2002-2009. It should be noted, however, that this figure is slightly skewed by the years in which Rivera appeared in just one postseason inning (mid 00s are chock full of these), and that the actual postseason leverage is probably a tad higher.
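Here is the leverage adjustment as a sketch. It assumes that for the pre-2002 years you first subtract .44 from the regular season gmLI and then split the result with neutral leverage the same way as for the later years. The function names and the 1.90 example value are hypothetical.

```python
NEUTRAL_LI = 1.00            # neutral leverage
PRE_2002_ADJUSTMENT = 0.44   # avg. regular season gmLI minus postseason gmLI, 2002-2009


def split_leverage(gmli):
    """Average the pitcher's gmLI with neutral leverage (half credit)."""
    return (gmli + NEUTRAL_LI) / 2


def estimated_postseason_gmli(regular_season_gmli):
    """Stand-in for postseason gmLI in years before it was tracked."""
    return regular_season_gmli - PRE_2002_ADJUSTMENT


# 2009: Fangraphs lists a postseason gmLI of 1.45.
print(round(split_leverage(1.45), 3))  # 1.225

# Pre-2002 example with a hypothetical regular season gmLI of 1.90.
print(round(split_leverage(estimated_postseason_gmli(1.90)), 3))  # 1.23
```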
So we take our variables, enter them into the Magical Calculator, bada bing, bada boom, we come out with 0.542 PREWAR for the 2009 postseason.
WAIT! You say, how do you know that it works?
It’s pretty simple–plug in the values for the regular season (which uses pLI), and compare the results to the WAR listed on the Fangraphs’ leaderboard. Since those numbers match, we can assume the calculator is working correctly and thus proceed.
So we have our 0.542 PREWAR. How does that compare to regular season WAR?
We change the IP from 16 to 70, or roughly a closer’s regular season innings, and get a result of 2.37, which is better than Mariano’s 2009 regular season WAR, though less than his 3.1 WAR in 2008 (3.1 is an utterly monstrous number for a reliever, even a closer, and Mariano’s 2008 was that good).
Anyway, so, in our PREWAR spreadsheet, we can fill in our first row, under the columns “year”, “PREWAR” and “converted PREWAR”.
2009: 0.542, 2.37
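The Magical Calculator Thingy’s formula stays secret, so this is not it. Still, the 2009 numbers line up with plain proportional scaling by innings when everything else is held constant: 0.542 over 16 innings stretches to about 2.37 over 70. A sketch under that assumption (the function name is made up, and the proportionality is inferred from this one pair of numbers, not from the calculator itself):

```python
CLOSER_SEASON_IP = 70  # roughly a closer's regular season workload


def scale_to_innings(war, innings_pitched, target_innings=CLOSER_SEASON_IP):
    """Stretch a WAR figure to a different innings total, assuming WAR
    grows in proportion to innings when FIP, RA, park factor and
    leverage all stay the same."""
    if innings_pitched <= 0:
        raise ValueError("innings pitched must be positive")
    return war * target_innings / innings_pitched


# 2009 postseason: 0.542 PREWAR in 16 innings.
print(round(scale_to_innings(0.542, 16), 2))  # 2.37
```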
Now, it’s a question of doing the same for every postseason from 1995 onwards, with the exception of 2008, because the Yankees weren’t in it, and 2009, because, well, we just did that.
I’ll see you all some time next century.
Or at 4 PM.
Hey, so now that I’ve missed the last 100 years or so, what’d I miss?
Anyway, here’s what we’ve got in terms of a final tally:
This will give us our totals:
PREWAR: 4.17
CONVERTED PREWAR: 24.016
Now, before we can go into what this data actually means, we need a couple of notes:
1) The data is slightly skewed because of the years in which the Yankees lost in the first round of the postseason. Just look at how much lower the numbers are for 2002, 2005, 2006 and 2007 to get an idea.
2) The Sandy Alomar Jr home run in 1997 kills Mariano’s PREWAR. For comparison’s sake: in 1997, Mariano’s postseason FIP was over 8.6(!). In 2003, his best postseason (and it’s not even close), the number is 1.28.
Okay, so now that we’ve cleared that up, go pour yourself a nice glass of wine as we discuss what the numbers mean.
The raw, unconverted PREWAR figure is 4.17, so let’s do that one first.
The unconverted number says that Mariano is worth over four wins in the postseason–the equivalent of one round, all by his lonesome self–but there’s a caveat here.
The raw numbers here measure a win as having the same value as a win during the regular season–ie, one win in 162 games. In the postseason, one win is worth a lot more. Since Mariano pitches relief innings only, his innings totals in the postseason are thus suppressed–he’s never thrown more than 16 innings in a postseason–which in turn suppresses the value for WAR.
Now, the raw PREWAR numbers are useful, but they will be most useful when we can compare them to other postseason relievers–this is the epilogue post that you will see following this one, which, if I can figure out how to make one, will have a nice shiny graph.
Anyway, enough with the digressing.
So what we want to do here, then, is to convert Mariano’s PREWAR numbers to a number that represents what Mariano would be worth if he pitched at the same level over a regular season.
The conversion has been done in the table above, but just a refresher: to convert the numbers using the Magical Calculator Thingy, you change the input for Innings Pitched to 70, which is roughly what a closer would pitch over a full season (since becoming a full-time closer, Mariano has pitched between 60 and 80 innings per year, so this number actually works very well).
When we total up the CONVERTED PREWAR numbers, we get 24.016.
That would be, then, 24 wins.
Now, let’s go back and remember our very basic assumption, that it takes eleven wins to win a Championship.
Twenty-four divided by 11 is, of course, just over two.
This means, adjusted to a regular-season scale, the Yankees have won two of their last five World Series, potentially for no other reason than that Mariano Rivera, and not another closer, was on the mound in the ninth inning.
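The last step is just division. A sketch, using the eleven-wins-per-championship assumption and the two totals from above (the function name is mine):

```python
# Wins needed for a title: 3 in the division series, 4 in the LCS, 4 in the World Series.
WINS_PER_CHAMPIONSHIP = 3 + 4 + 4


def championships(wins):
    """Convert a win total into championships at eleven wins apiece."""
    return wins / WINS_PER_CHAMPIONSHIP


converted_prewar_total = 24.016  # regular-season scale
raw_prewar_total = 4.17          # actual postseason innings

print(round(championships(converted_prewar_total), 2))  # 2.18
print(round(championships(raw_prewar_total), 2))        # 0.38
```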
Every time we go and we think that Rivera is the Hammer of God, something else comes around to show us that he’s even greater…