Special acknowledgement: This is far and away the most advanced, in-depth thing I’ve ever tried. Without question, the best comparison I can come up with is asking someone who’s taken only a high school Economics course to run the IMF; that’s basically what’s happened. As with any such endeavor, most of the actual work was done by others. With thanks to Jonathan Mayo, Will Moller, Joe Pawlikowski, Mike Axisa, Jim Johnson, Jamal Granger, Dave Cameron, Brent Nycz, Joshua Rosenberg, Dan Dilworth and Greg Fertel.

This started out as one post but quickly got so big that it would not be fair to make you sit through all of it at once. It will thus be serialized, and as I do so, I will provide links back and forth. I’d provide a better introduction, but I’ve got to go scrape brains off the wall.

In this article, which is worth your time, Rob Neyer dares us to come up with a way to measure how many Championships Mariano has been worth.

Guess who enjoys masochism?

So, as you may know, there’s a myriad of stats out there, many of which I can only understand in theory, but there’s one measure that’s been created for the regular season that is very useful. You may have heard of it, as it’s called WAR–wins above replacement player.

NOTE: There are two measures we could use here, WAR and WARP, which try to accomplish the same thing (discussed below), but use two different sets of stats/data to do so. I’m going to stick with WAR because I think it sounds cooler. ANYWAY. So to understand WAR, two concepts are crucial: replacement level and leverage.

I understand that many of you reading this will already be familiar with both of these, but since my hope is that those who don’t delve into stats very often can follow along, and for the sake of my sanity, I hope you won’t begrudge me a refresher.

Just so you know where I’m coming from, I haven’t done proper math since I was 16/17, so anything you see is going to be pretty easy to understand.

ANYWAY, again. What’s with the digression, Rebecca? So Replacement Level. The idea behind replacement level is that you take any player in any line up on any given day and replace him with someone whose level of performance is what an average team can expect when trying to replace a player at minimal cost. In English, it’s saying that if, say, Andrew McCutchen went down on the Pirates with the flu, what’s the baseline production that the Pirates could expect from John Doe, who’s the cheapest available player to fill the spot? That production is replacement-level production.

Why not just use a league-average performance as a replacement? The answer is that MLB statistics are largely skewed–MLB “regulars”, the guys putting up big enough numbers to stay in lineups every day, are a minority–while fringe players, those that struggle to stay in the big leagues, are much more common. Simply put, it’s easier to find a player that hits .250 than one that hits .330, but, like that student you wanted to kill because he got an A on that Spanish test while no one else did better than a C, the one that hits .330 destroys the curve.

So, instead, you take into consideration what a GM and manager are likely to go for in the event of a player suddenly going down for a game or two–ie, your utility infielder. Most teams–and the Yankees, of course, are not most teams–will go for whatever option is least costly–dipping into the pool of fringe Major Leaguers, the pool considered “freely available talent”. Of course, if a player is lost for a season, it’s an entirely different thing, but that gets beyond our scope.

What you end up with is this: on one end, you have your normal team–say, the 2009 Yankees–and on the other, a replacement-level team, with a lineup where Wil Nieves is your best hitter and Sidney Ponson is your best pitcher. What WAR does, then, is like having Nick Swisher go up to Joe Girardi before game six and say, “Dude, I gave the Yanks, like, x more wins this season than you would have had if Jerry Hairston had been your everyday right fielder.”

(Note: via fangraphs, Hairston’s 2009 registered a WAR of 1.0, which indicates he performed above replacement level. Actually, this is helpful to give you an idea of how poorly a team with all replacement-level players would perform over the course of a season. Replacement Level is not the bench guys on the Yankees; it’s the bench guys on the Nationals and the Pirates.)

So before we move on, let’s make sure we–okay, I–understand everything we’ve discussed:
1) The concept of Replacement Level enables us to compare performances of MLB “regulars” vs low-cost, “freely-available” replacement players.
2) WAR is designed to measure how many more wins player X will net his team over player Replacement Level (ie, our Swisher/Hairston faux metaphor).
3) The values set for what a RL-performance entails vary by position–ie, shortstops aren’t supposed to hit like right fielders, etc. Pitchers, too, have WAR. Over here you can see the rankings for pitchers, by WAR, for the 2009 season. To no one’s surprise, Zack Greinke tops the list. The type of season he had will do that to you.

Now here, what we want to do is find the WAR for only Mo’s postseason innings, and then convert that to Championships–one championship being eleven wins. A reliever’s WAR is likely to be lower than a starter’s because a reliever pitches so many fewer innings–and innings pitched/endurance is a relevant stat–ie, it’s what you’re referring to when you talk about “innings-eaters” and the like.
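For anyone who likes to see the arithmetic spelled out, here's a rough sketch of that last conversion. The eleven wins come from the text above (three to win the Division Series, then four each for the LCS and the World Series, under the postseason format of the time). The function name and the sample WAR value are made up for illustration. They are not Mo's real numbers.

```python
# Wins needed to take a title: 3 (Division Series) + 4 (LCS) + 4 (World Series).
WINS_PER_CHAMPIONSHIP = 11

def championships_from_war(postseason_war: float) -> float:
    """Convert postseason wins above replacement into 'championships'."""
    return postseason_war / WINS_PER_CHAMPIONSHIP

# Hypothetical input: 5.5 is a placeholder, not an actual postseason WAR.
print(championships_from_war(5.5))  # 0.5
```

So a pitcher worth 5.5 postseason wins above replacement would, on this scale, be worth half a championship.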

That said, a reliever’s innings–especially a closer’s–are often more high stress and involve more critical game situations. So what we need, then, is a way to account for leverage–which is one of the main components of a reliever’s WAR. You’ve seen leverage stats before–just think about those WPA graphs you see. Take the WPA graph from Game Six of the World Series.

Game Six doesn’t exactly have a ton of high leverage situations–the Yankees took a lead fairly early and then built on it, eventually leading 7-1, and the game was never in much doubt. Many times when a reliever comes in, that line (there’s a technical term for it, of course, that went out of my head the day the bell rang in 7th grade) is closer to the middle.

To explain further: the closer the line gets to the top or bottom of the graph, the more the game favors one particular team. For example, in this game, we see the line go more and more towards the top of the graph–and on the side you see the top half labeled as “Yankees”. So the more this game went on, the more in favor of the Yankees it was–80% and then 90%, etc. Many times when a reliever comes in, that line is closer to the middle, not pointing decidedly towards either team, and the bars on the bottom of the graph give you an indication as to how important that particular situation in the game is–the higher the bar, the more critical the situation.
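If it helps, the line on the graph is just a team's chance of winning at each moment, and the "win probability added" on any play is how far the line moved. Here's a minimal sketch of that idea. The win-expectancy numbers are invented for illustration, not taken from the actual Game Six graph, and `wpa_per_play` is a name I made up.

```python
def wpa_per_play(win_expectancy: list[float]) -> list[float]:
    """Change in win expectancy from each point on the line to the next."""
    return [after - before
            for before, after in zip(win_expectancy, win_expectancy[1:])]

# Hypothetical Yankees win expectancy at five successive points in a game.
line = [0.50, 0.55, 0.80, 0.90, 1.00]
swings = wpa_per_play(line)
print([round(s, 2) for s in swings])  # [0.05, 0.25, 0.1, 0.1]
```

The swings always add up to the distance from the starting point to the final result, which is why a blowout leaves so little win probability for a late reliever to add.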

Now here’s where, depending on your outlook, things get really cool or, if you’re me, your head explodes: you can calculate WPA with “series probability added”–which would mean the probability that takes into account the current situation in a series–ie, are the Yankees up 3-1, down 1-2, tied with the Phillies or something else? Without going into specifics for the moment, this is a pretty simple concept–the deeper into a series you go, the more high leverage each at bat becomes.

Think about it this way: in 2004, before Roberts steals second, the Yankees are up 3-0 in the series and up in that game. The likelihood they’re going to win–only a few outs away from the World Series–is probably around, say 90% (this is a total guess, but you can probably find the data somewhere) for the game–and probably the series too.

Now, move forward a few days and it’s game seven, and things have changed drastically–the series is now 3-3, and thus every pitch thrown matters that much more, every at bat that much more high leverage. Of course, there was that early Damon grand slam, and whatnot, but, yet again, I digress.
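To put rough numbers on that, here's a small sketch of series odds. It rests on one big assumption that is mine and not from any of the linked articles: every remaining game is a coin flip (`p_game = 0.5`). The function names are also made up for illustration.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def series_win_prob(wins: int, losses: int,
                    needed: int = 4, p_game: float = 0.5) -> float:
    """Chance of winning the series from the current record, assuming
    each remaining game is won with probability p_game (an assumption)."""
    if wins == needed:
        return 1.0
    if losses == needed:
        return 0.0
    return (p_game * series_win_prob(wins + 1, losses, needed, p_game)
            + (1 - p_game) * series_win_prob(wins, losses + 1, needed, p_game))

def game_swing(wins: int, losses: int,
               needed: int = 4, p_game: float = 0.5) -> float:
    """Gap in series odds between winning and losing the next game."""
    return (series_win_prob(wins + 1, losses, needed, p_game)
            - series_win_prob(wins, losses + 1, needed, p_game))

print(series_win_prob(3, 0))  # 0.9375
print(game_swing(3, 0))       # 0.125
print(game_swing(3, 3))       # 1.0
```

Under the coin-flip assumption, a team up 3-0 wins the series about 94% of the time, and the next game moves those odds by only 12.5 points. At 3-3, the next game moves them by the full 100, which is the "every pitch matters that much more" feeling in number form.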

What does all this matter? It comes down to this: you cannot accurately measure a reliever’s WAR, especially a postseason WAR, without taking into account the leverage situations in which they pitch.

Now, of course, the question is: how do we do this? Some suggest that we use WPA–win probability added–but there are problems with this. WPA is a team-oriented stat at its core, and the idea is to measure how much more likely a team is to win a game given a certain event, and it’s so dependent on leverage that it’s not a decent measure of raw talent. Read up for more info. I especially recommend clicking on the link to The Hardball Times that explains how WPA is more about a “feeling” than anything concrete. That said, WPA stats do exist for individual players–the link in the paragraph above puts it much more succinctly than I can while running on three hours’ sleep.

Now, here’s where it gets really cool: the Leverage Index on which WPA is based is not an abstract concept; it’s based on real numbers. I direct you to this article from The Hardball Times that explains the math (ie, goes mostly way over my head), and then, the shiny finished product, the chart of Leverage Index, which has the actual numbers.

Take that in for a minute.

We can find out, for any team, any inning and any situation–men on base and number of outs–exactly how much more or less likely that team is to win the game, and how important that particular situation is.

For individual pitchers (since we’re dealing with pitchers), we can also calculate their average leverage index in four ways: a) a pitcher’s LI across all game situations (pLI), b) a pitcher’s LI when he enters the game–as in a reliever that enters the game in the seventh, etc. (gmLI), c) a pitcher’s LI based on the inning in which he enters (inLI) and d) a pitcher’s LI when he leaves the game (exLI).
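As a toy illustration of how two of those averages differ, here's a sketch that computes pLI and gmLI from a made-up game log. The data layout (one list per appearance, one LI value per batter faced) and the numbers are hypothetical. Real LI values come from the published chart, and inLI and exLI would need inning and exit information this toy log doesn't carry.

```python
from statistics import mean

# Hypothetical log: one inner list per relief appearance,
# holding the Leverage Index for each batter faced, in order.
appearances = [
    [1.8, 2.4, 1.2],
    [0.6, 0.4],
    [2.6, 3.0, 1.5, 0.9],
]

def pli(apps):
    """Average LI over every batter faced in every appearance."""
    return mean(li for app in apps for li in app)

def gmli(apps):
    """Average LI of the first batter faced in each appearance."""
    return mean(app[0] for app in apps)

print(round(pli(appearances), 2))   # 1.6
print(round(gmli(appearances), 2))  # 1.67
```

The two can drift apart: gmLI only cares about the spot the manager handed the pitcher, while pLI also reflects whatever mess (or calm) followed.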

If you go back to this chart, you can see Rivera’s average numbers for these situations. For comparison’s sake, here are Ryan Madson’s numbers for the same. As you can see, the leverage situations in which the pitchers enter are roughly the same–they’re both back-end relievers pitching the most or some of the most critical innings for their teams–but the WPAs are different.

Pitchers can’t control the leverage when they enter a game, but the leverage is a good indicator of the stress a pitcher might face coming in to relieve. How that pitcher responds to it is perhaps the most valuable measure of how composed he remains in a tight spot. Hence our need for the Leverage Index.

Okay, so I realize my head might explode from all this, so take a break, get a snack, do some processing and come back here when you’re good to go.

(or, you know, at noon, when I post the next part).