In the Angels' 2023 home opener, Toronto's George Springer led off the game against pitcher Patrick Sandoval. What was the expected batting average for the at-bat? That is, what was the percentage chance that an official at-bat (a hit, out, or reach-on-error) would end in a hit?
Combining regular and postseason numbers, Springer ended up hitting .257 that year. (That's probably not quite his "true-talent" batting average, which is what he would hit if he could somehow have a million at-bats in a single season, thereby eliminating luck as a contributing factor. And it's not even his adjusted batting average, which adjusts for the difficulty of his opponents and ballparks relative to the league average. But we have to start somewhere, and it's with raw hits per at-bat.) So it would be reasonable to assume that there was about a 25.7% chance the at-bat would end in a hit.
Sandoval, the pitcher, ended up with almost the same batting average allowed: .256. This confirms there is probably a 25.6% or 25.7% chance of a hit, right?
Well, that depends on the major league batting average, the average for all MLB. It was .248 last year, which means Springer and Sandoval were both slightly more hit-prone than average (good for Springer, bad for Sandoval). If Springer hit .257 against average pitching, and Sandoval allowed .256 against average hitting (again, we're assuming the true talents of both men in average environments, for now), and average is .248, then the batting average for the matchup would be higher than either of their individual averages.
There's a simple calculation for that: the batter's BA plus the pitcher's BA, minus the league BA. In this case, where the numbers are close to average, you can use this simple add/subtract formula or the odds ratio method and they both get you the same answer: .265.
.257 + .256 - .248 = .265
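To check that arithmetic, here is a minimal Python sketch of both methods. The function names are my own for illustration, and the odds-ratio version is the usual form (batter odds times pitcher odds, divided by league odds), which I'm assuming is the one meant here.

```python
def xba_additive(batter_ba, pitcher_ba, league_ba):
    """Add/subtract shortcut: batter BA + pitcher BA allowed - league BA."""
    return batter_ba + pitcher_ba - league_ba


def xba_odds_ratio(batter_ba, pitcher_ba, league_ba):
    """Odds-ratio method (assumed form): combine odds, then convert back."""
    def odds(p):
        return p / (1 - p)

    matchup_odds = odds(batter_ba) * odds(pitcher_ba) / odds(league_ba)
    return matchup_odds / (1 + matchup_odds)


# Springer (.257) vs. Sandoval (.256) in a .248 league
print(round(xba_additive(0.257, 0.256, 0.248), 3))
print(round(xba_odds_ratio(0.257, 0.256, 0.248), 3))
```

The two methods drift apart as the inputs move further from the league average, which is why the shortcut is only safe for near-average matchups like this one.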
That's the expected batting average, or xBA, for the Springer/Sandoval matchup. Since Springer flied out, he got zero hits, which was .265 fewer hits than expected, or -.265 hits above average (HAA). If he'd gotten a hit, it would've been 1 - .265 = .735 more hits than expected, or .735 HAA.
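In code, HAA is just the outcome (1 for a hit, 0 for anything else) minus the xBA, summed over however many at-bats you care about. A small sketch, with hypothetical function names:

```python
def at_bat_haa(was_hit, xba):
    """Hits above average for one at-bat: actual hits (0 or 1) minus xBA."""
    return (1.0 if was_hit else 0.0) - xba


def total_haa(at_bats):
    """Sum HAA over an iterable of (was_hit, xba) pairs."""
    return sum(at_bat_haa(was_hit, xba) for was_hit, xba in at_bats)


print(round(at_bat_haa(False, 0.265), 3))  # Springer's actual flyout
print(round(at_bat_haa(True, 0.265), 3))   # if he had gotten a hit
```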
Here's the rest of the 1st inning:
In the first inning of the first game at Angel Stadium in 2023 there were 9 at-bats and 2 hits, which were about 0.3 fewer hits than expected, or -0.3 HAA.
For the entire 2023 season, Angel Stadium had 5,515 at-bats, 1,341 hits (a .243 average), and -11 HAA. To get the park's adjusted hits, multiply its at-bats by the league average and add its HAA:
5,515 x .248 - 11 = 1,358
Divide the adjusted hits back into the at-bats to get the adjusted batting average: .246 for the Big A, slightly below average, but a little higher than its simple batting average of .243.
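Here is that park adjustment as a short Python sketch (the function name is mine). One caveat: I'm plugging in the rounded .248 league average, so the adjusted hits come out nearer 1,357 than the 1,358 above, presumably because the figure above was computed from the unrounded league average. The adjusted BA still rounds to .246.

```python
def adjusted_ba(at_bats, haa, league_ba):
    """Adjusted BA: league-average hits plus HAA, divided by at-bats."""
    adjusted_hits = at_bats * league_ba + haa
    return adjusted_hits / at_bats


# Angel Stadium 2023: 5,515 AB, 1,341 hits, -11 HAA, .248 league average (rounded)
print(round(1_341 / 5_515, 3))                      # simple BA
print(round(adjusted_ba(5_515, -11, 0.248), 3))     # adjusted BA
```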
Here are the numbers for all 33 ballparks that hosted MLB games in 2023, sorted by adjusted BA:
Now that I have initial estimates of adjusted BA for all 1,890 ballpark-seasons from 1950 through 2023, I can take a first crack at their "true-talent" BA. For that I use the six surrounding years of each ballpark-season -- the three years before and the three years after -- to try to "project" that park's HAA, and then compare the projection to its actual HAA. I actually made two versions of this, one where I use all six surrounding years and one where I pretend not to know what happened in the three years after (which isn't pretending for the 2021-23 seasons). So I'll demonstrate the latter using Angel Stadium 2023 as my example again:
There were 13,082 at-bats and 73 HAA at Angel Stadium in the three years before 2023. Prorate that to the 5,515 at-bats Angel Stadium had in 2023 and I get 31 HAA, which is the projected HAA for Angel Stadium in 2023. The Big A actually had -11 HAA in 2023, an "error" of 42 hits.
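The unweighted projection is a straight proration, sketched here in Python with a hypothetical function name:

```python
def prorate_haa(prior_haa, prior_ab, target_ab):
    """Scale HAA from the prior at-bats to the target season's at-bats."""
    return prior_haa * target_ab / prior_ab


# Angel Stadium: 73 HAA in 13,082 AB over 2020-22, prorated to 5,515 AB in 2023
projected = prorate_haa(73, 13_082, 5_515)
actual = -11
error = abs(actual - projected)

print(round(projected))  # projected HAA
print(round(error))      # size of the miss, in hits
```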
Add up all the absolute differences between actual and projected HAA for all 1,848 ballpark-seasons from 1953 to 2023, and I get 49,405. The idea is to lower this number as much as possible, and I can do that by weighting seasons by their proximity to the target year: if I'm projecting 2023, then 2021 gets less weight than 2022, and 2020 gets less weight than 2021. I can also regress to the mean by adding a number of average (0 HAA) at-bats, which provides ballast (and a sanity check on some of those "special guest" parks that only hosted a game or two).
If I'm only looking at the three previous years, I got the best results with a 25% "decay" rate -- meaning the year before the target year is weighted at 75%, two years before is weighted at 75% of that (56%), and three years before is weighted at 75% of that (42%). And I added 2,900 at-bats of regression, which is a huge amount -- more than half a season's worth for most ballparks.
That gets the projected HAA down to 20, the error down to 31, and the total error for all ballpark-seasons (1953-2023) down to 47,362.
There's actually a third version of true-talent BA, where I don't know what happened in the years after OR in the target year; in which case it really is a projection, like if I was writing this in the 2022-23 offseason. Then my work is done: the prorated, weighted HAA (20 for Angel Stadium 2023) is the true-talent HAA. If Angel Stadium has 5,515 at-bats and the MLB average is .248, I multiply those two, add the 20 HAA, divide by the at-bats, and get a projected .252 true-talent batting average, or true BA, for Angel Stadium 2023.
Now, back to the version where I know what happened in the target year. To get true HAA, I add the target year's numbers into the total before prorating:
The "true talent" of Angel Stadium 2023, as best as I can currently determine it, was 9 HAA in 5,515 at-bats, which works out to a true BA of (5,515 x .248 + 9) / 5,515 = .250.
Instead of four seasons of unweighted data from which to project true BA (25% of which is the target year), I now have the equivalent of about 3.27 years: 0.42 for year-3, 0.56 for year-2, 0.75 for year-1, a full year for the target year, and about 0.54 years' worth of regression. That makes the target year about 30% of the projection, the three previous years about 53%, and the remaining 17% regression to the mean. (That's all based on an average ballpark-season of 5,331 at-bats. Obviously, partial seasons (like 2020), or part-time parks, or parks with multiple full-time tenants (like Dodger Stadium 1962-65) change that math.)
So to recap, if I didn't know what happened in 2023 (but somehow still knew the MLB average would be .248), I would project Angel Stadium to have a .252 true BA. Angel Stadium actually had a .246 (adjusted) BA in 2023. Knowing that, and not knowing what will happen in the 2024-26 seasons, I estimate that Angel Stadium had a .250 true BA in 2023.
Here are the 33 ballparks of 2023 again, now sorted by true BA:
Estadio Alfredo Harp Helú in Mexico City, with only two games played there (and none before last year), has its .342 adjusted BA regressed all the way down to a .253 true BA, which still makes it the 4th-most hit-conducive park in baseball last year, by my estimates.
Great American Ball Park played like a pitcher's park in 2023 after several years in a row of HAA well into the positive. Once the previous seasons are accounted for, GABP jumps nine points from its adjusted BA (.242) to its true BA (.251).
Now for the version where I know what happened in the three years before, the target year, AND the three years after. The parameters are different. Because I have six surrounding years of data for most parks instead of only three, the "decay" rate is a little higher, 35%, which means each individual surrounding season carries less weight compared to the target season. Year-1 and year+1 are weighted at 65%, year-2 and year+2 at 42%, and year-3 and year+3 at 27%. I also need much less regression, only 1,100 at-bats, or about one-fifth of a season for most parks. That gives me the equivalent of about 3.9 seasons' worth of data, 26% of which is the target year, 69% the surrounding years, and 5% regression.
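To see where those percentages come from, here is a rough Python sketch that breaks a true-BA sample into its parts for an average park (5,331 at-bats per season). The function and its parameter names are mine; it assumes every surrounding season is a full, average-sized one.

```python
def sample_composition(decay, years_per_side, sides, regression_ab, season_ab=5_331):
    """Split the effective sample into target year, surrounding years, regression.

    Each surrounding season k years from the target is weighted (1 - decay) ** k.
    `sides` is 1 (prior years only) or 2 (prior and following years).
    """
    weights = [(1 - decay) ** k for k in range(1, years_per_side + 1)]
    surrounding = sides * sum(weights)
    regression = regression_ab / season_ab
    total = 1.0 + surrounding + regression
    return {
        "weights": [round(w, 2) for w in weights],
        "effective_seasons": total,
        "target_share": 1.0 / total,
        "surrounding_share": surrounding / total,
        "regression_share": regression / total,
    }


# Prior years only: 25% decay, 2,900 AB of regression
one_sided = sample_composition(0.25, 3, 1, 2_900)
# Prior and following years: 35% decay, 1,100 AB of regression
two_sided = sample_composition(0.35, 3, 2, 1_100)

for label, comp in (("one-sided", one_sided), ("two-sided", two_sided)):
    print(label, comp["weights"], round(comp["effective_seasons"], 2))
    print(f"  target {comp['target_share']:.1%}, "
          f"surrounding {comp['surrounding_share']:.1%}, "
          f"regression {comp['regression_share']:.1%}")
```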
I'll demonstrate using 1998 Coors Field, which was the park's fourth season (and its first hosting the All-Star Game), and in the middle of the seven years of pre-humidor Coors:
Coors Field 1998 had 179 true HAA. The MLB average in '98 was .266, so Coors Field's true BA was (5,793 x .266 + 179) / 5,793 = .297. That's the 6th-highest figure of 1950-2023, behind only the other five Coors Field seasons of 1995-2000.
Now let's back up a couple years and look at 1996, Coors Field's second season, to demonstrate again how true BA changes from one version to another as I get more context. Coors Field '95, its inaugural season, had 5,343 at-bats, 183 HAA, and a .301 adjusted BA (the latter two figures were both the highest since at least 1950 up to that point). If I'm projecting 1996, I weight those numbers at 75% (4,007 at-bats and 137 HAA), then add the 2,900 at-bats of regression. That's 137 HAA in 6,907 at-bats, which prorates to 117 HAA in the 5,875 at-bats Coors had in '96. Multiply the 5,875 at-bats by the .270 MLB average in '96 and add the 117 HAA, and I get a .289 projected true BA for Coors Field 1996.
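The projection step (decay weights plus regression, then prorating) can be sketched as a small Python function. The name and signature are hypothetical; the inputs are the Coors Field '95 figures quoted above.

```python
def project_haa(prior_seasons, target_ab, decay, regression_ab):
    """Project HAA for a target season from earlier seasons only.

    prior_seasons: list of (at_bats, haa), most recent season first.
    Each season k years back is weighted (1 - decay) ** k; regression_ab
    adds that many league-average (0 HAA) at-bats as ballast.
    """
    weighted_ab = float(regression_ab)
    weighted_haa = 0.0
    for years_back, (ab, haa) in enumerate(prior_seasons, start=1):
        weight = (1 - decay) ** years_back
        weighted_ab += weight * ab
        weighted_haa += weight * haa
    return weighted_haa * target_ab / weighted_ab


# Coors Field 1996: one prior season (1995), 5,875 AB in the target year
coors_1996 = project_haa([(5_343, 183)], target_ab=5_875, decay=0.25, regression_ab=2_900)
print(round(coors_1996))  # projected HAA
```

Passing three prior seasons instead of one gives the standard three-year projection.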
Coors Field actually had 256 HAA and a .313 adjusted BA, both of which are the highest figures of 1950-2023 (except for a few guest parks with higher adjusted BAs, like Estadio Monterrey that same year with a .326 adjusted BA in 220 at-bats). Add the '96 numbers to the weighted '95 and regression totals, and I get 393 HAA in 12,782 at-bats, which prorates to 181 HAA in 5,875 at-bats, and a .301 true BA.
After three more years, I have weighted numbers for 1997 (3,778 AB and 95 HAA), 1998 (2,448 AB and 71 HAA), and 1999 (1,632 AB and 56 HAA), as well as new weights for 1995 (3,473 AB and 119 HAA). In this case it doesn't change the results much; the 1997-99 seasons only confirmed what we already conjectured about Coors Field from the 1995-96 seasons. Add all those to the unweighted numbers for '96, plus 1,100 at-bats of regression, and I get 18,306 AB and 598 HAA, which prorates to 192 HAA and a .302 true BA, both of which are (as you probably guessed by now) the highest figures of 1950-2023. And that's my best estimate (for now) of ballparks' true-talent batting average.
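Once the weighting is done, both target-year-included versions come down to the same pooling step: add up the at-bats and HAA, throw in the regression at-bats, and prorate to the target season. A sketch in Python, with hypothetical function names and the already-weighted Coors Field figures from the text as inputs. Because those per-season figures are rounded, the six-year HAA total here is 597 rather than 598, which doesn't change the prorated result.

```python
def pooled_true_haa(components, regression_ab, target_ab):
    """Pool (at_bats, haa) components plus regression, prorate to target_ab."""
    total_ab = regression_ab + sum(ab for ab, _ in components)
    total_haa = sum(haa for _, haa in components)
    return total_haa * target_ab / total_ab


def true_ba(haa, at_bats, league_ba):
    """Convert true HAA back into a batting average."""
    return league_ba + haa / at_bats


TARGET_AB = 5_875  # Coors Field at-bats in 1996

# Knowing 1995 and 1996 only: 75%-weighted 1995 plus unweighted 1996
prior_only = pooled_true_haa([(4_007, 137), (5_875, 256)], 2_900, TARGET_AB)

# Knowing 1995-99: 1996 unweighted, the rest at the 35%-decay weights
both_sides = pooled_true_haa(
    [(5_875, 256), (3_473, 119), (3_778, 95), (2_448, 71), (1_632, 56)],
    1_100,
    TARGET_AB,
)

print(round(prior_only), round(true_ba(prior_only, TARGET_AB, 0.270), 3))
print(round(both_sides))
```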
At the other end of the spectrum, Safeco Field 2000 had the fewest true HAA, -87, and Candlestick Park '68 had the lowest true BA, .225.
Next: true-talent batting average for PITCHERS.