Forecasting Batting Averages in MLB

Author: 
Date created: 
2017-11-14
Identifier: 
etd10438
Keywords: 
Batting Average
MLB
Logistic Regression
Big Data
Forecasting
Abstract: 

We consider new baseball data from Statcast which includes launch angle, launch velocity, and hit distance for batted balls in Major League Baseball during the 2015, and 2016 seasons. Using logistic regression, we train two models on 2015 data to get the probability that a player will get a hit on each of their 2015 at-bats. For each player we sum these predictions and divide by their total at bats to predict their 2016 batting average. We then use linear regression, which expresses 2016 actual batting averages as a linear combination of 2016 Statcast predictions and 2016 PECOTA predictions. When using this procedure to obtain 2017 predictions, we find that the combined prediction performs better than PECOTA. This information may be used to make better predictions of batting averages for future seasons.

Document type: 
Graduating extended essay / Research project
Rights: 
This thesis may be printed or downloaded for non-commercial research and scholarly purposes. Copyright remains with the author.
File(s): 
Senior supervisor: 
Timothy Swartz
Jason Loeppky
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.
Statistics: