In this model, I want to analyze which variables have the largest impact on a pitcher's ERA. Specifically, I am interested in testing which individual statistics play the biggest role in a …
I imported data for 30 starting pitchers, all of whom pitched in MLB during the 2016 season. Next, I decided which independent variables to include in my multiple regression. I included strikeouts per nine innings (K9), walks per nine innings (W9), whether the pitcher threw a complete game (CG), hits per nine innings (H9), and WHIP. WHIP (walks plus hits per inning pitched) measures how many runners reach base against a pitcher; a lower WHIP is better. Complete games is used as a dummy variable: either a pitcher threw a complete game during the season or they did not. I ran a multiple regression on ERA using the five independent variables listed above.
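To make the setup concrete, here is a minimal sketch of how a regression of this shape could be fit with ordinary least squares in Python. It is not the procedure actually used for this analysis: the helper name `fit_ols` is hypothetical, and every number below is synthetic placeholder data generated at random, not the 2016 pitcher data.

```python
import numpy as np

def fit_ols(X, y):
    """Fit ordinary least squares with an intercept.

    Returns (coefficients, r_squared); coefficients[0] is the intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the model includes an intercept term.
    design = np.column_stack([np.ones(len(y)), X])
    coef, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return coef, 1.0 - ss_res / ss_tot

# Synthetic placeholder data for 30 pitchers (NOT real 2016 statistics).
rng = np.random.default_rng(0)
n = 30
k9 = rng.uniform(6.0, 11.0, n)   # strikeouts per nine innings
w9 = rng.uniform(1.5, 4.0, n)    # walks per nine innings
h9 = rng.uniform(6.5, 10.0, n)   # hits per nine innings
cg = rng.integers(0, 2, n)       # dummy: 1 if the pitcher threw a complete game
# WHIP is (walks + hits) per inning, so it sits close to (W9 + H9) / 9;
# the small noise term is an assumption standing in for rounding.
whip = (w9 + h9) / 9.0 + rng.normal(0.0, 0.02, n)
# Made-up relationship used only to produce a response variable to fit.
era = 0.5 - 0.1 * k9 + 0.3 * w9 + 0.4 * h9 - 0.2 * cg + rng.normal(0.0, 0.3, n)

X = np.column_stack([k9, w9, cg, h9, whip])
coef, r2 = fit_ols(X, era)
for name, value in zip(["intercept", "K9", "W9", "CG", "H9", "WHIP"], coef):
    print(f"{name:>9}: {value: .3f}")
print(f"R^2: {r2:.3f}")
```

Each coefficient estimates the change in ERA for a one-unit change in that variable with the others held fixed; the CG coefficient is the estimated ERA difference between pitchers who did and did not throw a complete game. Because WHIP is nearly a linear combination of W9 and H9 by definition, the individual coefficients on those three variables may be unstable in a fit like this.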