A Regression Model For Wine Quality Essay

1405 Words Dec 14th, 2014
4.2 Multiple Regression

A multiple regression model was fitted in R to predict the wine rating from the features of the wine (fixed.acidity, volatile.acidity, citric.acid, residual.sugar, chlorides, free.sulfur.dioxide, total.sulfur.dioxide, density, pH, sulfates, alcohol). According to the initial analysis of the feature distributions, many of these features are right-skewed and thus require a log transformation. A forward selection algorithm was used to find the best predictive model for wine quality. In forward selection, features are added to the model one at a time: at each step, every variable that is not already in the model is tested for inclusion, and the most significant of these variables is added, as long as its p-value is below some pre-set level, which is 0.05 in this case. A backward selection algorithm was also used; it starts from a model that includes all the features and removes one insignificant feature at a time. Backward selection produced a less ideal final model than forward selection because it checked fewer regression models than forward selection did. Table 2 shows the statistical output of the best model from R:

According to the table above, the most influential features in red wine are the alcohol and sulphates concentrations, because these two features have the largest slope coefficients. Therefore, the higher the alcohol and sulphates concentrations in red wine, the higher the…
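
The forward selection procedure described in this section can be illustrated with a minimal sketch. It is written in Python rather than R, uses only the standard library, and is not the essay's actual analysis. The data are synthetic, the feature names and slopes in the toy data set are hypothetical, and the helper names (invert, ols_p_values, forward_select) are introduced here for illustration only. The p-values use a normal approximation to the t-test that R's lm() would report, which is reasonable only for large samples.

```python
import math
import random
from statistics import NormalDist


def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols_p_values(columns, y):
    """Fit y ~ intercept + columns by least squares; return one p-value per slope."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
           for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    rss = sum((y[i] - sum(X[i][a] * beta[a] for a in range(k))) ** 2
              for i in range(n))
    sigma2 = rss / (n - k)  # residual variance estimate
    p_values = []
    for a in range(1, k):  # skip the intercept
        z = beta[a] / math.sqrt(sigma2 * inv[a][a])
        # Assumption: normal approximation to the t distribution (large n).
        p_values.append(2 * (1 - NormalDist().cdf(abs(z))))
    return p_values


def forward_select(features, y, alpha=0.05):
    """Add the most significant remaining feature until none has p < alpha."""
    selected, remaining = [], list(features)
    while remaining:
        best_name, best_p = None, alpha
        for name in remaining:
            cols = [features[s] for s in selected] + [features[name]]
            p = ols_p_values(cols, y)[-1]  # p-value of the candidate feature
            if p < best_p:
                best_name, best_p = name, p
        if best_name is None:  # no candidate passes the threshold
            break
        selected.append(best_name)
        remaining.remove(best_name)
    return selected


# Synthetic toy data (hypothetical): two informative features and one pure-noise feature.
rng = random.Random(0)
n = 500
data = {name: [rng.gauss(0, 1) for _ in range(n)]
        for name in ("alcohol", "sulphates", "noise_feature")}
quality = [0.5 * a + 0.3 * s + rng.gauss(0, 1)
           for a, s in zip(data["alcohol"], data["sulphates"])]

chosen = forward_select(data, quality)
print(chosen)
```

In this toy setup the two informative features are picked first, in order of significance. The noise feature is tested at the final step and will usually be rejected, though at a 0.05 threshold it can occasionally slip in by chance.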
