The mean is a measure of central tendency obtained by dividing the sum of the observed values by the number of observations, n. Although individual data points can fall above, below, or on the mean, the mean is widely considered a good estimate for predicting subsequent data points. For the retention rate (RR %), the mean is 57.41, while for the dependent variable, the graduation rate (GR %), it is 41.76.
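As a minimal Python sketch of this definition (assuming Python 3.9+ for the type hints), the mean is the sum of the observations divided by n. The retention rates in `rr_sample` are hypothetical values chosen for illustration, not the data analysed here, and `sample_mean` is a hypothetical helper.

```python
from statistics import mean


def sample_mean(values: list[float]) -> float:
    """Return the arithmetic mean: sum of observations divided by n."""
    if not values:
        raise ValueError("the mean of an empty sample is undefined")
    return sum(values) / len(values)


# Hypothetical retention rates (%), for illustration only.
rr_sample = [40.0, 55.0, 62.0, 71.0]

print(sample_mean(rr_sample))                      # 57.0
print(sample_mean(rr_sample) == mean(rr_sample))   # True: matches the standard library
```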
The minimum and maximum are the smallest and largest observed values. For the retention rate (RR %) they are 4 and 100 respectively, while for the dependent variable (GR %) they are 25 and 61. Both can help identify possible outliers or data entry errors. The difference between them, the range, gives a rough measure of the spread of the data: 100 - 4 = 96 percentage points for RR and 61 - 25 = 36 percentage points for GR.
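The sketch below computes the minimum, maximum, and range, and adds a hypothetical helper, `flag_impossible_percentages`, which assumes that any rate outside 0 to 100 is a likely data entry error rather than a genuine outlier. The sample values, including the deliberate typo 410, are invented for illustration.

```python
def summarize_extremes(values: list[float]) -> dict[str, float]:
    """Return the minimum, maximum, and range (max - min) of a sample."""
    lo, hi = min(values), max(values)
    return {"min": lo, "max": hi, "range": hi - lo}


def flag_impossible_percentages(values: list[float]) -> list[float]:
    """Return values outside 0-100, which cannot be valid percentages.

    Assumption: such values indicate a data entry error, not a real outlier.
    """
    return [v for v in values if v < 0 or v > 100]


# Hypothetical graduation rates (%), with one deliberate typo (410 instead of 41).
gr_sample = [25.0, 38.0, 410.0, 61.0]

print(summarize_extremes(gr_sample))           # the inflated max and range expose the typo
print(flag_impossible_percentages(gr_sample))  # [410.0]
```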
Scatter Diagram
In Figure 1, the straight line fitted through the plotted data shows that a positive linear relationship exists. The points lie close to the line, indicating a positive slope and a clear positive linear correlation.
Figure 1: Scatter diagram showing strong positive linear relationship
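The strength of the linear pattern in a scatter diagram can be quantified with the Pearson correlation coefficient r, which runs from -1 to +1. A minimal sketch, assuming hypothetical (RR, GR) pairs rather than the study's data; `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between paired samples x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation
    sxx = sum((a - mx) ** 2 for a in x)                   # variation in x
    syy = sum((b - my) ** 2 for b in y)                   # variation in y
    return sxy / math.sqrt(sxx * syy)


# Hypothetical (RR %, GR %) pairs, for illustration only.
rr = [20.0, 40.0, 60.0, 80.0]
gr = [30.0, 36.0, 43.0, 48.0]
print(round(pearson_r(rr, gr), 3))  # close to +1: tight, upward-sloping scatter
```

On Python 3.10 and later, `statistics.correlation(x, y)` returns the same Pearson coefficient.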
3. The linear regression model is $Y_i = b_0 + b_1 X_i + e_i$, where $Y_i$ is the value of the dependent variable for observation $i$, $X_i$ is the observed value of the independent variable for that observation, $e_i$ is the random error (residual), $b_0$ is the estimate of the regression intercept and $b_1$ is the estimate of the regression slope coefficient. Therefore: $\mathrm{GR}_i = b_0 + b_1\,\mathrm{RR}_i + e_i$
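The estimates $b_0$ and $b_1$ are usually obtained by ordinary least squares: $b_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and $b_0 = \bar{y} - b_1 \bar{x}$. The document does not say which software produced its estimates, so this is only an illustrative sketch of the standard formulas on hypothetical data; `least_squares_fit` is a hypothetical helper.

```python
def least_squares_fit(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (b0, b1) minimizing the sum of squared residuals for y = b0 + b1*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx       # slope estimate
    b0 = my - b1 * mx    # intercept estimate: the line passes through (mean x, mean y)
    return b0, b1


# Hypothetical (RR %, GR %) data, for illustration only.
rr = [20.0, 40.0, 60.0, 80.0]
gr = [30.0, 36.0, 43.0, 48.0]
b0, b1 = least_squares_fit(rr, gr)
print(f"GR = {b0:.3f} + {b1:.4f} * RR")
```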
4. The regression equation is therefore: Graduation rate (%) = 25.423 + 0.2845 × Retention rate (%), which is also the regression line fitted to the data. The coefficient for retention rate is 0.2845: for every additional percentage point of retention rate (the independent variable X), the model predicts an increase of 0.2845 percentage points in graduation rate. In other words, for every 10-percentage-point increase in RR, GR increases by an average of …
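Using the coefficients reported above (rounded as printed), a short sketch shows the predicted effect of a 10-point rise in RR, which is simply 10 × 0.2845 = 2.845 percentage points. It also makes a consistency check: a least-squares line always passes through the point of means $(\bar{x}, \bar{y})$, so evaluating it at the mean RR of 57.41 should give roughly the mean GR of 41.76. `predict_gr` is a hypothetical helper name.

```python
B0 = 25.423   # intercept reported in the text
B1 = 0.2845   # slope reported in the text


def predict_gr(rr_percent: float) -> float:
    """Predicted graduation rate (%) from the fitted line GR = B0 + B1 * RR."""
    return B0 + B1 * rr_percent


# A 10-point rise in RR changes the prediction by 10 * B1, whatever the starting RR.
print(round(predict_gr(60.0) - predict_gr(50.0), 3))  # 2.845

# The prediction at the mean RR reproduces the mean GR, up to rounding.
print(round(predict_gr(57.41), 2))  # 41.76
```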
R² is a measure of goodness of fit: it shows how much of the variation in Y (GR %) is explained by X (RR %). The summary output gives R² = 0.449, so 44.9% of the variance in graduation rate is explained by the model. A value of 0.449 may be acceptable for some kinds of data, but it is fairly low for these observations. Because R² is below 0.5, the fitted line explains less than half of the variation in GR; the corresponding correlation coefficient is r = √0.449 ≈ 0.67. The fit is therefore relatively weak, since more than half of the variation in GR is left unexplained.
Figure 5: Good fit: Residual plots showing a random pattern
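R² can be computed as $1 - \mathrm{SSE}/\mathrm{SST}$, the share of total variation not left in the residuals. In simple linear regression with an intercept it equals the square of the Pearson correlation, so $r = \sqrt{R^2}$ when the slope is positive. A minimal sketch, where `r_squared` is a hypothetical helper:

```python
import math


def r_squared(observed: list[float], predicted: list[float]) -> float:
    """Coefficient of determination: 1 - SSE/SST."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))  # unexplained variation
    sst = sum((o - mean_obs) ** 2 for o in observed)               # total variation
    return 1 - sse / sst


# Correlation implied by the reported R^2 (positive slope, so take the positive root).
r = math.sqrt(0.449)
print(round(r, 2))  # 0.67
```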
A key part of statistical modelling is examining the residuals, the differences between the observed and predicted values. The residuals show whether the model's assumptions are reasonable, which in turn determines whether the model itself is appropriate; because they capture the variation the fitted model leaves unexplained, they can also inform decisions based on the model. In Table 2 below, there are no outliers: none of the standardized residuals is greater than +3 or less than -3.
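A residual is observed minus predicted; for WIU in Table 2, 25 - 27.41458565 = -2.41458565. The sketch below standardizes residuals by dividing by the standard error of the estimate, $s_e = \sqrt{\mathrm{SSE}/(n-2)}$, and applies the ±3 rule. That divisor is an assumption: it is one common convention, and the software that produced Table 2 may scale residuals slightly differently. The helper names are hypothetical.

```python
import math


def standardized_residuals(observed: list[float], predicted: list[float]) -> list[float]:
    """Residuals divided by s_e = sqrt(SSE / (n - 2)).

    Assumption: one common convention; some software uses a different divisor.
    """
    residuals = [o - p for o, p in zip(observed, predicted)]   # observed minus predicted
    n = len(residuals)
    s_e = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))  # two parameters estimated
    return [e / s_e for e in residuals]


def outliers(std_resid: list[float], threshold: float = 3.0) -> list[int]:
    """Indices whose standardized residual exceeds the threshold in absolute value."""
    return [i for i, z in enumerate(std_resid) if abs(z) > threshold]


# Standardized residuals reported in Table 2 for WIU and South University.
reported = [-0.329782615, -2.039639575]
print(outliers(reported))  # []
```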
Observation | Predicted GR (%) | Residual | Standardized residual | Observed GR (%)
WIU | 27.41458565 | -2.414585651 | -0.329782615 | 25
South University | 39.93372978 | -14.93372978 | -2.039639575 |