Using frequency analysis, I checked the minimum, maximum, and range of the categorical variables to identify and correct errors. Additionally, I examined valid and missing values and investigated cases and variables with a large amount of missing data. As shown in Table A4, the dataset consisted of 320 cases, of which 218 (68.1%) were associated with the 2004 GSS year and the remaining 102 …
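The screening step described above (frequencies, minimum, maximum, range, and missing values for a coded categorical variable) can be illustrated with a minimal Python sketch. This is not the procedure or software used for the analysis itself. The function name `screen_categorical`, the codes, and the toy data are hypothetical and are not drawn from the GSS dataset.

```python
from collections import Counter


def screen_categorical(values, valid_codes):
    """Summarize one coded categorical variable for data screening.

    `values` is a list of numeric codes, with None marking a missing value.
    `valid_codes` is the set of codes the codebook allows (assumed here).
    """
    n = len(values)
    observed = [v for v in values if v is not None]
    missing = n - len(observed)
    counts = Counter(observed)
    return {
        "n_valid": len(observed),
        "n_missing": missing,
        "pct_missing": round(100 * missing / n, 1) if n else 0.0,
        "min": min(observed) if observed else None,
        "max": max(observed) if observed else None,
        "range": (max(observed) - min(observed)) if observed else None,
        # Frequency table, sorted by code
        "frequencies": dict(sorted(counts.items())),
        # Codes that appear in the data but not in the codebook
        "invalid_codes": sorted(c for c in counts if c not in valid_codes),
    }


# Hypothetical toy variable: 1 = yes, 2 = no, None = missing, 9 = a miscoded entry
toy = [1, 2, 2, 1, None, 9, 2, 1, 1, None]
report = screen_categorical(toy, valid_codes={1, 2})
print(report)
```

In this toy example, a maximum of 9 on a variable whose codebook allows only 1 and 2 is the kind of out-of-range value that a check of the minimum, maximum, and range is meant to surface.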
By reviewing the minimum, maximum, and range of values, I attempted to identify potential coding errors and invalid data. I explored the nature and distribution of the values by examining the mean, standard deviation, and variance, as well as skewness and kurtosis. For example, the variable Age contained 320 valid cases, with respondents ranging in age from 18 to 35 years. The mean age was 28.02 years, with a standard deviation of 4.76 years. Emailtime and Wwwtime consisted of 320 and 317 valid cases, respectively. The mean for Emailtime was 383.99 minutes spent on email per week, whereas Wwwtime had a mean of 568.57 minutes per week. Notably, both displayed a large standard deviation (595.13 and 721.94 minutes, respectively), exceeding the mean in each case. This is somewhat expected, since the majority of respondents spent up to 200 minutes on email and up to 400 minutes on the Internet, while only a few participants exceeded 3,000 to 4,000 minutes a week. Because scores clustered to the left, the skewness of the distribution was positive for both variables (2.70 for Emailtime and 2.72 for Wwwtime). Additionally, the positive values for kurtosis were indicative of high peaks in the distribution of Emailtime and Wwwtime (8.83 and 9.51,