Essay on Professor Proposes

Case: Professor Proposes
Team Members: Eung Sung Min (2015-27114), Peter (Jong-Hyub) Park (2015-27117), Sung Eun Park (2015-27116), Ki Peum Park (2015-27115)
Questions
1. Which factor is the most important in deciding price?
* In the regression equation without the Wholesaler variable, the coefficient for carat is 3873.15, the largest among the variables. We therefore conclude that carat size is the most influential factor on price: a one-carat increase is associated with a price increase of about 3,873 USD, holding the other variables constant. The t-value of carat is approximately 64 and its p-value is <0.0001, indicating that carat size is a statistically significant factor.
2. Is the price of the diamond
Therefore, the regression model might not be applicable for predicting the price of a 4-carat diamond. In addition, although carat is the most important factor in deciding the price of the ring, other factors also influence the price. A carat size this far beyond the values in the dataset is a problem because, outside the sample range, there is no guarantee that the relationship observed within the range remains valid.
4. What are the strengths and weaknesses of treating the ‘Clarity’ variable as a scale between 1 and 12 (directly as in the data set) or as a binary variable (0 or 1 with a threshold level, e.g. clarity 1-5 as 0 and 6-12 as 1)?
* When we treat the ‘Clarity’ variable as a scale between 1 and 12, we get a single coefficient, 237.6 (from the regression equation without the Wholesaler variable). This means that the price of the ring tends to increase by 237.6 for each one-unit increase in the ‘Clarity’ variable.
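The two codings of ‘Clarity’ named in question 4 can be illustrated with a minimal Python sketch. It uses only the coefficient 237.6 reported above and the example threshold from the question (levels 1-5 as 0, levels 6-12 as 1). The function names `ordinal_effect` and `to_binary` are hypothetical, chosen for illustration. No coefficient for the binary coding is assumed, since none is reported here.

```python
# Coefficient reported above for the 1-12 Clarity scale
# (regression equation without the Wholesaler variable).
CLARITY_COEF = 237.6


def ordinal_effect(level_from: int, level_to: int, coef: float = CLARITY_COEF) -> float:
    """Predicted price change when Clarity moves between two levels on the 1-12 scale."""
    return coef * (level_to - level_from)


def to_binary(level: int, threshold: int = 5) -> int:
    """Binary coding from the question's example: levels 1-5 -> 0, levels 6-12 -> 1."""
    if not 1 <= level <= 12:
        raise ValueError("clarity level must be between 1 and 12")
    return 0 if level <= threshold else 1


# Scale coding: a two-level step gives the same predicted change
# anywhere on the scale (the equal-spacing assumption).
low_step = ordinal_effect(2, 4)
high_step = ordinal_effect(9, 11)

# Binary coding: levels on the same side of the threshold are
# indistinguishable; only crossing the threshold changes the value.
same_group = to_binary(2) == to_binary(5)
crosses_threshold = to_binary(5) != to_binary(6)
```

Under the scale coding, `low_step` and `high_step` are identical by construction, which is exactly the linearity assumption at issue. Under the binary coding, all variation within each group is discarded.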
However, when we use the ‘Clarity’ index as a binary variable, we do not assume linearity and equal spacing between levels. In fact, the interval for the higher clarity group (VS1 to VVS1) is smaller than for the other groups.
5. This diamond data set is composed of diamonds of around 0.2 carat and around 1-1.2 carat. If we run the regression for the two clusters separately and compare the results of the

