Essay on Analyzing The Concerns Expressed By This Data Analyst

1. Summarize the concerns expressed by this data analyst. Data mining in the real world is very different from the way it is described in textbooks, for several reasons. First, data are always dirty: they contain missing values, values far outside the range of possibility, and time values that make no sense. Missing values are especially problematic because records that contain them often cannot be used, so they reduce the amount of usable data. Sometimes a much better data analysis would be possible with more data, but the data may not be available, and limits on time and budget can make reprocessing difficult or impossible. Another major problem is overfitting: a model that has been overfit matches the data it was built on closely but predicts poorly on new data. Furthermore, data mining is about probabilities, not certainty, so even a high-probability prediction can turn out wrong through bad luck, which leads others to think the model itself is wrong. A final problem is seasonality: data change from season to season, so a model built on data from one season can be relied on only in that season.
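The three kinds of dirty data named above (missing values, out-of-range values, and time values that make no sense) can be illustrated with a minimal Python sketch. The records, the field names `age` and `signup`, and the 0 to 120 age range are all assumptions made up for illustration; they do not come from the analyst's data.

```python
from datetime import datetime

# Hypothetical records; field names and valid ranges are illustrative assumptions.
records = [
    {"age": 34, "signup": "2019-03-14"},
    {"age": None, "signup": "2020-07-01"},   # missing value
    {"age": 212, "signup": "2018-11-30"},    # outside the plausible range
    {"age": 51, "signup": "2021-02-30"},     # a date that does not exist
]

def problems(record, min_age=0, max_age=120):
    """Return the reasons a record counts as dirty (empty list if clean)."""
    found = []
    age = record.get("age")
    if age is None:
        found.append("missing age")
    elif not (min_age <= age <= max_age):
        found.append("age out of range")
    try:
        # strptime raises ValueError for malformed or impossible dates.
        datetime.strptime(record.get("signup") or "", "%Y-%m-%d")
    except ValueError:
        found.append("invalid date")
    return found

clean = [r for r in records if not problems(r)]
dirty = [(r, problems(r)) for r in records if problems(r)]

print(f"{len(clean)} of {len(records)} records are usable")
for record, reasons in dirty:
    print(record, "->", ", ".join(reasons))
```

In this made-up sample only one of the four records survives the checks, which shows how quickly dirty data can shrink the amount of data available for analysis.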

2. Do you think the concerns raised here are sufficient to avoid data mining projects altogether? All of the concerns raised here are valid, but they are not sufficient reason to avoid data mining projects altogether. Data mining is about collecting data and using the collected data to predict what is likely to happen next. It is…
