These complications all stem from the same issue: data. A machine can only teach itself when it is given data to learn from, so if it receives too little data or biased data, the accuracy of the algorithm suffers. There is no set standard for how much data a machine learning algorithm needs, because the amount varies with the complexity of the problem and with the algorithm itself, but in many cases the more data you use, the more accurate the system will be. This does not hold if the data is biased, which brings us back to data quality: the data must be unbiased and must also have enough variance to produce an accurate result. Once we have collected enough quality data, we can prepare it. The preparation steps depend on the type of model we are using and may include randomizing the order in which the machine sees the data, de-duplication, normalization, error correction, and more. Improving data collection and integration will directly influence the overall quality of the …
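The preparation steps named above (de-duplication, randomizing the order, and normalization) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `prepare`, the sample rows, and the choice of min-max scaling to the range [0, 1] are hypothetical and are not taken from the essay, and real projects would pick the steps that suit their model.

```python
import random


def prepare(rows, seed=0):
    """De-duplicate, shuffle, and min-max normalize a list of numeric rows.

    Hypothetical helper for illustration; min-max scaling is an assumed choice.
    """
    # De-duplicate while keeping the first occurrence of each row.
    unique = list(dict.fromkeys(tuple(row) for row in rows))

    # Randomize the order in which the model would see the rows.
    rng = random.Random(seed)
    rng.shuffle(unique)

    # Normalize each column to the range [0, 1].
    columns = list(zip(*unique))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in unique:
        normalized.append([
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ])
    return normalized


# Made-up sample data: the third row duplicates the first.
raw = [[1.0, 200.0], [2.0, 400.0], [1.0, 200.0], [3.0, 300.0]]
prepared = prepare(raw)
```

After the call, `prepared` holds three rows instead of four, and every value lies between 0 and 1, so no single column dominates simply because it is measured on a larger scale.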
Before we get into the specific types of algorithms, we must understand how all machine learning algorithms work. Every algorithm has three general components: representation of knowledge, training and evaluation of the candidate program, and optimization of the program. The representation of knowledge we pick determines the rules, structure, and model that the system will follow. Once we have chosen a representation, we train the program on the data we have collected. Some representations are more effective than others for a given problem and implementation, which is why we must evaluate the trained program. After training, we decide whether a change is required or whether we are satisfied with the results. After we evaluate the program, we move on to optimizing it. Here we would use the information gathered from the evaluation stage to either tune the parameters or begin to use our
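The three components can be made concrete with a very small example. The sketch below assumes a straight-line model as the representation, mean squared error as the evaluation, and gradient descent as the optimization. These particular choices, the function names, and the toy data are assumptions made for illustration, not methods the essay prescribes.

```python
def predict(params, x):
    """Representation: a straight line y = w * x + b (an assumed model)."""
    w, b = params
    return w * x + b


def mean_squared_error(params, data):
    """Evaluation: average squared gap between predictions and targets."""
    return sum((predict(params, x) - y) ** 2 for x, y in data) / len(data)


def gradient_step(params, data, learning_rate):
    """Optimization: one gradient-descent update of the parameters."""
    w, b = params
    n = len(data)
    grad_w = sum(2 * (predict(params, x) - y) * x for x, y in data) / n
    grad_b = sum(2 * (predict(params, x) - y) for x, y in data) / n
    return (w - learning_rate * grad_w, b - learning_rate * grad_b)


# Made-up training data that follows y = 2x + 1 exactly.
train = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

params = (0.0, 0.0)
initial_error = mean_squared_error(params, train)

# Training: repeat the optimization step, guided by the evaluation's gradient.
for _ in range(2000):
    params = gradient_step(params, train, learning_rate=0.05)

final_error = mean_squared_error(params, train)
```

Comparing `initial_error` with `final_error` is the evaluation step in miniature: if the error has not dropped enough, we would either tune parameters such as `learning_rate` or pick a different representation.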