23 Cards in this Set

  • Front
  • Back

A weak learner with less than 50% accuracy does not present any problem to the AdaBoost algorithm.

False. If the error is greater than 0.5, then the weight assigned to the misclassified points will be smaller than the weight assigned to the correctly classified points. Subsequent iterations will therefore not try to classify these points correctly, and the algorithm is likely to exhibit poor performance.
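
As a minimal sketch (assuming NumPy), the standard AdaBoost weight update below shows the failure mode: when the weighted error eps exceeds 0.5, the learner weight alpha = 0.5 * ln((1 - eps) / eps) goes negative, so misclassified points end up with smaller weights than correctly classified ones.

    import numpy as np

    def adaboost_reweight(weights, y_true, y_pred):
        """One AdaBoost round: return the learner weight alpha and the updated,
        renormalized example weights. Labels are assumed to be in {-1, +1}."""
        eps = np.sum(weights * (y_true != y_pred)) / np.sum(weights)  # weighted error
        alpha = 0.5 * np.log((1 - eps) / eps)                         # negative when eps > 0.5
        new_w = weights * np.exp(-alpha * y_true * y_pred)            # shrinks, not grows, the wrong ones
        return alpha, new_w / new_w.sum()

    # Illustration: a learner that gets 3 of 4 points wrong (eps = 0.75).
    w = np.ones(4) / 4
    y = np.array([1, 1, -1, -1])
    pred = np.array([1, -1, 1, 1])
    alpha, w_new = adaboost_reweight(w, y, pred)
    print(alpha, w_new)   # alpha < 0; the misclassified points are down-weighted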

AdaBoost is not susceptible to outliers. If your answer is true, explain why. If your answer is false, describe a simple heuristic to fix AdaBoost so that it is not susceptible to outliers.

False. AdaBoost is susceptible to outliers. A possible heuristic is to put a threshold on the weights and remove all points that have very large weights (outliers typically end up with large weights because they are consistently misclassified).
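
A minimal sketch of such a heuristic, assuming NumPy (the cutoff factor and the helper name are illustrative choices, not a standard API):

    import numpy as np

    def drop_suspected_outliers(X, y, weights, factor=10.0):
        """Heuristic: treat points whose boosting weight exceeds `factor` times the
        mean weight as likely outliers and drop them before the next round."""
        keep = weights <= factor * weights.mean()
        return X[keep], y[keep], weights[keep] / weights[keep].sum()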

As we increase k, the k nearest neighbor algorithm begins to overfit the training dataset. True or False.

False. In fact, as we increase k, the algorithm underfits because the decision surface becomes simpler (for example, when k equals the number of training points, the induced function is just the majority-class function).
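
A quick check of the extreme case, assuming scikit-learn is available and using synthetic data: with k equal to the number of training points, every query is assigned the training-set majority class.

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))
    y = (rng.random(100) < 0.7).astype(int)          # class 1 is the majority

    knn_all = KNeighborsClassifier(n_neighbors=len(X)).fit(X, y)
    print(knn_all.predict(rng.normal(size=(5, 2))))  # every prediction is the majority class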

What is the time complexity of the K-means algorithm? Use O-notation and explain your notation clearly.

The time complexity is O(nkdi), where n is the number of data points, d is the number of features, k is the number of clusters and i is the number of iterations needed until convergence.
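
A bare-bones sketch of Lloyd's algorithm, assuming NumPy (not an optimized implementation), makes the count visible: each of the i iterations computes n*k distances, each costing O(d).

    import numpy as np

    def kmeans(X, k, iters=100, seed=0):
        """Plain K-means: i iterations, each computing n*k distances in d dimensions."""
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(iters):                                         # i iterations
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # n*k squared distances, O(d) each
            labels = d2.argmin(axis=1)                                 # assign each point to its nearest center
            centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(k)])                    # recompute the k means, O(n*d)
        return centers, labels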

Compare decision trees with kNN: what do they have in common, and what are the main differences between the two approaches?

Commonality: decision trees and kNN are both supervised learning methods that assign a class to an object based on its features. Main differences: decision trees build a hierarchy of rules, in the form of a tree, from the training data, giving priority to the more informative features; the tree can become very complex as it grows and requires pruning to improve its performance. kNN, by contrast, is an instance-based (lazy) method: it builds no explicit model, stores the training data, and classifies a new point by the majority class among its nearest neighbors, treating all features equally in the distance computation.

Many decision-making systems use Bayes' theorem together with conditional independence assumptions. What are those assumptions exactly? Why are they made? What is the problem with making them?

Every feature Fi is conditionally independent of every other feature Fj (for i ≠ j) given the class; that is, the presence of a particular feature is assumed to be unrelated to the presence of any other feature once the class is known. The assumption is made to simplify decision making: it simplifies the computations and dramatically reduces the knowledge-acquisition cost. The problem is that features are often correlated, and making this assumption in the presence of correlation leads to errors in the probability computations and, ultimately, to making the wrong decision.
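
In symbols, the assumption is P(F1, ..., Fn | C) = P(F1 | C) * ... * P(Fn | C), so a classifier only needs per-feature likelihoods. A toy sketch with made-up probabilities:

    # Toy categorical Naive Bayes; the priors and likelihoods below are invented for illustration.
    prior = {"spam": 0.4, "ham": 0.6}
    likelihood = {                       # P(feature | class), features assumed conditionally independent
        "spam": {"has_link": 0.8, "all_caps": 0.6},
        "ham":  {"has_link": 0.2, "all_caps": 0.1},
    }

    def posterior_scores(features_present):
        scores = {}
        for c in prior:
            p = prior[c]
            for f in features_present:   # product over features: the independence assumption at work
                p *= likelihood[c][f]
            scores[c] = p                # proportional to P(C | features)
        return scores

    print(posterior_scores(["has_link", "all_caps"]))   # {'spam': 0.192, 'ham': 0.012}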

What does bias measure, and what does variance measure? Assume we have a model with high bias and low variance: what does this mean?

Bias measures the error between the estimator's expected value and the real parameter. Variance measures how much the estimator fluctuates around its expected value. A model with high bias and low variance is a simple model that underfits the dataset.
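
A small simulation sketch of these two definitions, assuming NumPy and using the maximum-likelihood variance estimator as the estimator under study:

    import numpy as np

    rng = np.random.default_rng(0)
    true_var = 4.0                          # the real parameter
    estimates = np.array([
        np.var(rng.normal(0, 2, size=10))   # ML variance estimate from a sample of 10
        for _ in range(10_000)
    ])
    bias = estimates.mean() - true_var      # E[estimator] - true parameter (about -0.4 here)
    variance = estimates.var()              # fluctuation of the estimator around its own mean
    print(f"bias ~ {bias:.2f}, variance ~ {variance:.2f}")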

Maximum likelihood, MAP, and the Bayesian approach all estimate the parameters of models. What are the main differences between the three approaches?

Maximum likelihood estimates the parameter by choosing the value under which the observed data are most likely. MAP and the Bayesian approach both take into account the prior density of the parameter. MAP replaces the whole posterior density with a single point (its mode) to avoid evaluating the integral, whereas the Bayesian approach evaluates the full integral, typically with an approximation method.
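
A coin-flip sketch of the three answers, assuming a conjugate Beta(2, 2) prior chosen purely for illustration:

    # Estimate a coin's heads probability from k heads in n flips, with a Beta(a, b) prior.
    k, n = 7, 10
    a, b = 2.0, 2.0

    mle = k / n                                # maximizes P(Data | theta)
    map_est = (k + a - 1) / (n + a + b - 2)    # mode of the posterior P(theta | Data)
    posterior_mean = (k + a) / (n + a + b)     # summary of the full posterior (its mean)

    print(mle, map_est, posterior_mean)        # 0.70, 0.666..., 0.642...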

What is the biggest advantage of decision trees when compared to logistic regression classifiers?

Decision trees do not assume independence of the input features and can thus encode complicated relationships between these variables, whereas logistic regression treats each feature independently.

What is the biggest weakness of decision trees compared to logistic regression classifiers?

Decision trees are more likely to overfit the data, since they can split on many different combinations of features, whereas logistic regression associates only one parameter with each feature.

Briefly describe the difference between a maximum likelihood hypothesis and a maximum a posteriori hypothesis.

ML: maximize the data likelihood given the model, i.e., argmax_W P(Data | W)

MAP: maximize the posterior of the model given the data, i.e., argmax_W P(W | Data)

The error of a hypothesis measured over its training set provides a pessimistically biased estimate of the true error of the hypothesis.

False. The training error is optimistically biased: it is usually smaller than the true error, because the hypothesis was chosen to fit the training set.

If you are given m data points and use half for training and half for testing, the difference between training error and test error decreases as m increases.

True. As we get more and more data, the training error increases and the test error decreases, and both converge to the true error.
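
A rough learning-curve sketch, assuming scikit-learn and synthetic data, illustrating the shrinking gap:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
    for m in (40, 400, 4000):
        half = m // 2                                        # half for training, half for testing
        clf = LogisticRegression(max_iter=1000).fit(X[:half], y[:half])
        train_err = 1 - clf.score(X[:half], y[:half])
        test_err = 1 - clf.score(X[half:m], y[half:m])
        print(m, round(train_err, 3), round(test_err, 3))    # the gap narrows as m grows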

Overfitting is more likely when the set of training data is small.

True. With a small training dataset, it is easier to find a hypothesis that fits the training data exactly, i.e., to overfit.

Overfitting is more likely when the hypothesis space is small.

False. We can see this from the bias-variance trade-off: when the hypothesis space is small, the learner has higher bias and lower variance, so it is less likely to find a hypothesis that fits the data very closely, i.e., to overfit.

Are outliers always bad, and should we always ignore them? Why? (Give one short reason for ignoring outliers, and one short reason against.)

Outliers are often "bad" data, caused by faulty sensors or by errors made when entering values; in such cases, the outliers are not part of the function we want to learn and should be ignored. On the other hand, an outlier could be just an unlikely sample from the true distribution of the function of interest; in that case, the data point is just another sample and should not be ignored.

Declare or compute the VC dimension of the following classifier: a K-nearest-neighbor classifier with K = 1.

When K = 1, a 1-NN classifier can correctly classify all training points (each point is its own nearest neighbor), hence the VC dimension is infinite.

Declare or compute the VC dimension of the following classifier: a single-layer perceptron classifier.

A perceptron is a linear classifier, and hence its VC dimension is D + 1, where D is the input dimension.

Declare or compute the VC dimension of the following classifier, assuming input dimension D = 2: a square that assigns points inside it to one class and points outside it to another class. Draw a scenario where this classifier shatters all points for the VC dimension you have proposed.

The VC dimension is 3. Draw three points in 2D in a standard tripod structure; the square can shatter all labeling configurations of these points. Note that a square cannot shatter 4 points regardless of how they are placed. A rectangle can shatter 4 points if they are arranged in a diamond-like shape.

When the data is not completely linearly separable, the linear SVM without slack variables returns w = 0.

False. There is no solution: the optimization problem is infeasible because the margin constraints cannot all be satisfied.

Assume we are using the primal, non-linearly-separable (soft-margin) version of the SVM objective function. What do we need to do to guarantee that the resulting model linearly separates the data?

Set C = ∞, so that any use of slack is infinitely penalized and no margin violations are allowed.
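
In a library such as scikit-learn, where C must be finite, the same effect can be approximated with a very large C; a rough sketch:

    import numpy as np
    from sklearn.svm import SVC

    X = np.array([[0, 0], [1, 1], [3, 3], [4, 4]], dtype=float)
    y = np.array([0, 0, 1, 1])

    # A huge C makes slack so expensive that, on separable data, the solver
    # behaves like the hard-margin (no-slack) SVM.
    hard_margin_like = SVC(kernel="linear", C=1e10).fit(X, y)
    print(hard_margin_like.support_vectors_)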

After training an SVM, we can discard all examples that are not support vectors and still classify new examples.

True. The decision function depends only on the support vectors (they are the only training points with nonzero dual coefficients).
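
A quick check, assuming scikit-learn with a linear kernel: the decision function can be rebuilt from the stored support vectors, dual coefficients, and intercept alone, with no reference to the rest of the training set.

    import numpy as np
    from sklearn.datasets import make_blobs
    from sklearn.svm import SVC

    X, y = make_blobs(n_samples=200, centers=2, random_state=0)
    clf = SVC(kernel="linear", C=1.0).fit(X, y)

    # Recompute the decision function for a new point using only the support vectors.
    x_new = np.array([[0.0, 2.0]])
    manual = clf.dual_coef_ @ (clf.support_vectors_ @ x_new.T) + clf.intercept_
    print(np.allclose(manual.ravel(), clf.decision_function(x_new)))   # True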

Increasing the number of layers always decreases the classification error on test data.

False. Deeper networks have higher capacity and can overfit, so the test error may increase.