43 Cards in this Set

  • Front
  • Back

Hypothesis space for a decision tree

All possible decision trees

Hypothesis space for a learner

All possible hypotheses (models) that the specific learner can output

How do you construct a decision tree?

Ask questions about status/target/attributes sequentially

What is the definition of entropy?

$$H = \sum_i \left( -p_i \log_2 p_i \right)$$

where p_i is the probability of event i
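
As a quick check of the formula above, here is a minimal Python sketch. The function name `entropy` and the example distributions are illustrative assumptions, not part of the original card; terms with p_i = 0 are skipped, following the usual convention that they contribute nothing.

```python
import math

def entropy(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    # Skip zero probabilities: p * log2(p) tends to 0 as p tends to 0.
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

fair_coin = entropy([0.5, 0.5])   # two equally likely events
certain = entropy([1.0])          # a single certain event
four_way = entropy([0.25] * 4)    # four equally likely events
print(fair_coin, certain, four_way)
```

A uniform distribution over more events gives a higher entropy, while a certain outcome gives zero.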

What is overfitting?

Overfitting is when the learned model is overly specialized to the training samples, which leads to poor generalization to unseen data.

Name two reasons for overfitting

- Non-representative sample
- Noisy examples
- Too complex a model

What is Occam's razor?

"The simplest solution is often the correct one"

How do you prevent overfitting?

- Separate the available data into two sets: one for training and one for validation.
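
The split described above might look like the following sketch. The helper name `train_validation_split`, the 25% validation fraction, and the fixed seed are assumptions chosen for illustration.

```python
import random

def train_validation_split(samples, validation_fraction=0.25, seed=0):
    """Shuffle the samples and hold out a fraction of them for validation."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * validation_fraction)
    # The first n_val shuffled items are held out, the rest are used for training.
    return shuffled[n_val:], shuffled[:n_val]

train, val = train_validation_split(range(20))
print(len(train), len(val))
```

The model is fitted on `train` only; a growing gap between training error and error on `val` is a sign of overfitting.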



What is "bagging"?

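Bootstrap aggregation for decision trees: each tree is trained on a bootstrap resample of the training data, and the trees' predictions are combined.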
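
A rough sketch of the idea, using a one-level decision tree (a "stump" on a single feature) as the base learner. The stump learner, the toy data, and all function names are assumptions made for this illustration; real bagging would typically use full decision trees.

```python
import random
from collections import Counter

def fit_stump(points):
    """Pick the threshold t that minimises errors of the rule: x > t -> class 1."""
    best_t, best_err = None, None
    for t, _ in points:
        err = sum((x > t) != bool(y) for x, y in points)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t

def bagging_fit(points, n_models=25, seed=0):
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: draw len(points) items with replacement.
        sample = [rng.choice(points) for _ in points]
        models.append(fit_stump(sample))
    return models

def bagging_predict(models, x):
    # Aggregate by majority vote over all stumps.
    votes = Counter(int(x > t) for t in models)
    return votes.most_common(1)[0][0]

data = [(x, int(x >= 5)) for x in range(10)]   # class 1 when x >= 5
models = bagging_fit(data)
print(bagging_predict(models, 0), bagging_predict(models, 9))
```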

Which error is produced by bias?

The difference between the average (expected) prediction of our model and the correct value.

Which error is produced by variance?

The variability of a model prediction for a given data point between different realizations of the model.

What is linear regression?

Linear regression tries to estimate a linear function f that predicts the output from the input features.

What is RANSAC an abbreviation for?

RANdom SAmple Consensus

Describe the RANSAC algorithm

Fit a model to a small randomly selected sample, then determine the set S of data points that lie within a given distance of that model. If the number of points in S exceeds some threshold, re-estimate the model using the points in S.

Repeat the above N times, select the largest set S (the consensus set), and re-estimate the model using this S.
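
The loop can be sketched for fitting a line y = a*x + b. This is a simplified illustration under stated assumptions: distance is measured as the vertical residual, the model is re-estimated only once on the final consensus set, and the function names, parameter defaults, and toy data are all hypothetical.

```python
import random

def fit_line(points):
    """Least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    a = sxy / sxx
    return a, my - a * mx

def ransac_line(points, n_iterations=100, distance=0.5, min_consensus=5, seed=0):
    rng = random.Random(seed)
    best_set = []
    for _ in range(n_iterations):
        p, q = rng.sample(points, 2)          # minimal random sample for a line
        if p[0] == q[0]:
            continue                          # vertical pair: not expressible as y = a*x + b
        a, b = fit_line([p, q])
        # Consensus set: points whose vertical residual is within the distance.
        consensus = [(x, y) for x, y in points if abs(y - (a * x + b)) <= distance]
        if len(consensus) >= min_consensus and len(consensus) > len(best_set):
            best_set = consensus
    if not best_set:
        return None
    # Final re-estimation on the largest consensus set.
    return fit_line(best_set), best_set

inliers = [(x, 2 * x + 1) for x in range(10)]
outliers = [(2, 30), (7, -20), (4, 50)]
(a, b), consensus = ransac_line(inliers + outliers)
print(a, b, len(consensus))
```

Because the outliers never join the consensus set, they have no influence on the final fit, unlike in plain least squares.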

What is the difference between Ridge Regression and Least Squares?

Ridge Regression adds a shrinkage penalty (a factor lambda times the sum of squared coefficients) to the least-squares objective.

Which features does Ridge Regression include?

All features are included when using Ridge Regression

What does the acronym 'The Lasso' stand for?

Least Absolute Shrinkage and Selection Operator

What is a mathematical benefit of using Lasso over Ridge?

Some of the lasso's coefficients will be exactly zero
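
To illustrate the contrast with ridge, here is a sketch for the simplest possible case: one feature and no intercept, where both estimators have a closed form. The objective scalings written in the comments, the function names, and the toy data are assumptions chosen for this example.

```python
def soft_threshold(z, lam):
    """Shrink z towards zero by lam, returning exactly 0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_slope(xs, ys, lam):
    # Minimises 0.5 * sum((y - w*x)**2) + lam * abs(w).
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return soft_threshold(sxy, lam) / sxx

def ridge_slope(xs, ys, lam):
    # Minimises 0.5 * sum((y - w*x)**2) + 0.5 * lam * w**2.
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]   # exactly y = 2*x
for lam in (0.0, 14.0, 100.0):
    print(lam, lasso_slope(xs, ys, lam), ridge_slope(xs, ys, lam))
```

With a large enough penalty the lasso coefficient becomes exactly zero, which removes the feature from the model. The ridge coefficient only shrinks towards zero without reaching it.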

What is a discrete value?

A discrete value is a value from a predefined set

How can we tell that two events are independent of each other?

P(A|B) = P(A)
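
The condition can be checked by enumeration on a small example. The sketch below uses two fair dice; the events and the helper `prob` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event, given=lambda o: True):
    """P(event | given), computed by counting outcomes."""
    conditioned = [o for o in outcomes if given(o)]
    return Fraction(sum(1 for o in conditioned if event(o)), len(conditioned))

first_is_six = lambda o: o[0] == 6
second_is_even = lambda o: o[1] % 2 == 0
total_is_twelve = lambda o: sum(o) == 12

# Independent: knowing the first die does not change the second.
print(prob(second_is_even, given=first_is_six), prob(second_is_even))
# Dependent: knowing the first die is a six changes the chance of a total of 12.
print(prob(total_is_twelve, given=first_is_six), prob(total_is_twelve))
```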

Which type of value requires classification, discrete or continuous?

Discrete-value problems use classification.

Which type of value requires regression, discrete or continuous?

Continuous-value problems use regression.

On which assumption is the Naive Bayes Classifier based?

That all features (events) are independent of each other given the class

What is the basic premise of an artificial neuron?

Using several inputs, construct a value representing all the inputs, compare against a threshold and return a +/- answer (usually).
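
A minimal sketch of such a neuron, assuming the combined value is a weighted sum of the inputs (the usual choice). The weights and threshold below are hand-picked so that the neuron behaves like a logical AND; they are illustrative assumptions.

```python
def neuron(inputs, weights, threshold):
    """Return +1 if the weighted sum of the inputs exceeds the threshold, else -1."""
    activation = sum(w * x for w, x in zip(weights, inputs))
    return 1 if activation > threshold else -1

# Hand-picked parameters that make the neuron fire only when both inputs are 1.
weights, threshold = [1.0, 1.0], 1.5
for inputs in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(inputs, neuron(inputs, weights, threshold))
```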

When does Perceptron Learning converge?

Always, if the problem is solvable (i.e. the classes are linearly separable).

Using Perceptron Learning, when does weight change?

When the output is wrong.
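
The update-on-error behaviour can be sketched as below, assuming targets in {-1, +1} and a bias term learned alongside the weights. The function name, learning rate, and the AND training set are assumptions for illustration.

```python
def train_perceptron(samples, learning_rate=1.0, max_epochs=100):
    """samples: list of (inputs, target) pairs with target in {-1, +1}."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, target in samples:
            activation = sum(w * x for w, x in zip(weights, inputs)) + bias
            output = 1 if activation > 0 else -1
            if output != target:
                # Weights change only when the output is wrong.
                weights = [w + learning_rate * target * x
                           for w, x in zip(weights, inputs)]
                bias += learning_rate * target
                mistakes += 1
        if mistakes == 0:
            break   # every sample is classified correctly: converged
    return weights, bias

and_samples = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
weights, bias = train_perceptron(and_samples)

def classify(inputs):
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else -1

print(weights, bias, [classify(x) for x, _ in and_samples])
```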

What is another name for Delta Rule?

LMS-rule (least mean squares)

Using the Delta Rule, when do weights change?

Always, the separating plane is always nudged a little.
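
For contrast with perceptron learning, here is a sketch of the Delta Rule update, assuming a linear output with no threshold and a single weight. The learning rate, function name, and data are illustrative assumptions.

```python
def delta_rule_epoch(samples, weights, learning_rate=0.1):
    """One pass over the data; the weights are nudged after every sample."""
    for inputs, target in samples:
        output = sum(w * x for w, x in zip(weights, inputs))   # linear output
        error = target - output
        # The update is proportional to the error, so it happens on every
        # sample and only vanishes when the error is exactly zero.
        weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
    return weights

samples = [((1.0,), 2.0), ((2.0,), 4.0)]   # targets follow y = 2*x
weights = [0.0]
for _ in range(100):
    weights = delta_rule_epoch(samples, weights)
print(weights)
```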

When does LMS-rule converge?

Only in the mean

When does the Delta Rule converge?

Only in the mean

What is an advantage of using LMS over Perceptron?

LMS will find an optimal solution even if the problem can't be fully solved.

When using hyperplanes, one faces certain structural risks. What is one counter-measure?

The use of margins, which allows for some buffer surrounding the hyperplane.

Which problems might occur when scattering low-dimension data into higher dimensions?

1. Many free parameters -> bad generalization
2. Extensive computation

What is the main purpose of the kernel function when working with SVMs?

To get the effect of transforming low-dimensional data into high-dimensional data while only computing scalar products of the low-dimensional values, so the high-dimensional vectors never have to be constructed explicitly.
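
A small numerical illustration, assuming the degree-2 polynomial kernel k(x, y) = (x . y)^2 on two-dimensional inputs. The explicit feature map `phi` below is the standard one for this kernel; the function names and example vectors are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Explicit map from 2 dimensions to 3 dimensions for the kernel (x . y)**2."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly_kernel(x, y):
    """Same scalar product, computed from the low-dimensional values only."""
    return dot(x, y) ** 2

x, y = (1.0, 2.0), (3.0, 4.0)
explicit = dot(phi(x), phi(y))   # scalar product in the 3-dimensional space
implicit = poly_kernel(x, y)     # never leaves the 2-dimensional space
print(explicit, implicit)
```

Both routes give the same value up to floating-point rounding, but the kernel avoids building the high-dimensional vectors.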

Name two common types of kernel functions

Polynomial kernels

Radial basis function (RBF) kernels

What is the point of having multiple layers of Artificial Neurons?

A layered network can create arbitrary decision surfaces, i.e. non-linear.

Are multi-layered ANNs continuous or discrete in values?

The threshold function's output is continuous and the input signal may be of varying character.

What is the difference between the output of single-layer and multi-layer networks of Artificial Neurons?

The first has discrete value output, and the second continuous.

What is the difference between Decision Forest and Bagging?

Decision Forest, also known as Random Forest, is a combination of Bagging and a random feature selection.

What are ensemble methods?

Ensemble methods combine weak learners to harness their combined strengths

What is the general idea behind the ensemble method Boosting?

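Combine multiple hypotheses: weak learners are trained one after another, each focusing on the examples the previous ones got wrong, and their outputs are combined by a weighted vote.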

When using BackProp in an ANN, what is a hypothesis?

A set of weights for all the connections