55 Cards in this Set

Give two examples that use Supervised Learning
- Regression
- Classification
Give two examples that use Unsupervised Learning
- Clustering
- Data Compression
Give two examples that use Reinforcement Learning
- Behavior Selection
- Planning
Give one example that uses Evolutionary Learning
- General Purpose Optimization
Give some examples of where machine learning is used
- Speech recognition
- Image recognition
- Natural language processing
- Autonomous robots
- Spam-filter for e-mail
Where is machine learning useful?
- A pattern exists
- Data is available for training
- It is hard/impossible to define the rules mathematically
Describe Nearest Neighbour
Example:
- N1 samples of class C1
- N2 samples of class C2
- Want to classify a new point x
- Compute the distance from x to all the N1 + N2 samples
-> Find the nearest neighbour
-> Classify x to the same class

Describe k-NN
- Compute the distance from all samples to the new data point x
- Pick the k nearest neighbours of x
-> Majority vote to classify x (see the sketch below)
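A minimal sketch of that vote in NumPy (the helper name and the toy data are illustrative, not from the cards):

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training samples."""
    dists = np.linalg.norm(X_train - x, axis=1)   # distance to every sample
    nearest = np.argsort(dists)[:k]               # indices of the k nearest
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy example: two classes in 2-D
X_train = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_classify(X_train, y_train, np.array([0.8, 0.9])))  # -> 1
```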
Give three pros of k-NN
- Simple (only one parameter, k)
- Applicable to multi-class problems
- Good performance; effective on low-dimensional data
Give two cons of k-NN
- Costly: must compute the distance to every sample to find the nearest ones
- Memory requirement: must store the whole training set
The Shannon information content of an outcome is
h(x) = log2(1 / P(x)) = -log2 P(x), measured in bits: the less probable an outcome, the more information it carries.

What is entropy?
- A measure of uncertainty (unpredictability)
- The expected Shannon information content: H(X) = -Σ_x P(x) log2 P(x)

Information Gain
IG(S, A) = H(S) - Σ_v (|S_v| / |S|) H(S_v): the reduction in entropy obtained by splitting the data S on attribute A.
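A small sketch of both quantities (NumPy; the toy labels and the perfect split are illustrative):

```python
import numpy as np

def entropy(labels):
    """H(X) = -sum p * log2(p) over the class proportions in `labels`."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def information_gain(parent, subsets):
    """IG = H(parent) minus the size-weighted entropies of the subsets."""
    n = len(parent)
    return entropy(parent) - sum(len(s) / n * entropy(s) for s in subsets)

labels = np.array([0, 0, 0, 1, 1, 1])
left, right = labels[:3], labels[3:]            # a perfect split
print(entropy(labels))                          # 1.0 (bits)
print(information_gain(labels, [left, right]))  # 1.0 (bits)
```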

How do you train a Decision Tree?
1. Choose the best question (the one with the highest information gain) and split the input data into subsets.
2. Terminate: branches whose subsets carry a unique class label become leaves (no need for further questions).
3. Grow: recursively extend the other branches (those with subsets bearing a mixture of labels). (See the sketch below.)
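A recursive sketch of those three steps for categorical features, reusing `entropy`/`information_gain` from the previous snippet; the tree representation and names are my own, not from the cards:

```python
def grow_tree(X, y, features):
    """Recursive sketch of the three steps above, for categorical features.
    A tree is either a class label (leaf) or (feature, {value: subtree})."""
    if len(set(y)) == 1:                  # step 2: unique label -> leaf
        return y[0]
    if not features:                      # no questions left -> majority leaf
        return max(set(y), key=list(y).count)
    # step 1: the best question is the feature with the highest information gain
    best = max(features, key=lambda f: information_gain(
        y, [y[X[:, f] == v] for v in np.unique(X[:, f])]))
    # step 3: recursively grow one branch per value of the chosen feature
    rest = [f for f in features if f != best]
    return (best, {v: grow_tree(X[X[:, best] == v], y[X[:, best] == v], rest)
                   for v in np.unique(X[:, best])})
```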

Decision Tree Example: Entropy Coin Toss
For a fair coin: H = -(1/2) log2(1/2) - (1/2) log2(1/2) = 1 bit.

Decision Tree Example: Entropy Dice
For a fair six-sided die: H = log2 6 ≈ 2.58 bits.

Decision Tree Example: Entropy Fake Dice
A loaded (non-uniform) die has lower entropy than a fair one, since entropy is maximal for the uniform distribution.

Decision Tree: Give the greedy algorithm for choosing a question
Choose the attribute that tells us the most about the answer (the highest information gain).
- In sum, we need to find good questions to ask (more than one attribute could be involved in one question).

What is Gini impurity?
G = Σ_k p_k (1 - p_k) = 1 - Σ_k p_k², where p_k is the proportion of class k in the node: how often a randomly chosen sample would be misclassified if it were labelled randomly according to the node's class distribution. An alternative to entropy as a splitting criterion.
What is overfitting?
- When the learned model is overly specialized for the training samples.
- Good results on the training data, but poor generalization.

When does overfitting occur?
- Non-representative samples
- Noisy examples
- A model that is too complex

Occam's principle (Occam's razor)
"Entities should not be multiplied beyond necessity."
The simplest explanation compatible with the data tends to be the right one.

How should you use your training data to reduce overfitting?
Separate the available data into two sets:
- Training set, T: used to form the learning model
- Validation set, V: used to evaluate the accuracy of the model
(V needs to be large enough to provide statistically meaningful instances)

Why should you split up the data set into a training set and a validation set?
- The training may be misled by random errors, but the validation set is unlikely to exhibit the same random fluctuations.
- The validation set provides a safety check against overfitting.

Explain Reduced-Error Pruning
Split the data into a training set and a validation set.
Do until further pruning is harmful:
- Evaluate the impact of pruning each possible node (and those below it) on the validation set
- Greedily remove the node whose removal most improves accuracy on the validation set
This produces the smallest version of the most accurate subtree. (See the sketch below.)
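A simplified sketch of that loop over the `(feature, branches)` trees from the earlier decision-tree snippet. One deliberate simplification: pruned nodes are collapsed to a single caller-supplied label, whereas proper reduced-error pruning uses the majority class of the training samples at that node; all names are illustrative:

```python
import numpy as np

def predict(tree, x):
    """Walk a (feature, branches) tree down to a leaf label."""
    while isinstance(tree, tuple):
        feature, branches = tree
        # unseen feature values fall back to an arbitrary branch
        tree = branches.get(x[feature], next(iter(branches.values())))
    return tree

def accuracy(tree, X_val, y_val):
    return np.mean([predict(tree, x) == y for x, y in zip(X_val, y_val)])

def paths(tree, prefix=()):
    """Yield the branch-value path to every internal node."""
    if isinstance(tree, tuple):
        yield prefix
        for v, sub in tree[1].items():
            yield from paths(sub, prefix + (v,))

def pruned(tree, path, leaf):
    """Copy of `tree` with the node at `path` collapsed to `leaf`."""
    if not path:
        return leaf
    feature, branches = tree
    new = dict(branches)
    new[path[0]] = pruned(branches[path[0]], path[1:], leaf)
    return (feature, new)

def reduced_error_prune(tree, X_val, y_val, leaf_label):
    """Greedily collapse the node whose removal most improves validation
    accuracy; stop when every possible prune hurts.
    (Simplification: every prune uses the same fixed leaf_label.)"""
    while isinstance(tree, tuple):
        base = accuracy(tree, X_val, y_val)
        candidates = [pruned(tree, p, leaf_label) for p in paths(tree)]
        best = max(candidates, key=lambda t: accuracy(t, X_val, y_val))
        if accuracy(best, X_val, y_val) < base:
            return tree          # further pruning is harmful
        tree = best              # ties prune too: prefer the smaller tree
    return tree
```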

When extending the decision tree, how do you avoid overfitting?
- Stop growing the tree when the data split is no longer statistically significant.
- Or grow the full tree, then post-prune (e.g. with reduced-error pruning).

Overfitting: Graph
(Figure: as model complexity grows, the training error keeps decreasing while the validation error first falls and then rises again; the divergence marks overfitting.)

What are some issues that occur in higher dimensions?
- Problems that are easy in low dimensions become harder in high dimensions
-- Training a more complex model with limited sample data
- In high dimensions everything is far from everything else
-- This causes issues in nearest neighbours

Mean square error (MSE)
MSE = E[(f(x) - y)²], the expected squared prediction error; it decomposes into Bias² + Variance plus irreducible noise.

Bias of a classifier
The systematic part of the error: how far the average prediction, taken over models trained on different training sets, lies from the true value.

Variance of a classifier
How much the prediction varies between models trained on different training sets.

Residual sum of squares (RSS)
The sum of squared errors of prediction: RSS = Σ_i (y_i - f(x_i))²
f(x) - the predicted value
y - the actual value



What is Recursive Binary Splitting and how does it work?
A top-down, greedy algorithm that splits the predictor space.
- A split creates two new branches.
- The greedy part is that at each step the best split is made at that stage, rather than looking ahead to see whether another split would lead to a better tree.
- This continues for each new branch created until a stopping criterion is reached. (See the sketch below.)
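A sketch of the core step: choosing the single split that minimises the total RSS of the two branches (the regression-tree criterion; the data and names are illustrative):

```python
import numpy as np

def best_split(X, y):
    """Return the (feature, threshold) whose binary split minimises the
    total RSS of the two branches -- the greedy choice at one stage."""
    best = (None, None, np.inf)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            rss = ((left - left.mean()) ** 2).sum() \
                + ((right - right.mean()) ** 2).sum()
            if rss < best[2]:
                best = (j, t, rss)
    return best

X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0]])
y = np.array([1.0, 1.1, 0.9, 5.0, 5.2])
print(best_split(X, y))  # -> (0, 3.0, ...): split the lone feature at x <= 3
```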

Linear Regression function
f(x) = β₀ + β₁x₁ + ... + β_p x_p: a linear combination of the predictors, with the coefficients β fitted to the training data.

Mean square error
MSE = (1/n) Σ_i (y_i - f(x_i))²

Regression: RSS (residual sum of squares)
RSS = Σ_i (y_i - f(x_i))²; least-squares fitting chooses the β that minimise it.
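A minimal least-squares fit via NumPy's `lstsq`, which minimises exactly this RSS (the toy data are illustrative):

```python
import numpy as np

# Toy data: y is roughly 2x + 1 (values are illustrative)
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.1, 4.9, 7.2, 9.0])

# Prepend an intercept column, then solve the least-squares problem,
# i.e. minimise RSS = ||y - Xb @ beta||^2
Xb = np.hstack([np.ones((len(X), 1)), X])
beta = np.linalg.lstsq(Xb, y, rcond=None)[0]
print(beta)                           # approx [1.05, 2.0] = [intercept, slope]
print(((y - Xb @ beta) ** 2).mean())  # the corresponding MSE
```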



What is RANSAC?
It stands for RANdom SAmple Consensus.
It's an algorithm for finding a robust fit of a model to a data set that contains outliers.

Explain the RANSAC algorithm
(i) Randomly select a sample of s data points from S and instantiate the model from this subset.
(ii) Determine the set of data points S_i which are within a distance threshold t of the model. The set S_i is the consensus set of samples and defines the inliers of S.
(iii) If the size of S_i is greater than some threshold T, re-estimate the model using all the points in S_i and terminate.
(iv) If the size of S_i is less than T, select a new subset and repeat the above.
(v) After N trials the largest consensus set S_i is selected, and the model is re-estimated using all the points in the subset S_i.
(See the sketch below.)
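A sketch of those steps for fitting a line y = a·x + b, with s = 2 (the minimum needed to define a line); the threshold values and the early-exit rule are illustrative choices:

```python
import numpy as np

def ransac_line(pts, t=0.1, T=None, N=100, seed=0):
    """Robustly fit y = a*x + b to an (n, 2) array, following steps (i)-(v).
    Sample size s = 2 (the minimum for a line); t, T, N are illustrative."""
    rng = np.random.default_rng(seed)
    T = T if T is not None else 0.8 * len(pts)
    best = np.zeros(len(pts), dtype=bool)
    for _ in range(N):
        i, j = rng.choice(len(pts), size=2, replace=False)      # (i)
        (x1, y1), (x2, y2) = pts[i], pts[j]
        if x1 == x2:
            continue                                            # degenerate pair
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = np.abs(pts[:, 1] - (a * pts[:, 0] + b)) < t   # (ii)
        if inliers.sum() > best.sum():
            best = inliers
        if inliers.sum() >= T:                                  # (iii)
            break                                               # otherwise (iv)
    a, b = np.polyfit(pts[best, 0], pts[best, 1], 1)            # (v) refit
    return a, b, best
```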

k-NN Regression (non-parametric)
Predict f(x) as the average target of the k training points nearest to x: f(x) = (1/k) Σ_{i ∈ N_k(x)} y_i

Example: k-NN regression (see the sketch below)
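A minimal sketch standing in for the example card's figure (the data and names are illustrative):

```python
import numpy as np

def knn_regress(X_train, y_train, x, k=3):
    """Average the targets of the k training points nearest to x."""
    nearest = np.argsort(np.linalg.norm(X_train - x, axis=1))[:k]
    return y_train[nearest].mean()

# y = x^2 sampled at a few points; predict at x = 2.5
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([0.0, 1.0, 4.0, 9.0, 16.0])
print(knn_regress(X, y, np.array([2.5]), k=2))  # (4 + 9) / 2 = 6.5
```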



Give two examples of regression + regularization
- Ridge Regression
- The Lasso

Ridge Regression
Minimises RSS + λ Σ_j β_j² (an L2 penalty): shrinks the coefficients towards zero.

The Lasso (Least absolute shrinkage and selection operator)
Minimises RSS + λ Σ_j |β_j| (an L1 penalty): can shrink coefficients exactly to zero, so it also performs variable selection.
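Ridge has a closed form, sketched below; the lasso does not and needs an iterative solver such as coordinate descent. The helper name is illustrative, and for simplicity the intercept is penalised along with the other coefficients, which real implementations avoid:

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge: beta = (X^T X + lam * I)^(-1) X^T y.
    (Sketch only: the intercept is not excluded from the penalty.)"""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```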



What is Heuristics?
Experience-based techniques for problem solving, learning, and discovery that give a solution which is not guaranteed to be optimal (Wikipedia)

Naive Bayes: What is the function of the MAP estimate of y?
It returns the y with the highest posterior probability:
ŷ_MAP = argmax_y P(y) Π_i P(x_i | y)
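A tiny sketch of that argmax with hand-picked probability tables (all numbers and names are illustrative):

```python
def map_estimate(priors, likelihoods, x):
    """Return the class y maximising P(y) * prod_i P(x_i | y).
    likelihoods[y][i] maps a feature value to P(x_i = value | y)."""
    def score(y):
        p = priors[y]
        for i, v in enumerate(x):
            p *= likelihoods[y][i][v]
        return p
    return max(priors, key=score)

# Illustrative spam-filter toy with two binary features
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": [{0: 0.2, 1: 0.8}, {0: 0.7, 1: 0.3}],
    "ham":  [{0: 0.9, 1: 0.1}, {0: 0.4, 1: 0.6}],
}
print(map_estimate(priors, likelihoods, (1, 0)))  # -> "spam"
```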

Bayes Rule names:
posterior = (likelihood × prior) / evidence

Bayes Rule:
P(A|B) = P(B|A) P(A) / P(B)

What does it mean if P(A|B) = P(A)?
A and B are independent of each other.

Bayes Rule for ML
P(h | D) = P(D | h) P(h) / P(D), where h is a hypothesis (model) and D is the observed training data.


What is Bayesian learning?
Learning that treats the hypotheses themselves as uncertain: keep a probability distribution over them, and update it with Bayes Rule as data is observed.

What is Bayes Inference?
Drawing conclusions from the posterior distribution, obtained by combining the prior with the likelihood of the observed data via Bayes Rule.
When is the Maximum Likelihood Estimate useful?
Useful if we do not know the prior distribution, or if it is uniform.

Why can't the Maximal Margin Classifier be used on most data sets?
Because it requires the classes to be separable by a linear boundary.

Give two examples of common kernels and their functions
- Polynomial kernel: K(x, x') = (1 + ⟨x, x'⟩)^d
- Radial (Gaussian/RBF) kernel: K(x, x') = exp(-γ ||x - x'||²)
Support vector machine function
f(x) = Σ_{i ∈ S} α_i y_i K(x_i, x) + b, with the class given by sign(f(x)),
where S is the set of support vectors and K is the kernel function.
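A sketch of that decision function with an RBF kernel (the trained α_i and b, and all names, are placeholders; training itself is not shown):

```python
import numpy as np

def rbf(x1, x2, gamma=1.0):
    """Gaussian (RBF) kernel: K(x, x') = exp(-gamma * ||x - x'||^2)."""
    return np.exp(-gamma * np.sum((x1 - x2) ** 2))

def svm_decision(x, support, alphas, labels, b, kernel=rbf):
    """sign( sum over support vectors of alpha_i * y_i * K(x_i, x) + b ).
    alphas and b would come from training, which is not shown here."""
    return np.sign(sum(a * y * kernel(xi, x)
                       for a, y, xi in zip(alphas, labels, support)) + b)
```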