29 Cards in this Set

  • Front
  • Back

What is a "positive example"?

y = 1
Why might you pick a Support Vector Machine algorithm rather than a Neural Network or Logistic Regression?
Compared to Logistic Regression and Neural Networks, a Support Vector Machine sometimes gives a cleaner, and sometimes a more powerful, way of learning complex nonlinear functions.
Write the HYPOTHESIS FUNCTION for logistic regression.
Draw the graph of the HYPOTHESIS FUNCTION for logistic regression.
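The answers to these two cards are images in the original deck; as a sketch of what they show, the logistic regression hypothesis is h(x) = g((theta)T*x) with the sigmoid g(z) = 1/(1 + e^-z):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    """Logistic regression hypothesis h_theta(x) = g(theta^T x)."""
    return sigmoid(theta @ x)
```

The graph of g(z) is an S-shaped curve: 0.5 at z = 0, approaching 1 as z → +∞ and 0 as z → -∞.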
For Logistic Regression, if y = 1, we want h(x)...
h(x) ~= 1 for (theta)T*x >> 0

(hypothesis outputs ~1 when (theta)Tx >> 0)
For Logistic Regression, if y = 0, we want h(x)...
h(x) ~= 0 for (theta)T*x << 0

(hypothesis outputs ~0 when (theta)Tx << 0)
Write the SINGLE EXAMPLE COST FUNCTION for logistic regression.
For y=1, cost1(z):
draw the LOGISTIC REGRESSION COST FUNCTION.
For y=0, cost0(z):
draw the LOGISTIC REGRESSION COST FUNCTION.
Draw the cost function for an SVM.
Compare the two previous graphs to the graph of the Cost Function for an SVM.
The graph of an SVM cost function:
(1) has two parts, cost1(z) and cost0(z),
(2) corresponding to the two parts of the Logistic Regression cost function;
(3) is composed of straight-line segments;
(4) cost1(z) = 0 for z >= 1;
(5) cost0(z) = 0 for z <= -1.
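The two straight-line SVM costs described above can be sketched as piecewise-linear functions. The card only specifies where each cost is zero; unit slope on the nonzero part (the standard hinge shape) is an assumption here:

```python
def cost1(z):
    # Cost when y = 1: zero for z >= 1, then a straight line
    # rising as z decreases (hinge shape; unit slope assumed).
    return max(0.0, 1.0 - z)

def cost0(z):
    # Cost when y = 0: zero for z <= -1, then a straight line
    # rising as z increases.
    return max(0.0, 1.0 + z)
```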
Write the OBJECTIVE FUNCTION for logistic regression.
Write the OPTIMIZATION OBJECTIVE FUNCTION for a Support Vector Machine.
How does the parameterization of an SVM minimization function differ from that used for Logistic Regression?
* SVMs drop the 1/m term.
* LR parameterization: A + lambda*B
* SVM parameterization: C*A + B

think of C = 1/lambda
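The equivalence of the two parameterizations can be checked numerically: with C = 1/lambda, the SVM form is the LR form scaled by 1/lambda, so both have the same minimizer. The helpers below are illustrative names, not course code:

```python
def lr_objective(A, B, lam):
    # Logistic-regression-style parameterization: A + lambda * B
    return A + lam * B

def svm_objective(A, B, C):
    # SVM-style parameterization: C * A + B
    return C * A + B
```

Scaling an objective by a positive constant does not move its minimum, which is why the two forms are interchangeable for optimization.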
What type of classifier is a Support Vector Machine?
An SVM is a type of "Large Margin Classifier".
How does the hypothesis function for an SVM differ from that for Logistic Regression?
Unlike Logistic Regression, a Support Vector Machine doesn't output a probability.
It makes a prediction directly:
h(x) = 1, if (theta)T*x>= 0
h(x) = 0, otherwise
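The direct 0/1 prediction rule above, as a minimal sketch:

```python
import numpy as np

def svm_predict(theta, x):
    # SVM hypothesis: a direct 0/1 prediction, no probability.
    return 1 if theta @ x >= 0 else 0
```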
What will it take to make the first [bracketed] term in the SVM objective function = 0?
Recalling,
(1) Whenever y(i)=1, (theta)T*x(i) >= 1
(2) Whenever y(i)=0, (theta)T*x(i) <= -1
...
Answer:
minimize: [C*0] + (1/2)*sum(theta_j^2), for j=1:n
minimize: (1/2)*sum(theta_j^2), for j=1:n
subject to:
(theta)Tx(i) >= 1 if y(i)=1
(theta)Tx(i) <= -1 if y(i)=0
Write the formula for Gaussian similarity function.
TBD
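The answer card is TBD; the Gaussian (RBF) similarity used with SVM landmarks is conventionally exp(-||x - l||^2 / (2*sigma^2)) — a sketch under that assumption:

```python
import numpy as np

def gaussian_similarity(x, l, sigma):
    """Gaussian (RBF) kernel: exp(-||x - l||^2 / (2 * sigma^2))."""
    diff = x - l
    return np.exp(-(diff @ diff) / (2.0 * sigma ** 2))
```

The similarity is 1 when x coincides with the landmark l and falls toward 0 as x moves away, at a rate controlled by sigma.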
Describe a strategy for selecting landmarks.
TBD
What other functions can be used to evaluate similarity?
TBD
Suppose you train an SVM and find it overfits your training data. What changes to SVM parameters are reasonable next steps?
(1) Decrease C
(2) Increase σ^2
When implementing a Gaussian kernel, what key point must you remember?
Perform feature scaling.
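A minimal mean/std feature-scaling sketch, so that no single large-scale feature dominates the squared distance inside the Gaussian kernel (feature_scale is an illustrative helper, not library code):

```python
import numpy as np

def feature_scale(X):
    # Mean-normalize and scale each feature column to unit std,
    # so all features contribute comparably to ||x - l||^2.
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # avoid division by zero for constant features
    return (X - mu) / sigma
```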
All valid kernels must satisfy what technical condition?
All valid kernels must satisfy "Mercer's Theorem".
If n is small, m is large:
e.g., n = 1-1000, m=50,000+

Create/add more features, then use logistic regression or SVM without a kernel.

If n is small, m is intermediate:
e.g., n = 1-1000, m=10… 10,000

Use SVM with Gaussian kernel.

If n is large (relative to m):
e.g., n≥m, n=10,000, m=10… 1000
Use Logistic Regression or SVM without a kernel ("linear kernel").
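The three regimes above can be summarized in a small helper; the thresholds are the cards' rough guidelines, hardcoded here as assumptions rather than hard rules:

```python
def choose_algorithm(n, m):
    """Pick a classifier given n features and m training examples,
    following the rough course guidelines on the cards above."""
    if n >= m:           # n large relative to m
        return "logistic regression or SVM without a kernel"
    if m >= 50000:       # n small, m large
        return "create more features, then logistic regression or SVM without a kernel"
    return "SVM with Gaussian kernel"  # n small, m intermediate
```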
Name a couple of good SVM libraries.
liblinear
libsvm
How do neural networks fit in with the above recommendations?
Neural networks are likely to work well in most of the above settings, but may be slower to train.
Suppose you have trained an SVM classifier with a Gaussian kernel, and it learned this decision boundary on the training set.
When you measure the SVM's performance on a cross validation set, it does poorly. Should you try increasing or decreasing C? Increasing or decreasing σ^2?
TBD