29 Cards in this Set
- Front
- Back
What is a "positive example"? |
y = 1
|
|
Why might you pick a Support Vector Machine algorithm rather than a Neural Network or Logistic Regression?
|
Compared to Logistic Regression and Neural Networks, a Support Vector Machine sometimes gives a cleaner, and sometimes a more powerful, way of learning complex nonlinear functions.
|
|
Write the HYPOTHESIS FUNCTION for logistic regression.
|
|
|
Draw the graph of the HYPOTHESIS FUNCTION for logistic regression.
|
|
|
For Logistic Regression, if y = 1, we want h(x)...
|
h(x) ~= 1 for (theta)T*x >> 0
(hypothesis outputs ~1 when (theta)Tx >> 0) |
|
For Logistic Regression, if y = 0, we want h(x)...
|
h(x) ~= 0 for (theta)T*x << 0
(hypothesis outputs ~0 when (theta)Tx << 0) |
|
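The two cases above can be sketched with a minimal sigmoid hypothesis (a sketch in numpy; the function name `h` simply mirrors the cards' notation and is not from the course materials):

```python
import numpy as np

def h(theta, x):
    """Logistic regression hypothesis: sigmoid of theta^T x."""
    z = np.dot(theta, x)
    return 1.0 / (1.0 + np.exp(-z))

# When theta^T x >> 0 the output approaches 1 (predict y = 1);
# when theta^T x << 0 the output approaches 0 (predict y = 0).
```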
Write the SINGLE EXAMPLE COST FUNCTION for logistic regression.
|
|
|
for y=1; cost1(z),
draw the LOGISTIC REGRESSION COST FUNCTION. |
|
|
for y=0; cost0(z),
draw the LOGISTIC REGRESSION COST FUNCTION. |
|
|
Draw the cost function for an SVM.
|
|
|
Compare the two previous graphs to the graph of the Cost Function for an SVM.
|
The graph of an SVM cost function:
(1) has two parts, Cost1(z) and Cost0(z),
(2) corresponding to the two cases (y=1 and y=0) of the Logistic Regression cost function;
(3) is composed of straight-line segments rather than smooth curves;
(4) Cost1(z) = 0 for z >= 1;
(5) Cost0(z) = 0 for z <= -1. |
|
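The two straight-line pieces can be sketched as hinge-style functions (the names `cost1`/`cost0` follow the cards' notation; this is an illustrative sketch, not library code):

```python
import numpy as np

def cost1(z):
    # SVM surrogate for the y = 1 case: zero for z >= 1,
    # a straight line with negative slope to the left of z = 1.
    return np.maximum(0.0, 1.0 - z)

def cost0(z):
    # SVM surrogate for the y = 0 case: zero for z <= -1,
    # a straight line with positive slope to the right of z = -1.
    return np.maximum(0.0, 1.0 + z)
```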
Write the OBJECTIVE FUNCTION for logistic regression.
|
|
|
Write the OPTIMIZATION OBJECTIVE FUNCTION for a Support Vector Machine.
|
|
|
How does the parameterization of an SVM minimization function differ from that used for Logistic Regression?
|
* SVMs remove the "1/m" term.
* LR parameterization: A + lambda*B
* SVM parameterization: C*A + B (think of C = 1/lambda) |
|
What type of classifier is a Support Vector Machine?
|
An SVM is a type of "Large Margin Classifier".
|
|
How does the hypothesis function for an SVM differ from that for Logistic Regression?
|
Unlike Logistic Regression, a Support Vector Machine doesn't output a probability.
It makes a prediction directly:
h(x) = 1, if (theta)T*x >= 0
h(x) = 0, otherwise |
|
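This direct decision rule can be sketched in a few lines (the function name `svm_predict` is illustrative, not from the course):

```python
import numpy as np

def svm_predict(theta, x):
    # The SVM hypothesis is a hard decision, not a probability:
    # output 1 if theta^T x >= 0, else 0.
    return 1 if np.dot(theta, x) >= 0 else 0
```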
What will it take to make the [bracketed term] in the above SVM objective function = 0?
|
Recalling,
(1) Whenever y(i)=1, (theta)T*x(i) >= 1
(2) Whenever y(i)=0, (theta)T*x(i) <= -1
...
Answer:
minimize: [C*0] + (1/2)*sum(theta_j^2), for j=1:n
i.e. minimize: (1/2)*sum(theta_j^2), for j=1:n
subject to:
(theta)T*x(i) >= 1 if y(i)=1
(theta)T*x(i) <= -1 if y(i)=0 |
|
Write the formula for Gaussian similarity function.
|
similarity(x, l(i)) = exp( -||x - l(i)||^2 / (2*sigma^2) )
It equals 1 when x is at the landmark l(i) and falls toward 0 as x moves far from it.
|
|
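The Gaussian similarity can be sketched directly from its formula (the function name `gaussian_similarity` is illustrative):

```python
import numpy as np

def gaussian_similarity(x, l, sigma):
    # f = exp( -||x - l||^2 / (2 * sigma^2) )
    # 1 when x == l; decays toward 0 as x moves away from the landmark l.
    return np.exp(-np.sum((x - l) ** 2) / (2.0 * sigma ** 2))
```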
Describe a strategy for selecting landmarks.
|
Put one landmark at the location of each training example: given (x(1), y(1)), ..., (x(m), y(m)), choose l(1) = x(1), ..., l(m) = x(m). This yields m similarity features per example.
|
|
What other functions can be used to evaluate similarity?
|
Besides the Gaussian kernel: the linear kernel (i.e., no kernel), the polynomial kernel (x^T*l + constant)^degree, and more esoteric options such as string, chi-square, and histogram intersection kernels.
|
|
Suppose you train an SVM and find it overfits your training data. What changes to SVM parameters are reasonable next steps?
|
(1) Decrease C
(2) Increase σ^2 |
|
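A sketch of those two fixes using scikit-learn (an assumption on my part: the course does not prescribe a library). In scikit-learn's `SVC`, the RBF kernel is exp(-gamma * ||x - l||^2), so gamma plays the role of 1/(2*sigma^2): increasing sigma^2 corresponds to decreasing gamma.

```python
from sklearn.svm import SVC

# To fight overfitting: lower C (stronger regularization) and
# lower gamma, i.e. a larger sigma^2 and a smoother similarity.
# The particular values 0.1 and 0.05 are illustrative only.
clf = SVC(kernel="rbf", C=0.1, gamma=0.05)
```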
When implementing a Gaussian kernel, what key point must you remember?
|
Perform feature scaling beforehand, so features with large ranges don't dominate the distance ||x - l||^2.
|
|
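A minimal standardization sketch (the feature values are made up for illustration):

```python
import numpy as np

# Without scaling, a large-range feature (e.g. house size in ft^2)
# dominates ||x - l||^2 and drowns out small-range features
# (e.g. number of rooms). Standardize each column first.
X = np.array([[2000.0, 3.0],
              [1600.0, 2.0],
              [2400.0, 4.0]])
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
```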
All valid kernels must satisfy what technical condition?
|
All valid kernels must satisfy "Mercer's Theorem".
|
|
If n is small, m is large:
e.g., n = 1-1000, m=50,000+ |
Create/add more features, then use logistic regression or SVM without a kernel. |
|
If n is small, m is intermediate:
e.g., n = 1-1000, m=10… 10,000 |
Use SVM with Gaussian kernel. |
|
If n is large (relative to m):
e.g., n≥m, n=10,000, m=10… 1000 |
Use Logistic Regression or SVM without a kernel ("linear kernel").
|
|
Name a couple of good SVM libraries.
|
liblinear
libsvm |
|
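For reference, scikit-learn exposes both libraries (an observation about scikit-learn, not part of the card): `SVC` is backed by libsvm and supports kernels, while `LinearSVC` is backed by liblinear and is linear only.

```python
from sklearn.svm import SVC, LinearSVC

kernel_clf = SVC(kernel="rbf")  # libsvm backend, kernels supported
linear_clf = LinearSVC()        # liblinear backend, linear only
```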
How do neural networks fit in with the above recommendations?
|
Neural networks are likely to work well in most of the above settings, but may be slower to train.
|
|
Suppose you have trained an SVM classifier with a Gaussian kernel, and it learned this decision boundary on the training set.
When you measure the SVM's performance on a cross validation set, it does poorly. Should you try increasing or decreasing C? Increasing or decreasing σ^2? |
It depends on the learned boundary: if it overfits the training set, decrease C and/or increase σ^2; if it underfits, increase C and/or decrease σ^2.
|