40 Cards in this Set

  • Front
  • Back

What are the 6 components of prediction?

Question


Input data


Features


Algorithm


Parameters


Evaluation

What are the properties of good features?

Lead to data compression


Retain relevant information


Are created based on expert application knowledge

What are common mistakes with features?

Trying to automate feature selection


Not paying attention to data-specific quirks


Throwing away information unnecessarily

What criteria are used to evaluate machine learning methods?

Interpretable


Simple


Accurate


Fast


Scalable

What is in sample error?

The error rate you get on the same data set you used to build your predictor

What is out of sample error?

The error rate you get on a new data set.

What is another term for out of sample error?

Generalization error

What is more important, in sample error or out of sample error?

Out of sample error

What is the reason for a larger out of sample error compared to in sample error?

Overfitting

Why does a machine learning method overfit?

The algorithm captures both the signal and the noise
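
A small sketch of this effect, using made-up data rather than anything from the card set: labels follow the rule x > 0.5 but about 20% are flipped at random (the noise). The 1-nearest-neighbor predictor, the helper names and the noise level are all assumptions chosen for illustration. Because the predictor memorizes every training point, noise included, its in sample error is zero while its out of sample error is not.

```python
import random

def make_data(n, rng, noise=0.2):
    """Draw (x, label) pairs; the true rule is x > 0.5, with some labels flipped."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = x > 0.5
        if rng.random() < noise:  # label noise: not part of the signal
            label = not label
        data.append((x, label))
    return data

def predict_1nn(train, x):
    """Return the label of the closest training point (memorizes signal and noise)."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def error_rate(train, data):
    """Fraction of points in `data` that the 1-NN predictor gets wrong."""
    wrong = sum(predict_1nn(train, x) != y for x, y in data)
    return wrong / len(data)

rng = random.Random(0)
train = make_data(200, rng)
new_data = make_data(200, rng)

in_sample_error = error_rate(train, train)         # same data used to build the predictor
out_of_sample_error = error_rate(train, new_data)  # fresh data
print(in_sample_error, out_of_sample_error)
```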

What are the 6 steps in predictive study design?

Define your error rate


Split data into training, testing, validation


Pick features


Pick method


Apply method to test data and refine


Apply method to validation data

How should training, test and validation data sets be separated?

Randomly
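
One possible way to do a random three-way split, sketched in plain Python. The 60/20/20 proportions, the seed and the function name `split_indices` are illustrative assumptions, not something the cards prescribe (in caret, createDataPartition plays this role).

```python
import random

def split_indices(n, train_frac=0.6, test_frac=0.2, seed=42):
    """Shuffle row indices 0..n-1 and cut them into training, testing and validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # random assignment, reproducible via the seed
    n_train = round(n * train_frac)
    n_test = round(n * test_frac)
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    validation = indices[n_train + n_test:]  # whatever is left over
    return train, test, validation

train_idx, test_idx, validation_idx = split_indices(100)
print(len(train_idx), len(test_idx), len(validation_idx))
```

Every row lands in exactly one of the three sets, so the validation data is never seen while the method is being refined.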

What are the four possible outcomes of a binary classification?

True positive


False positive


True negative


False negative

What is the formula for sensitivity?

True Positive / (True Positive + False Negative)

What is the formula for specificity?

True Negative / (True Negative + False Positive)
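
A minimal sketch of both formulas. It first tallies the four outcomes from a pair of label lists, then applies the formulas. The example labels and the helper names are invented for illustration.

```python
def confusion_counts(actual, predicted):
    """Tally true/false positives and negatives for 0/1 labels."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return tp, fp, tn, fn

def sensitivity(tp, fn):
    """Fraction of actual positives that were predicted positive."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of actual negatives that were predicted negative."""
    return tn / (tn + fp)

actual    = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
tp, fp, tn, fn = confusion_counts(actual, predicted)
print(sensitivity(tp, fn))  # 0.8
print(specificity(tn, fp))  # 0.8
```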

How do you plot ROC curves?

Y axis: True positive rate (sensitivity)


X axis: False positive rate (1 - specificity)

What does ROC stand for?

Receiver operating characteristic

How do you evaluate an ROC curve?

The more area under the curve the better
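
As a rough illustration, the curve can be traced by sweeping a threshold over the predicted scores and recording the false positive rate and true positive rate at each step. The area is then approximated with the trapezoid rule. The scores, labels and function names below are made up for the example.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the curve by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]  # predicted probability of the positive class
labels = [1, 1, 0, 1, 0, 0]              # actual classes
print(auc(roc_points(scores, labels)))
```

Here the area is about 0.889, because 8 of the 9 positive/negative pairs are ranked in the right order. A predictor that ranks every positive above every negative would score 1.0, and random guessing scores about 0.5.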

What is the main principle of cross validation?

Train the model repeatedly with only a subset of the training data


Use the excluded data to evaluate the models

What are 4 use cases of cross validation?

Picking variables to include in a model


Picking the type of prediction function to use


Picking the parameters in the prediction function


Comparing different predictors

What are 2 different cross validation methods?

K-folds


Leave one out
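
A sketch of how the folds can be generated, using index arithmetic only. The function name `k_fold_indices` is hypothetical, and a real run would usually shuffle the rows first. Leave one out is simply the case where k equals the number of rows.

```python
def k_fold_indices(n, k):
    """Yield (train_indices, held_out_indices) for each of k folds over rows 0..n-1."""
    indices = list(range(n))
    for fold in range(k):
        held_out = indices[fold::k]  # every k-th row, starting at `fold`
        held_out_set = set(held_out)
        train = [i for i in indices if i not in held_out_set]
        yield train, held_out

folds = list(k_fold_indices(10, 5))        # 5 folds of 2 held-out rows each
leave_one_out = list(k_fold_indices(4, 4)) # k == n: one held-out row per fold

for train, held_out in folds:
    print(train, held_out)
```

Each row is held out exactly once, so every observation contributes to the error estimate without ever being used to fit the model that predicts it.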

What caret function is used to preprocess data?

preProcess

What are 4 useful caret functions to partition data?

createDataPartition


createResample


createTimeSlices


createFolds

What caret method is used to train a model?

train

What caret method is used to make predictions using a model and data?

predict

What caret function is useful for comparing predicted results with actual results?

confusionMatrix

What property on a caret model gives the final fitted model?

finalModel

What are 2 metrics that can be used to evaluate models with continuous outcomes?

Root mean squared error (RMSE)


R squared (Rsquared)
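
Both metrics can be sketched in a few lines. The R squared shown here is the textbook "1 minus residual sum of squares over total sum of squares" form. caret may compute its value differently (for example as a squared correlation), so treat this as an illustration rather than a reproduction of caret's output. The numbers are invented.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: typical size of a prediction error, in outcome units."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def r_squared(actual, predicted):
    """Share of the outcome's variance explained: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_residual / ss_total

actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 9.0]
print(rmse(actual, predicted))       # about 0.707
print(r_squared(actual, predicted))  # about 0.9
```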

What are 2 metrics that can be used to evaluate categorical models?

Accuracy (fraction correct)


Kappa (measure of concordance)
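
A sketch of the two metrics, assuming "Kappa" here means Cohen's kappa: observed agreement corrected for the agreement expected by chance. The labels are invented for the example.

```python
from collections import Counter

def accuracy(actual, predicted):
    """Fraction of predictions that match the actual class."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def kappa(actual, predicted):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    n = len(actual)
    observed = accuracy(actual, predicted)
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    # Chance agreement: probability both pick the same class if they were independent.
    expected = sum(actual_counts[c] * predicted_counts[c] for c in actual_counts) / (n * n)
    return (observed - expected) / (1 - expected)

actual    = ["yes"] * 5 + ["no"] * 5
predicted = ["yes", "yes", "yes", "yes", "no", "no", "no", "no", "no", "yes"]
print(accuracy(actual, predicted))  # 0.8
print(kappa(actual, predicted))     # about 0.6
```

With balanced classes, guessing alone would agree half the time, which is why an accuracy of 0.8 corresponds to a kappa of only about 0.6.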

What caret method creates a plot of all features against an outcome?

featurePlot

What function cuts a numeric variable into quantile groups based on a number of bins?

cut2 (Hmisc package)

What argument can be set on preProcess to standardize the features in a data set?

method=c("center", "scale")
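
What centering and scaling do to a single feature can be sketched as follows: subtract the mean, then divide by the standard deviation. The sample (n - 1) standard deviation is used here as an assumption, and the values are invented.

```python
import statistics

def center_and_scale(values):
    """Standardize a feature: subtract its mean, divide by its standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return [(v - mean) / sd for v in values]

standardized = center_and_scale([2.0, 4.0, 6.0, 8.0])
print(standardized)
```

After the transformation the feature has mean 0 and standard deviation 1, so features measured on very different scales become comparable.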

What transformation tries to make continuous data look like normal data?

Box-Cox
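
For reference, the usual one-parameter form of the transformation for a strictly positive value y is given below. The parameter lambda is estimated from the data, commonly by maximum likelihood.

```latex
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[1ex]
\ln y, & \lambda = 0
\end{cases}
```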

What plot shows sample quantiles vs theoretical quantiles?

Normal Q-Q plot

What algorithm is useful for imputing data?

k nearest neighbors

What method must be set on preProcess to impute the data using k nearest neighbors?

knnImpute
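
The idea behind k nearest neighbors imputation can be sketched like this: for a row with a missing value, find the k most similar complete rows and average their values for the missing column. This is a simplified illustration with a hypothetical `knn_impute` helper and made-up rows. caret's own implementation is understood to also center and scale the data, which this sketch does not do.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the column mean over the k nearest complete rows."""
    complete = [r for r in rows if None not in r]
    filled = []
    for row in rows:
        if None not in row:
            filled.append(list(row))
            continue
        observed = [j for j, v in enumerate(row) if v is not None]

        def distance(other):
            # Euclidean distance over the columns observed in the incomplete row.
            return math.sqrt(sum((row[j] - other[j]) ** 2 for j in observed))

        neighbors = sorted(complete, key=distance)[:k]
        filled.append([
            v if v is not None else sum(nb[j] for nb in neighbors) / len(neighbors)
            for j, v in enumerate(row)
        ])
    return filled

rows = [
    [1.0, 10.0],
    [2.0, 20.0],
    [9.0, 90.0],
    [1.5, None],  # second value is missing
]
print(knn_impute(rows)[3])  # [1.5, 15.0]
```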

What are the two levels of covariate creation?

level 1: From raw data to covariate


level 2: Transforming tidy covariates

What must be considered when converting raw data into covariates?

Summarization vs information loss

What must categorical variables be for an ML algorithm to work?

dummy variables (0 or 1)

What caret function converts categorical variables into dummy variables?

dummyVars
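
What the conversion produces can be sketched in plain Python: one 0/1 column per category level. The helper name and the color values are invented for illustration, and the column layout that caret itself produces may differ.

```python
def dummy_variables(values):
    """Return the sorted category levels and one 0/1 indicator row per value."""
    levels = sorted(set(values))
    encoded = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, encoded

levels, encoded = dummy_variables(["red", "green", "red", "blue"])
print(levels)   # ['blue', 'green', 'red']
print(encoded)  # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```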