
21 Cards in this Set

  • Front
  • Back

Attribute types

Nominal


Ordinal


Numeric

Central tendency

Mean


Median


Mode
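All three measures of central tendency are available in Python's standard library; the data values below are made up for illustration:

```python
from statistics import mean, median, mode

data = [1, 2, 2, 3, 7]  # illustrative sample

print(mean(data))    # arithmetic average -> 3
print(median(data))  # middle value when sorted -> 2
print(mode(data))    # most frequent value -> 2
```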

Linear correlation

Two attributes are linearly correlated if there exists a strong linear relation between them.


Pearson's coefficient - measures the level of linear correlation
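Pearson's coefficient can be sketched in plain Python; r = 1 means a perfect positive linear relation, r = -1 a perfect negative one:

```python
from math import sqrt

def pearson(xs, ys):
    # r = cov(X, Y) / (std(X) * std(Y)), written out as sums.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# A perfectly linear relation (y = 2x) gives r = 1.
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```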

Lazy vs. Eager learning

Lazy = stores training data and waits until given an input


Eager = decision trees. Construct a model before classifying

Euclidean distance

The straight-line distance between two points in a space
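A minimal sketch (Python's `math.dist` computes the same thing since 3.8):

```python
from math import sqrt

def euclidean(p, q):
    # Square root of the sum of squared per-dimension differences.
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean((0, 0), (3, 4)))  # 5.0 (the classic 3-4-5 triangle)
```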

Hamming distance

The number of bit positions in which two codewords differ
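For integer codewords, XOR exposes exactly the differing bits:

```python
def hamming(a, b):
    # XOR sets a 1 in every position where the codewords differ;
    # counting the 1s gives the Hamming distance.
    return bin(a ^ b).count("1")

print(hamming(0b1011, 0b0010))  # 2 (bits 0 and 3 differ)
```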

Decision trees

Split on the attribute with the highest information gain.
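Information gain is the entropy before the split minus the weighted entropy of the resulting groups; a minimal sketch:

```python
from math import log2
from collections import Counter

def entropy(labels):
    # H = -sum(p * log2(p)) over the class proportions.
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, groups):
    # Gain = entropy before the split minus the weighted entropy after it.
    n = len(labels)
    after = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - after

# A perfect split of a 50/50 class mix yields a gain of 1 bit.
labels = ["yes", "yes", "no", "no"]
print(information_gain(labels, [["yes", "yes"], ["no", "no"]]))  # 1.0
```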

Confusion matrix

A table comparing actual classes against predicted classes:

                Predicted C1           Predicted C2
Actual C1       true positive (TP)     false negative (FN)
Actual C2       false positive (FP)    true negative (TN)


Sensitivity = true positive recognition rate = TP / (TP + FN)


Specificity = true negative recognition rate = TN / (TN + FP)


Precision = measure of exactness = TP / (TP + FP)


Accuracy = recognition rate = (TP + TN) / total
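The four measures follow directly from the confusion-matrix counts; the counts below are hypothetical, chosen only to make the arithmetic easy to check:

```python
def metrics(tp, fn, fp, tn):
    # Standard formulas computed from the four confusion-matrix counts.
    return {
        "sensitivity": tp / (tp + fn),               # true positive recognition rate
        "specificity": tn / (tn + fp),               # true negative recognition rate
        "precision":   tp / (tp + fp),               # exactness
        "accuracy":    (tp + tn) / (tp + fn + fp + tn),  # recognition rate
    }

m = metrics(tp=40, fn=10, fp=5, tn=45)
print(m["sensitivity"], m["specificity"], m["accuracy"])  # 0.8 0.9 0.85
```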

Support

The percentage of transactions in D that contain both A and B

Confidence

For the rule A => B: the percentage of transactions in D containing A that also contain B
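Support and confidence for A => B can be sketched over a toy transaction database (the items are invented for the example):

```python
def support(transactions, itemset):
    # Fraction of transactions that contain every item in the itemset.
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(transactions, a, b):
    # confidence(A => B) = support(A union B) / support(A)
    return support(transactions, a | b) / support(transactions, a)

D = [{"bread", "milk"}, {"bread", "butter"},
     {"bread", "milk", "butter"}, {"milk"}]

print(support(D, {"bread", "milk"}))       # 0.5  (2 of 4 transactions)
print(confidence(D, {"bread"}, {"milk"}))  # 2/3  (2 of the 3 bread transactions)
```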

Closed pattern

X is closed if it is frequent and there exists no proper superset of X with the same support

Max pattern

X is a max pattern if it is frequent and there exists no proper superset Y of X that is frequent

Agglomerative single linkage

Start with one cluster per sample; repeatedly merge the two closest clusters (single linkage: Euclidean distance between their closest pair of points); stop when k clusters remain
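A naive sketch of the merging loop, with a quadratic search for the closest pair of clusters (fine for illustration, too slow for real data):

```python
from math import dist  # Python 3.8+

def single_linkage(points, k):
    # Start with one cluster per sample.
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of points.
                d = min(dist(p, q) for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # merge the two closest clusters
    return clusters

pts = [(0, 0), (0, 1), (5, 5), (5, 6)]
print(single_linkage(pts, 2))  # the two left points vs. the two right points
```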

K-means

Select the number of clusters k


Assign each sample to a random cluster


Calculate the centroids: the average value of the samples in each cluster


Reassign each sample to the closest centroid (Euclidean distance); repeat until assignments stop changing
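The steps above can be sketched as Lloyd's iteration. The card initialises by random assignment; this sketch uses the equivalent random-centroid initialisation, a fixed iteration count, and a seeded RNG for reproducibility:

```python
import random
from math import dist  # Python 3.8+

def kmeans(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    # Pick k random samples as the initial centroids.
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point goes to its closest centroid (Euclidean).
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: dist(p, centroids[i]))].append(p)
        # Update step: each centroid becomes the mean of its cluster.
        for i, c in enumerate(clusters):
            if c:  # guard against an empty cluster
                centroids[i] = tuple(sum(dim) / len(c) for dim in zip(*c))
    return centroids, clusters

pts = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
centroids, clusters = kmeans(pts, 2)
```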

K-medoids

Select the number of clusters k


Pick k samples as the initial medoids


Assign each sample to the closest medoid


Replace a medoid with the sample that minimises the total cost (Euclidean distance); repeat

GSP

Apriori for sequences


Min-gap = minimum time between the last item of one element and the first item of the next element in a sequence


Max-gap = maximum time allowed between consecutive elements


Window-size = maximum time span within which items may be grouped into a single element

FSD

Apriori for graphs


Isomorphic graphs = graphs that are structurally identical up to a relabeling of vertices, so the same graph can appear in different forms due to symmetries


Canonical label = a unique code representing a graph and all graphs isomorphic to it

Artificial neural network

Input layer, hidden layer, output layer


Feed-forward


Training algorithm = an error function to evaluate the quality of the weights, plus a strategy to search the space of possible solutions



For each data sample, each neuron executes:


Accumulate error from next layer


Calculate error contribution


Backpropagate error to each neuron in previous layer


Weight updates



Online = weight updates after each sample


Batch = weight update after all samples


Mini-batch = weight updates after several samples
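The three schedules differ only in how many samples feed each weight update. A sketch on a single linear weight trained by gradient descent (the data, learning rate, and epoch count are invented for the example):

```python
def gradient(w, x, y):
    # d/dw of the squared error (w*x - y)^2.
    return 2 * (w * x - y) * x

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # y = 2x, so the optimum is w = 2
lr = 0.02

def train(batch_size, epochs=200):
    w = 0.0
    for _ in range(epochs):
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            # One weight update per batch: online (size 1), mini-batch,
            # or full batch (size len(data)).
            g = sum(gradient(w, x, y) for x, y in batch) / len(batch)
            w -= lr * g
    return w

print(train(1))          # online: update after each sample
print(train(len(data)))  # batch: one update after all samples
print(train(2))          # mini-batch: update after several samples
```

All three converge to w = 2 here; they differ in update frequency and noise, not in the objective.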

Self-organizing maps

Two layers: input and output (feed forward)


Each neuron contains a weight vector


BMU = best matching unit: the neuron whose weight vector is closest to the input


Ordering phase = rough organisation of the map


Convergence phase = fine-tuning



Quantization error = distance between an input and its closest prototype (the BMU's weight vector)


Topographic error = occurs when the best and second-best matching units for an input are not adjacent on the map

Gaussian neighborhood function

Calculates the degree of neighborhood between a neuron and the BMU


Should decrease over time


Affects only neurons in the neighborhood

Kohonen's update rule

Weight update based on:


Distance in the map (neuron to BMU)


Distance in input space (data point to weight vector)


The current epoch (learning rate and neighborhood size decrease over time)
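One update step, combining the BMU search, the Gaussian neighborhood function, and Kohonen's rule (1-D scalar weights for brevity; the function names are illustrative):

```python
import math

def gaussian_h(map_dist, sigma):
    # Neighborhood strength: 1 at the BMU, decaying with map distance;
    # shrinking sigma over epochs narrows the neighborhood.
    return math.exp(-(map_dist ** 2) / (2 * sigma ** 2))

def kohonen_step(weights, x, lr, sigma):
    # BMU = the neuron whose weight is closest to the input.
    bmu = min(range(len(weights)), key=lambda i: abs(weights[i] - x))
    # Kohonen's rule: move each weight toward the input, scaled by the
    # learning rate and the neighborhood strength.
    return [w + lr * gaussian_h(abs(i - bmu), sigma) * (x - w)
            for i, w in enumerate(weights)]

print(kohonen_step([0.0, 0.5, 1.0], x=0.9, lr=0.5, sigma=1.0))
```

The BMU (weight 1.0) moves the most; neurons farther away on the map move progressively less.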