21 Cards in this Set


Attribute types

Nominal (unordered categories), binary, ordinal (ordered categories), and numeric (interval-scaled or ratio-scaled).

Central tendency

Measures of the centre of a data distribution: the mean, the median, and the mode.

Linear correlation

Two attributes are linearly correlated if there is a strong linear relation between them.

Pearson's coefficient measures the strength and sign of the linear correlation; it ranges from -1 (perfect negative) through 0 (no linear correlation) to +1 (perfect positive).
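For reference, the standard formula (not spelled out on the card), where x̄ and ȳ are the attribute means:

```latex
r_{xy} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}
              {\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}
```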

Lazy vs. Eager learning

Lazy (e.g. k-nearest neighbours) = stores the training data and defers all computation until given an input to classify.

Eager (e.g. decision trees) = constructs a classification model from the training data before classifying anything.
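A minimal sketch of a lazy learner (1-nearest-neighbour), assuming numeric feature vectors; the function and toy data are illustrative, not from the cards:

```python
import math

def nearest_neighbour(train, query):
    # 1-NN: no model is built up front; all work happens at query time.
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    _, label = min(train, key=lambda pair: dist(pair[0], query))
    return label

# train is a list of (feature_vector, class_label) pairs.
train = [((1.0, 1.0), "A"), ((5.0, 5.0), "B")]
print(nearest_neighbour(train, (1.5, 0.5)))   # -> A
```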

Euclidean distance

The straight-line distance between two points in a space: the square root of the sum of the squared coordinate differences.

Hamming distance

The number of positions at which two codewords (bit strings of equal length) differ.
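A small sketch computing both distances (helper names are illustrative):

```python
import math

def euclidean(a, b):
    # Straight-line distance between two points.
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def hamming(u, v):
    # Number of differing positions between two equal-length codewords.
    return sum(ui != vi for ui, vi in zip(u, v))

print(euclidean((0, 0), (3, 4)))    # 5.0
print(hamming("10110", "10011"))    # 2
```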

Decision trees

At each node, split on the attribute with the highest information gain, i.e. the split that most reduces the entropy of the class distribution.
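A sketch of the underlying computation, assuming class labels and a candidate partition of them (function names are illustrative):

```python
import math
from collections import Counter

def entropy(labels):
    # Expected information (in bits) needed to classify a sample.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, partitions):
    # Gain = entropy before the split minus weighted entropy after it.
    n = len(labels)
    after = sum(len(p) / n * entropy(p) for p in partitions)
    return entropy(labels) - after

labels = ["yes", "yes", "no", "no"]
split = [["yes", "yes"], ["no", "no"]]    # a perfect split
print(information_gain(labels, split))    # 1.0 bit
```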

Confusion matrix

                Predicted C1       Predicted C2
Actual C1       true positive      false negative
Actual C2       false positive     true negative

Sensitivity = true positive recognition rate = TP / P

Specificity = true negative recognition rate = TN / N

Precision = measure of exactness = TP / (TP + FP)

Accuracy = recognition rate = (TP + TN) / (P + N)
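A sketch computing the four measures from the matrix counts (names are illustrative):

```python
def classification_metrics(tp, fn, fp, tn):
    p, n = tp + fn, fp + tn           # actual positives / negatives
    return {
        "sensitivity": tp / p,        # true positive recognition rate
        "specificity": tn / n,        # true negative recognition rate
        "precision":   tp / (tp + fp),
        "accuracy":    (tp + tn) / (p + n),
    }

print(classification_metrics(tp=40, fn=10, fp=5, tn=45))
```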


Support

The percentage of transactions in D that contain both A and B.

Confidence

The percentage of transactions in D containing A that also contain B (for the rule A => B).
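In formulas (the standard association-rule definitions):

```latex
\mathrm{support}(A \Rightarrow B) = P(A \cup B), \qquad
\mathrm{confidence}(A \Rightarrow B) = P(B \mid A)
  = \frac{\mathrm{support}(A \cup B)}{\mathrm{support}(A)}
```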

Closed pattern

A pattern X is closed if X is frequent and there exists no proper superset of X with the same support as X.

Max pattern

A pattern X is a max pattern if X is frequent and there exists no proper superset Y of X that is frequent.
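A sketch checking both properties against a table of itemset supports; the toy supports are illustrative:

```python
def is_closed(x, supports):
    # No proper superset has the same support as x.
    return all(s < supports[x] for y, s in supports.items()
               if set(y) > set(x))

def is_max(x, supports, min_sup):
    # x is frequent and no proper superset is frequent.
    return supports[x] >= min_sup and all(
        s < min_sup for y, s in supports.items() if set(y) > set(x))

supports = {("a",): 4, ("b",): 3, ("a", "b"): 3}
print(is_closed(("a",), supports))              # True  (4 > 3)
print(is_closed(("b",), supports))              # False (("a","b") also has 3)
print(is_max(("a", "b"), supports, min_sup=3))  # True
```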

Agglomerative single linkage

Start with one cluster per sample. Repeatedly merge the two closest clusters, measuring cluster distance as the smallest Euclidean distance between any member of one cluster and any member of the other (single linkage). Stop when k clusters remain (see the sketch below).
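A compact sketch of the merge loop, assuming 2-D points (all names are illustrative):

```python
import math

def single_linkage(points, k):
    # Start with one cluster per sample.
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # Single linkage: cluster distance = closest pair of members.
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: min(
            math.dist(a, b) for a in clusters[ij[0]] for b in clusters[ij[1]]))
        clusters[i] += clusters.pop(j)   # merge the two closest clusters
    return clusters

print(single_linkage([(0, 0), (0, 1), (5, 5), (5, 6)], k=2))
```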


K-means clustering

Select the number of clusters k.

Assign each sample to a random cluster.

Calculate the centroids: the average (mean) of the samples in each cluster.

Reassign each sample to the closest centroid (Euclidean distance), then recompute the centroids; repeat until the assignments no longer change (see the sketch below).
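A minimal sketch of this loop (function and variable names are illustrative):

```python
import math, random

def k_means(points, k, iters=100):
    # Start from a random assignment of samples to k clusters.
    assign = [random.randrange(k) for _ in points]
    for _ in range(iters):
        # Centroid = coordinate-wise mean of each cluster's samples.
        cents = []
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if not members:                   # keep empty clusters alive
                members = [random.choice(points)]
            cents.append(tuple(sum(x) / len(members) for x in zip(*members)))
        # Reassign every sample to its closest centroid.
        new = [min(range(k), key=lambda c: math.dist(p, cents[c]))
               for p in points]
        if new == assign:                     # converged
            break
        assign = new
    return assign, cents

print(k_means([(0, 0), (0, 1), (5, 5), (5, 6)], k=2))
```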


K-medoids clustering

Select the number of clusters k and pick k samples as the initial medoids.

Assign each sample to the closest medoid (Euclidean distance).

Replace a medoid with the non-medoid sample that minimises the total cost (the sum of Euclidean distances from each sample to its medoid); repeat until no swap lowers the cost (see the sketch below).
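A sketch of the swap loop, a simplified PAM-style heuristic (names are illustrative):

```python
import math

def k_medoids(points, k, iters=100):
    medoids = list(points[:k])        # k samples as the initial medoids
    # Cost = sum of distances from each sample to its closest medoid.
    cost = lambda ms: sum(min(math.dist(p, m) for m in ms) for p in points)
    for _ in range(iters):
        # Try swapping each medoid with each non-medoid sample.
        best = min((medoids[:i] + [p] + medoids[i + 1:]
                    for i in range(k) for p in points if p not in medoids),
                   key=cost)
        if cost(best) >= cost(medoids):   # no swap improves the cost
            break
        medoids = best
    # Final assignment: each sample goes to its closest medoid.
    return medoids, [min(medoids, key=lambda m: math.dist(p, m)) for p in points]

print(k_medoids([(0, 0), (0, 1), (9, 9), (9, 8), (5, 5)], k=2))
```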


Apriori for sequences

Min-gap = the minimum time required between the end of one element of the sequence and the start of the next element.

Max-gap = the maximum time allowed between consecutive elements.

Window-size = the maximum time span within which items may be grouped together and still count as a single element (see the sketch below).
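A sketch of the constraint check, assuming each element of a candidate occurrence is given as a (start_time, end_time) pair; all names are illustrative:

```python
def satisfies_gaps(elements, min_gap, max_gap, window_size):
    # elements: list of (start_time, end_time) pairs, in order of occurrence.
    for (s, e) in elements:
        if e - s > window_size:        # items of one element must fit the window
            return False
    for (_, e_prev), (s_next, _) in zip(elements, elements[1:]):
        gap = s_next - e_prev          # time between consecutive elements
        if gap < min_gap or gap > max_gap:
            return False
    return True

print(satisfies_gaps([(0, 1), (3, 4)], min_gap=1, max_gap=5, window_size=2))  # True
```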


Apriori for graphs

Isomorphic graphs = two graphs that are structurally identical and differ only by a relabeling of the vertices (symmetries), so the same pattern can appear under many different encodings.

Canonical label = a unique code that represents a graph and all graphs isomorphic to it, so duplicate candidates can be detected.
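A brute-force sketch of canonical labelling for a tiny undirected graph: try every vertex relabeling and keep the lexicographically smallest edge list. Real graph miners use far cheaper canonical codes; this only shows the idea:

```python
from itertools import permutations

def canonical_label(n, edges):
    # Try every relabeling of the n vertices; the smallest sorted edge list
    # is the same for all isomorphic copies of the graph.
    best = None
    for perm in permutations(range(n)):
        relabelled = sorted(tuple(sorted((perm[u], perm[v]))) for u, v in edges)
        if best is None or relabelled < best:
            best = relabelled
    return tuple(best)

# Two differently labelled copies of the same 3-vertex path:
print(canonical_label(3, [(0, 1), (1, 2)]) ==
      canonical_label(3, [(1, 0), (0, 2)]))   # True
```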

Artificial neural network

Layers: input layer, one or more hidden layers, output layer.

A training algorithm needs a way to evaluate the quality of a set of weights (an error function) and a strategy for searching the space of possible weight settings (e.g. gradient descent).

Backpropagation

For each data sample, each neuron executes:

Accumulate the error coming from the next layer.

Calculate its own error contribution.

Backpropagate the error to each neuron in the previous layer (see the sketch below).
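A minimal numpy sketch of these steps for one hidden layer, trained on XOR; the architecture and hyperparameters are illustrative, not from the cards:

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)       # XOR targets

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)         # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)         # hidden -> output
sigmoid = lambda z: 1 / (1 + np.exp(-z))

for _ in range(10000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Error at the output layer (squared error through the sigmoid).
    d_out = (out - y) * out * (1 - out)
    # Each hidden neuron accumulates error from the next layer.
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Gradient-descent weight updates (batch mode: all samples at once).
    W2 -= 0.5 * h.T @ d_out;  b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h;    b1 -= 0.5 * d_h.sum(axis=0)

print(out.round(2).ravel())   # typically converges to ~[0, 1, 1, 0]
```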

Weight updates

Online = update the weights after each individual sample.

Batch = update the weights once after all samples have been presented.

Mini-batch = update the weights after each small group of samples, a compromise between the two (sketched below).
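The three schedules differ only in how many samples each update averages over; a sketch, assuming a hypothetical `update` function that applies one weight update for a group of samples:

```python
def train_epoch(samples, update, batch_size):
    # batch_size=1 -> online; batch_size=len(samples) -> batch;
    # anything in between -> mini-batch.
    for i in range(0, len(samples), batch_size):
        update(samples[i:i + batch_size])   # one weight update per group
```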

Self-organizing maps

Two layers: input and output (feed-forward).

Each output neuron contains a weight vector (a prototype in input space).

BMU = best matching unit, the neuron whose weight vector is closest to the input.

Ordering phase = rough organisation of the map.

Convergence phase = fine-tuning.

Quantization error = the distance between an input and its closest prototype.

Topographic error = the proportion of inputs whose first and second BMUs are not adjacent on the map.

Gaussian neighborhood function

Calculates the degree of neighborhood between the BMU and every other neuron on the map.

Its width should decrease over time, so that only neurons in a shrinking neighborhood around the BMU are affected (see the formula below).
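In formulas (standard Gaussian form), where c is the BMU, r_c and r_i are the neurons' positions on the map, and σ(t) is the shrinking neighborhood radius:

```latex
h_{ci}(t) = \exp\!\left(-\frac{\lVert r_c - r_i \rVert^2}{2\,\sigma(t)^2}\right)
```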

Kohonen's update rule

The weight update depends on:

The distance in the map (from the neuron to the BMU), via the neighborhood function.

The distance in input space (from the data point to the weight vector); see the formula and sketch below.
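In formulas (standard form), with learning rate α(t) and the neighborhood function h_{ci}(t) from the previous card:

```latex
w_i(t+1) = w_i(t) + \alpha(t)\, h_{ci}(t)\, \bigl(x - w_i(t)\bigr)
```

A sketch of one SOM training step combining the BMU search, the Gaussian neighborhood, and Kohonen's rule (names are illustrative):

```python
import math

def som_step(weights, positions, x, lr, sigma):
    # weights: one weight vector per neuron; positions: each neuron's
    # (row, col) coordinates on the map.
    bmu = min(range(len(weights)), key=lambda i: math.dist(weights[i], x))
    for i, w in enumerate(weights):
        # Gaussian neighborhood: degree of neighborhood to the BMU on the map.
        h = math.exp(-math.dist(positions[bmu], positions[i]) ** 2
                     / (2 * sigma ** 2))
        # Kohonen's rule: move the weight toward the input, scaled by lr * h.
        weights[i] = [wj + lr * h * (xj - wj) for wj, xj in zip(w, x)]
    return bmu
```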