21 Cards in this Set
- Front
- Back
Attribute types |
Nominal Ordinal Numeric |
|
Central tendency |
Mean Median Mode |
|
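The three measures of central tendency are all in Python's standard library; a quick check on a made-up sample:

```python
from statistics import mean, median, mode

data = [1, 2, 2, 3, 7]  # hypothetical sample
print(mean(data))    # 3
print(median(data))  # 2
print(mode(data))    # 2
```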
Linear correlation |
Two attributes are linearly correlated if there is a strong linear relationship between them. Pearson's correlation coefficient measures the strength of linear correlation, ranging from -1 (perfect negative) through 0 (none) to +1 (perfect positive). |
|
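Pearson's coefficient can be sketched straight from its definition (covariance divided by the product of the standard deviations); the data here is made up:

```python
from math import sqrt

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0 (perfect positive linear correlation)
```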
Lazy vs. Eager learning |
Lazy = stores the training data and defers work until a query arrives (e.g. k-nearest neighbours). Eager = constructs a model before classifying (e.g. decision trees). |
|
Euclidean distance |
The straight-line distance between two points: the square root of the sum of squared differences across coordinates. |
|
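As a one-liner over coordinate pairs (points made up):

```python
from math import sqrt

def euclidean(p, q):
    # square root of the sum of squared coordinate differences
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean((0, 0), (3, 4)))  # 5.0
```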
Hamming distance |
The number of bit positions in which two codewords differ. |
|
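A sketch that counts differing positions between two equal-length codewords (example strings made up):

```python
def hamming(a, b):
    # count positions where the two codewords differ
    return sum(x != y for x, y in zip(a, b))

print(hamming("10110", "10011"))  # 2
```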
Decision trees |
Split on the attribute with the highest information gain. |
|
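Information gain is the drop in entropy from splitting on an attribute; this sketch uses a tiny made-up label set where one attribute separates the classes perfectly:

```python
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(labels, groups):
    # groups = the labels partitioned by one attribute's values
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups)

labels = ["yes", "yes", "no", "no"]
# hypothetical attribute whose values split the classes perfectly
print(info_gain(labels, [["yes", "yes"], ["no", "no"]]))  # 1.0
```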
Confusion matrix |
Rows = actual class, columns = predicted class:
     C1               C2
C1   true positive    false negative
C2   false positive   true negative
Sensitivity = true positive recognition rate = TP / (TP + FN)
Specificity = true negative recognition rate = TN / (TN + FP)
Precision = measure of exactness = TP / (TP + FP)
Accuracy = recognition rate = (TP + TN) / all |
|
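The four rates from the counts in one place (the counts are made up):

```python
def metrics(tp, fn, fp, tn):
    sensitivity = tp / (tp + fn)            # true positive recognition rate
    specificity = tn / (tn + fp)            # true negative recognition rate
    precision   = tp / (tp + fp)            # exactness
    accuracy    = (tp + tn) / (tp + fn + fp + tn)
    return sensitivity, specificity, precision, accuracy

# hypothetical counts: sensitivity 0.8, specificity 0.9, accuracy 0.85
print(metrics(tp=40, fn=10, fp=5, tn=45))
```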
Support |
The percentage of transactions in D that contain both A and B, i.e. P(A ∪ B). |
|
Confidence |
For rule A => B: the percentage of transactions in D containing A that also contain B, i.e. P(B|A) = support(A ∪ B) / support(A). |
|
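Both measures over a tiny made-up transaction database D:

```python
D = [{"bread", "milk"},
     {"bread", "butter"},
     {"bread", "milk", "butter"},
     {"milk"}]

def support(itemset):
    # fraction of transactions containing every item in the itemset
    return sum(itemset <= t for t in D) / len(D)

def confidence(A, B):
    # of the transactions containing A, the fraction that also contain B
    return support(A | B) / support(A)

print(support({"bread", "milk"}))       # 0.5
print(confidence({"bread"}, {"milk"}))  # 0.666...
```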
Closed pattern |
An itemset X is closed if it is frequent and there exists no proper superset of X with the same support. |
|
Max pattern |
X is a max pattern if X is frequent and there exists no frequent superset Y of X. |
|
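The two definitions differ on a tiny made-up database: below, {a,b} is closed but not maximal, because the frequent superset {a,b,c} exists but with lower support.

```python
from itertools import combinations
from collections import defaultdict

D = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}]
min_sup = 2

# count the support of every itemset (fine for a toy database)
support = defaultdict(int)
for t in D:
    for r in range(1, len(t) + 1):
        for s in combinations(sorted(t), r):
            support[frozenset(s)] += 1

frequent = {s for s, c in support.items() if c >= min_sup}
closed = {s for s in frequent
          if not any(s < t and support[t] == support[s] for t in frequent)}
maximal = {s for s in frequent if not any(s < t for t in frequent)}

print(sorted("".join(sorted(s)) for s in closed))   # ['ab', 'abc']
print(sorted("".join(sorted(s)) for s in maximal))  # ['abc']
```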
Agglomerative single linkage |
Start with one cluster per sample; repeatedly merge the two clusters whose closest members are nearest (single linkage, Euclidean distance); stop when k clusters remain. |
|
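The merge loop above, sketched on four made-up 2-D points:

```python
from math import dist

def single_linkage(points, k):
    # one cluster per sample
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # pick the pair of clusters with the smallest closest-member distance
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: min(dist(p, q)
                               for p in clusters[ij[0]] for q in clusters[ij[1]]))
        clusters[i] += clusters.pop(j)  # merge
    return clusters

pts = [(0, 0), (0, 1), (5, 5), (5, 6)]
print(single_linkage(pts, 2))  # [[(0, 0), (0, 1)], [(5, 5), (5, 6)]]
```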
K-means |
Select the number of clusters k. Assign each sample to a random cluster. Calculate each centroid as the mean of its cluster. Reassign each sample to the closest centroid (Euclidean distance). Repeat until the assignments stop changing. |
|
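A sketch of the assign/update loop; the points are made up, and for reproducibility it seeds the centroids with the first k samples instead of a random assignment:

```python
from math import dist

def kmeans(points, k, iters=10):
    centroids = [list(p) for p in points[:k]]  # deterministic init for the sketch
    clusters = []
    for _ in range(iters):
        # assignment step: each point joins its closest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[i].append(p)
        # update step: centroid = mean of its cluster
        for i, c in enumerate(clusters):
            if c:
                centroids[i] = [sum(x) / len(c) for x in zip(*c)]
    return centroids, clusters

pts = [(0, 0), (0, 1), (5, 5), (5, 6)]
centroids, clusters = kmeans(pts, 2)
print(centroids)  # [[0.0, 0.5], [5.0, 5.5]]
```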
K-medoids |
Form k clusters using k samples as medoids. Assign each sample to the closest medoid (Euclidean distance). Replace a medoid with the non-medoid sample that minimises the total cost; repeat until no swap improves the cost. |
|
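A PAM-style sketch of the swap loop, again on made-up points:

```python
from math import dist

def cost(points, medoids):
    # total distance from each point to its closest medoid
    return sum(min(dist(p, m) for m in medoids) for p in points)

def kmedoids(points, k):
    medoids = list(points[:k])  # first k samples as initial medoids
    while True:
        best = medoids
        for i in range(k):
            for p in points:
                if p in medoids:
                    continue
                cand = medoids[:i] + [p] + medoids[i + 1:]
                if cost(points, cand) < cost(points, best):
                    best = cand  # this swap lowers the total cost
        if best == medoids:
            return medoids  # no improving swap left
        medoids = best

pts = [(0, 0), (0, 1), (5, 5), (5, 6)]
print(sorted(kmedoids(pts, 2)))  # [(0, 1), (5, 5)]
```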
GSP |
Apriori for sequences. Min-gap = minimum time between the last item of one element and the first item of the next. Max-gap = maximum time between consecutive elements. Window-size = maximum time span between the first and last item that still count as one element. |
|
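A sketch of how the three constraints filter one candidate occurrence; `element_times` holds made-up (first-item, last-item) timestamps for each element:

```python
def satisfies_constraints(element_times, min_gap, max_gap, window_size):
    # gap constraints between consecutive elements
    for (_, prev_last), (next_first, _) in zip(element_times, element_times[1:]):
        gap = next_first - prev_last
        if not (min_gap <= gap <= max_gap):
            return False
    # each element must fit inside the sliding window
    return all(last - first <= window_size for first, last in element_times)

print(satisfies_constraints([(1, 2), (5, 6)], min_gap=1, max_gap=5, window_size=2))  # True
print(satisfies_constraints([(1, 2), (5, 6)], min_gap=1, max_gap=2, window_size=2))  # False
```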
FSD |
Apriori for graphs. Graph isomorphism = two graphs may be equal up to a relabelling of vertices (symmetries). Canonical label = a unique code representing a graph and all graphs isomorphic to it. |
|
Artificial neural network |
Input layer, hidden layer(s), output layer; feed-forward. Training algorithm = a way to evaluate the quality of the weights (error function) plus a strategy to search for possible solutions. For each data sample, each neuron: accumulates the error from the next layer, calculates its error contribution, and backpropagates the error to each neuron in the previous layer. Weight updates: online = after each sample; batch = after all samples; mini-batch = after several samples. |
|
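A minimal sketch of one online backpropagation update in a 2-2-1 sigmoid network; all weights and the sample are made up, and it only illustrates the accumulate/contribute/update steps, not a full training loop:

```python
from math import exp

def sigmoid(z):
    return 1 / (1 + exp(-z))

# made-up weights for a tiny 2-2-1 feed-forward net (last entry of each row = bias)
w_h = [[0.5, -0.5, 0.1], [0.3, 0.8, -0.2]]
w_o = [0.7, -0.4, 0.2]
lr = 0.1  # learning rate

def forward(x):
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_h]
    o = sigmoid(w_o[0] * h[0] + w_o[1] * h[1] + w_o[2])
    return h, o

def train_sample(x, target):
    h, o = forward(x)
    # error contribution at the output (sigmoid derivative = o * (1 - o))
    delta_o = (target - o) * o * (1 - o)
    # each hidden neuron accumulates the error from the next layer
    delta_h = [delta_o * w_o[i] * h[i] * (1 - h[i]) for i in range(2)]
    # weight updates (online: after each sample)
    for i in range(2):
        w_o[i] += lr * delta_o * h[i]
        w_h[i][0] += lr * delta_h[i] * x[0]
        w_h[i][1] += lr * delta_h[i] * x[1]
        w_h[i][2] += lr * delta_h[i]
    w_o[2] += lr * delta_o

x, target = (1.0, 0.0), 1.0
before = (target - forward(x)[1]) ** 2
train_sample(x, target)
after = (target - forward(x)[1]) ** 2
print(after < before)  # True: one update moved the output toward the target
```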
Self-organizing maps |
Two layers: input and output (feed-forward). Each output neuron holds a weight vector (prototype). BMU = best matching unit, the neuron whose weight vector is closest to the input. Ordering phase = rough organisation; convergence phase = fine-tuning. Quantization error = distance between an input and its prototype. Topographic error = occurs when the two best matching units for an input are not adjacent. |
|
Gaussian neighborhood function |
Calculates the degree of neighborhood between a neuron and the BMU. Its width should decrease over time, so that only neurons in the neighborhood are affected. |
|
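The standard Gaussian form, where `d` is the map distance from a neuron to the BMU and `sigma` (the shrinking width) is made up here:

```python
from math import exp

def gaussian_neighborhood(d, sigma):
    # degree of neighborhood; sigma shrinks over time, narrowing the affected region
    return exp(-d ** 2 / (2 * sigma ** 2))

print(gaussian_neighborhood(0, 1.0))  # 1.0 (the BMU itself)
print(gaussian_neighborhood(2, 1.0))  # ~0.135 (two map units away)
```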
Kohonen's update rule |
Weight update based on: distance in the map (neuron to BMU, via the neighborhood function); distance in input space (data point to weight vector); and the epoch (the learning rate decays over time). |
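The rule for a single neuron can be sketched as w ← w + lr · h · (x − w), where `h` comes from the neighborhood function and `lr` decays with the epoch; the numbers below are made up:

```python
def kohonen_update(weight, x, lr, h):
    # move the weight vector toward the input, scaled by learning rate and neighborhood
    return [w + lr * h * (xi - w) for w, xi in zip(weight, x)]

print(kohonen_update([0.0, 0.0], [1.0, 1.0], lr=0.5, h=1.0))  # [0.5, 0.5]
```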