21 Cards in this Set
- Front
- Back
Attribute types |
Nominal Ordinal Numeric |
|
Central tendency |
Mean Median Mode |
|
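The three measures of central tendency are all in Python's standard library; a quick check on a made-up sample:

```python
from statistics import mean, median, mode

data = [1, 2, 2, 3, 7]  # hypothetical sample
print(mean(data))    # 3
print(median(data))  # 2
print(mode(data))    # 2
```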
Linear correlation |
Two attributes are linearly correlated if there is a strong linear relationship between them. Pearson's correlation coefficient measures the strength of linear correlation, ranging from -1 (perfect negative) through 0 (none) to +1 (perfect positive). |
|
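Pearson's coefficient can be sketched straight from its definition (covariance divided by the product of the standard deviations); the data here is made up:

```python
from math import sqrt

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0 (perfect positive linear correlation)
```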
Lazy vs. Eager learning |
Lazy = stores the training data and defers work until a query arrives (e.g. k-nearest neighbours). Eager = constructs a model before classifying (e.g. decision trees). |
|
Euclidean distance |
The straight-line distance between two points: the square root of the sum of squared differences across coordinates. |
|
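As a one-liner over coordinate pairs (points made up):

```python
from math import sqrt

def euclidean(p, q):
    # square root of the sum of squared coordinate differences
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean((0, 0), (3, 4)))  # 5.0
```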
Hamming distance |
The number of bit positions in which two codewords differ. |
|
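A sketch that counts differing positions between two equal-length codewords (example strings made up):

```python
def hamming(a, b):
    # count positions where the two codewords differ
    return sum(x != y for x, y in zip(a, b))

print(hamming("10110", "10011"))  # 2
```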
Decision trees |
Split on the attribute with the highest information gain. |
|
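Information gain is the drop in entropy from splitting on an attribute; this sketch uses a tiny made-up label set where one attribute separates the classes perfectly:

```python
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(labels, groups):
    # groups = the labels partitioned by one attribute's values
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups)

labels = ["yes", "yes", "no", "no"]
# hypothetical attribute whose values split the classes perfectly
print(info_gain(labels, [["yes", "yes"], ["no", "no"]]))  # 1.0
```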
Confusion matrix |
Rows = actual class, columns = predicted class:
     C1               C2
C1   true positive    false negative
C2   false positive   true negative
Sensitivity = true positive recognition rate = TP / (TP + FN)
Specificity = true negative recognition rate = TN / (TN + FP)
Precision = measure of exactness = TP / (TP + FP)
Accuracy = recognition rate = (TP + TN) / all |
|
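The four rates from the counts in one place (the counts are made up):

```python
def metrics(tp, fn, fp, tn):
    sensitivity = tp / (tp + fn)            # true positive recognition rate
    specificity = tn / (tn + fp)            # true negative recognition rate
    precision   = tp / (tp + fp)            # exactness
    accuracy    = (tp + tn) / (tp + fn + fp + tn)
    return sensitivity, specificity, precision, accuracy

# hypothetical counts: sensitivity 0.8, specificity 0.9, accuracy 0.85
print(metrics(tp=40, fn=10, fp=5, tn=45))
```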
Support |
The percentage of transactions in D that contain both A and B, i.e. P(A ∪ B). |
|
Confidence |
For rule A => B: the percentage of transactions in D containing A that also contain B, i.e. P(B|A) = support(A ∪ B) / support(A). |
|
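Both measures over a tiny made-up transaction database D:

```python
D = [{"bread", "milk"},
     {"bread", "butter"},
     {"bread", "milk", "butter"},
     {"milk"}]

def support(itemset):
    # fraction of transactions containing every item in the itemset
    return sum(itemset <= t for t in D) / len(D)

def confidence(A, B):
    # of the transactions containing A, the fraction that also contain B
    return support(A | B) / support(A)

print(support({"bread", "milk"}))       # 0.5
print(confidence({"bread"}, {"milk"}))  # 0.666...
```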
Closed pattern |
An itemset X is closed if it is frequent and there exists no proper superset of X with the same support. |
|
Max pattern |
X is a max pattern if X is frequent and there exists no frequent superset Y of X. |
|
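The two definitions differ on a tiny made-up database: below, {a,b} is closed but not maximal, because the frequent superset {a,b,c} exists but with lower support.

```python
from itertools import combinations
from collections import defaultdict

D = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}]
min_sup = 2

# count the support of every itemset (fine for a toy database)
support = defaultdict(int)
for t in D:
    for r in range(1, len(t) + 1):
        for s in combinations(sorted(t), r):
            support[frozenset(s)] += 1

frequent = {s for s, c in support.items() if c >= min_sup}
closed = {s for s in frequent
          if not any(s < t and support[t] == support[s] for t in frequent)}
maximal = {s for s in frequent if not any(s < t for t in frequent)}

print(sorted("".join(sorted(s)) for s in closed))   # ['ab', 'abc']
print(sorted("".join(sorted(s)) for s in maximal))  # ['abc']
```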
Agglomerative single linkage |
Start with one cluster per sample; repeatedly merge the two clusters whose closest members are nearest (single linkage, Euclidean distance); stop when k clusters remain. |
|
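The merge loop above, sketched on four made-up 2-D points:

```python
from math import dist

def single_linkage(points, k):
    # one cluster per sample
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # pick the pair of clusters with the smallest closest-member distance
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: min(dist(p, q)
                               for p in clusters[ij[0]] for q in clusters[ij[1]]))
        clusters[i] += clusters.pop(j)  # merge
    return clusters

pts = [(0, 0), (0, 1), (5, 5), (5, 6)]
print(single_linkage(pts, 2))  # [[(0, 0), (0, 1)], [(5, 5), (5, 6)]]
```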
K-means |
Select the number of clusters k. Assign each sample to a random cluster. Calculate each centroid as the mean of its cluster. Reassign each sample to the closest centroid (Euclidean distance). Repeat until the assignments stop changing. |
|
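A sketch of the assign/update loop; the points are made up, and for reproducibility it seeds the centroids with the first k samples instead of a random assignment:

```python
from math import dist

def kmeans(points, k, iters=10):
    centroids = [list(p) for p in points[:k]]  # deterministic init for the sketch
    clusters = []
    for _ in range(iters):
        # assignment step: each point joins its closest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[i].append(p)
        # update step: centroid = mean of its cluster
        for i, c in enumerate(clusters):
            if c:
                centroids[i] = [sum(x) / len(c) for x in zip(*c)]
    return centroids, clusters

pts = [(0, 0), (0, 1), (5, 5), (5, 6)]
centroids, clusters = kmeans(pts, 2)
print(centroids)  # [[0.0, 0.5], [5.0, 5.5]]
```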
K-medoids |
Form k clusters using k samples as medoids. Assign each sample to the closest medoid (Euclidean distance). Replace a medoid with the non-medoid sample that minimises the total cost; repeat until no swap improves the cost. |
|
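A PAM-style sketch of the swap loop, again on made-up points:

```python
from math import dist

def cost(points, medoids):
    # total distance from each point to its closest medoid
    return sum(min(dist(p, m) for m in medoids) for p in points)

def kmedoids(points, k):
    medoids = list(points[:k])  # first k samples as initial medoids
    while True:
        best = medoids
        for i in range(k):
            for p in points:
                if p in medoids:
                    continue
                cand = medoids[:i] + [p] + medoids[i + 1:]
                if cost(points, cand) < cost(points, best):
                    best = cand  # this swap lowers the total cost
        if best == medoids:
            return medoids  # no improving swap left
        medoids = best

pts = [(0, 0), (0, 1), (5, 5), (5, 6)]
print(sorted(kmedoids(pts, 2)))  # [(0, 1), (5, 5)]
```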
GSP |
Apriori for sequences. Min-gap = minimum time between the last item of one element and the first item of the next. Max-gap = maximum time between consecutive elements. Window-size = maximum time span between the first and last item that still count as one element. |
|
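A sketch of how the three constraints filter one candidate occurrence; `element_times` holds made-up (first-item, last-item) timestamps for each element:

```python
def satisfies_constraints(element_times, min_gap, max_gap, window_size):
    # gap constraints between consecutive elements
    for (_, prev_last), (next_first, _) in zip(element_times, element_times[1:]):
        gap = next_first - prev_last
        if not (min_gap <= gap <= max_gap):
            return False
    # each element must fit inside the sliding window
    return all(last - first <= window_size for first, last in element_times)

print(satisfies_constraints([(1, 2), (5, 6)], min_gap=1, max_gap=5, window_size=2))  # True
print(satisfies_constraints([(1, 2), (5, 6)], min_gap=1, max_gap=2, window_size=2))  # False
```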
FSD |
Apriori for graphs. Graph isomorphism = two graphs may be equal up to a relabelling of vertices (symmetries). Canonical label = a unique code representing a graph and all graphs isomorphic to it. |
|
Artificial neural network |
Input layer, hidden layer(s), output layer; feed-forward. Training algorithm = a way to evaluate the quality of the weights (error function) plus a strategy to search for possible solutions. For each data sample, each neuron: accumulates the error from the next layer, calculates its error contribution, and backpropagates the error to each neuron in the previous layer. Weight updates: online = after each sample; batch = after all samples; mini-batch = after several samples. |
|
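A minimal sketch of one online backpropagation update in a 2-2-1 sigmoid network; all weights and the sample are made up, and it only illustrates the accumulate/contribute/update steps, not a full training loop:

```python
from math import exp

def sigmoid(z):
    return 1 / (1 + exp(-z))

# made-up weights for a tiny 2-2-1 feed-forward net (last entry of each row = bias)
w_h = [[0.5, -0.5, 0.1], [0.3, 0.8, -0.2]]
w_o = [0.7, -0.4, 0.2]
lr = 0.1  # learning rate

def forward(x):
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_h]
    o = sigmoid(w_o[0] * h[0] + w_o[1] * h[1] + w_o[2])
    return h, o

def train_sample(x, target):
    h, o = forward(x)
    # error contribution at the output (sigmoid derivative = o * (1 - o))
    delta_o = (target - o) * o * (1 - o)
    # each hidden neuron accumulates the error from the next layer
    delta_h = [delta_o * w_o[i] * h[i] * (1 - h[i]) for i in range(2)]
    # weight updates (online: after each sample)
    for i in range(2):
        w_o[i] += lr * delta_o * h[i]
        w_h[i][0] += lr * delta_h[i] * x[0]
        w_h[i][1] += lr * delta_h[i] * x[1]
        w_h[i][2] += lr * delta_h[i]
    w_o[2] += lr * delta_o

x, target = (1.0, 0.0), 1.0
before = (target - forward(x)[1]) ** 2
train_sample(x, target)
after = (target - forward(x)[1]) ** 2
print(after < before)  # True: one update moved the output toward the target
```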
Self-organizing maps |
Two layers: input and output (feed-forward). Each output neuron holds a weight vector (prototype). BMU = best matching unit, the neuron whose weight vector is closest to the input. Ordering phase = rough organisation; convergence phase = fine-tuning. Quantization error = distance between an input and its prototype. Topographic error = occurs when the two best matching units for an input are not adjacent. |
|
Gaussian neighborhood function |
Calculates the degree of neighborhood between a neuron and the BMU. Its width should decrease over time, so that only neurons in the neighborhood are affected. |
|
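The standard Gaussian form, where `d` is the map distance from a neuron to the BMU and `sigma` (the shrinking width) is made up here:

```python
from math import exp

def gaussian_neighborhood(d, sigma):
    # degree of neighborhood; sigma shrinks over time, narrowing the affected region
    return exp(-d ** 2 / (2 * sigma ** 2))

print(gaussian_neighborhood(0, 1.0))  # 1.0 (the BMU itself)
print(gaussian_neighborhood(2, 1.0))  # ~0.135 (two map units away)
```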
Kohonen's update rule |
Weight update based on: distance in the map (neuron to BMU, via the neighborhood function); distance in input space (data point to weight vector); and the epoch (the learning rate decays over time). |
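The rule for a single neuron can be sketched as w ← w + lr · h · (x − w), where `h` comes from the neighborhood function and `lr` decays with the epoch; the numbers below are made up:

```python
def kohonen_update(weight, x, lr, h):
    # move the weight vector toward the input, scaled by learning rate and neighborhood
    return [w + lr * h * (xi - w) for w, xi in zip(weight, x)]

print(kohonen_update([0.0, 0.0], [1.0, 1.0], lr=0.5, h=1.0))  # [0.5, 0.5]
```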