14 Cards in this Set

  • Front
  • Back

Hierarchical Clustering

Data is not partitioned (i.e., clustered) in a single step; clusters are built up or broken down over a series of steps

Hierarchical Clustering

Two methods: Agglomerative and Divisive

Agglomerative

Starts with each record/item in its own group and then successively combines groups; the more popular of the two methods

Divisive

Starts with all records in one large group and then successively splits the groups
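The agglomerative procedure can be sketched in Python as below. This is a minimal, naive illustration assuming single linkage and Euclidean distance; the function name `agglomerative` and the sample points are hypothetical. It records each fusion and the distance at which it happens, which is what a dendrogram draws. Divisive clustering runs in the opposite direction and is not shown.

```python
import math
from itertools import combinations

def agglomerative(points):
    """Naive agglomerative clustering with single linkage (assumed here).

    Returns the merges in order as (cluster_a, cluster_b, distance),
    where each cluster is a frozenset of point indices.
    """
    # Start with every point in its own cluster.
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters whose closest members are nearest.
        best = None
        for a, b in combinations(clusters, 2):
            d = min(math.dist(points[i], points[j]) for i in a for j in b)
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        # Fuse the two clusters and remember the fusion height.
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
        merges.append((a, b, d))
    return merges

points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
for a, b, d in agglomerative(points):
    print(sorted(a), sorted(b), round(d, 2))
```

Each pass rescans every pair of clusters, so this is only suitable for small examples; library implementations use much faster update rules.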

Dendrogram

-A tree diagram illustrating the fusions or divisions at successive stages

-Large vertical jumps in the dendrogram at a fusion (in agglomerative) or division (in divisive) imply that the clusters involved are ‘further apart’
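One common rule of thumb (an assumption here, not stated on the card) turns this into a choice of cluster count: cut the tree inside the largest jump between consecutive fusion heights. The sketch below assumes agglomerative fusion heights as input; the function name `suggest_k` and the heights are illustrative only.

```python
def suggest_k(heights):
    """Suggest a cluster count from agglomerative fusion heights.

    n points produce n - 1 fusions. Cutting inside the largest gap keeps
    every fusion below the gap. Needs at least two heights.
    """
    heights = sorted(heights)
    n_points = len(heights) + 1
    # Size of each vertical jump between consecutive fusions.
    gaps = [heights[i + 1] - heights[i] for i in range(len(heights) - 1)]
    i = max(range(len(gaps)), key=gaps.__getitem__)
    fusions_kept = i + 1
    return n_points - fusions_kept

# Illustrative fusion heights for five points.
print(suggest_k([1.0, 1.0, 6.4, 7.07]))
```

Here the biggest jump (1.0 to 6.4) follows the second fusion, so the cut leaves 5 - 2 = 3 clusters.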

Dendrogram

How to use:


-If you want k clusters, draw a perfectly horizontal line that intersects k of the vertical lines on the dendrogram

-Everything grouped below each of the intersected vertical lines is in the same cluster
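For agglomerative clustering, the horizontal cut that crosses k vertical lines gives, ignoring ties, the same groups as simply stopping once k clusters remain. A minimal sketch, assuming single linkage and Euclidean distance; the names `single_link` and `cut_into_k` and the sample points are hypothetical.

```python
import math
from itertools import combinations

def single_link(points, c1, c2):
    """Distance between the closest pair of points, one from each cluster."""
    return min(math.dist(points[i], points[j]) for i in c1 for j in c2)

def cut_into_k(points, k):
    """Merge clusters (single linkage) until exactly k remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        # Indices a < b of the closest pair of clusters.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: single_link(points, clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a].extend(clusters[b])
        del clusters[b]  # safe: b > a, so index a is unaffected
    return sorted(sorted(c) for c in clusters)

points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
print(cut_into_k(points, 3))
```

Every point under one intersected vertical line ends up in the same returned list.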

Measuring Distance

-There are numerous ways to measure the distance between two data points and/or two clusters

-The most common is Euclidean distance: the same straight-line distance you measure between two points on a graph
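For points p and q with n coordinates, Euclidean distance is d(p, q) = sqrt((p1 - q1)^2 + ... + (pn - qn)^2). A minimal Python sketch; the helper name `euclidean` is our own, while `math.dist` is the standard-library equivalent (Python 3.8+).

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points of the same dimension."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    # Square each coordinate difference, sum, then take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean((1, 2), (4, 6)))   # sqrt(3**2 + 4**2)
print(math.dist((1, 2), (4, 6)))   # standard-library equivalent
```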

How do you measure distance between two clusters (which may have many data points)?

Single Linkage

Complete Linkage

Average Linkage

And others (average group linkage, Ward’s hierarchical method)

Single Linkage

Measure the Euclidean distance between the two closest data points, one from each cluster

Complete Linkage

Measure the Euclidean distance between the two farthest data points, one from each cluster

Average Linkage

Calculate the average Euclidean distance over all possible pairs of data points, one from each cluster
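The three linkage rules differ only in how they summarize the pairwise distances between two clusters. A minimal sketch with clusters as lists of coordinate tuples; the function names and the two sample clusters are hypothetical.

```python
import math
from itertools import product

def single_linkage(A, B):
    """Distance between the two closest points, one from each cluster."""
    return min(math.dist(a, b) for a, b in product(A, B))

def complete_linkage(A, B):
    """Distance between the two farthest points, one from each cluster."""
    return max(math.dist(a, b) for a, b in product(A, B))

def average_linkage(A, B):
    """Mean distance over every (point in A, point in B) pair."""
    dists = [math.dist(a, b) for a, b in product(A, B)]
    return sum(dists) / len(dists)

A = [(0, 0), (0, 2)]
B = [(3, 0), (6, 0)]
print(single_linkage(A, B), complete_linkage(A, B), average_linkage(A, B))
```

The four pairwise distances here are 3, 6, sqrt(13) and sqrt(40); single linkage picks 3, complete linkage picks sqrt(40), and average linkage takes their mean.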

Association Rule Mining

-Seeks to find interesting associations and/or correlation relationships within large data sets


-Typically used in market basket analysis:

-->Attempts to determine which groups of items customers commonly purchase together


-Rules relate item sets that are disjoint

-->i.e., the items in a rule's ‘if’ item set (antecedent) do not appear in its ‘then’ item set (consequent)
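Finding items that are commonly purchased together can be sketched by counting co-occurring pairs across baskets. This is a minimal illustration, not a full algorithm such as Apriori: the names `frequent_pairs` and `confidence`, the `min_count` threshold, and the sample baskets are hypothetical. `confidence` uses the standard rule measure (the share of baskets containing the "if" items that also contain the "then" items) and rejects overlapping item sets, following the usual convention that the two sides of a rule are disjoint.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(baskets, min_count):
    """Pairs of items bought together in at least min_count baskets."""
    counts = Counter()
    for basket in baskets:
        # set() drops repeats; sorted() gives each pair one canonical order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}

def confidence(baskets, antecedent, consequent):
    """Share of baskets containing the antecedent that also contain the consequent."""
    X, Y = set(antecedent), set(consequent)
    if X & Y:
        raise ValueError("antecedent and consequent must be disjoint")
    with_x = [set(b) for b in baskets if X <= set(b)]
    return sum(1 for b in with_x if Y <= b) / len(with_x)

baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["beer", "chips"],
    ["bread", "butter"],
    ["beer", "chips", "milk"],
]
print(frequent_pairs(baskets, min_count=2))
print(confidence(baskets, ["bread"], ["milk"]))
```

`confidence` raises `ZeroDivisionError` if no basket contains the antecedent; a real implementation would handle that case explicitly.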

Lagging Measures

Tell what has already happened; often external business results such as customer satisfaction

Leading Measures

Can be used to predict what will happen; often internal metrics such as employee satisfaction or billing accuracy