Data Mining (Ch. 4)

by super20132014, Apr. 2015

30 Cards in this Set

Front
Back

	Data Mining	The nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data stored in structured databases
	Data	- Categorical: nominal, ordinal - Numerical: interval, ratio
	Types of Data Mining Patterns	1. Association 2. Prediction (supervised) 3. Cluster 4. Sequential
	Types of data mining	1. Discovery driven 2. Hypothesis driven
	Data Mining Process	1. CRISP-DM (Cross-Industry Standard Process for Data Mining) 2. SEMMA (Sample, Explore, Modify, Model, and Assess) 3. KDD (Knowledge Discovery in Databases)
	CRISP-DM	Step 1: Business Understanding Step 2: Data Understanding Step 3: Data Preparation (!) Step 4: Model Building Step 5: Testing and Evaluation Step 6: Deployment
	SEMMA (Sample, Explore, Modify, Model and Assess)
	Classification	- Part of the machine-learning family - Employs supervised learning: learn from past data, classify new data - The output variable is categorical (nominal or ordinal)
	Assessment Methods for Classification	- Predictive accuracy (hit rate) - Speed (model building; predicting) - Robustness - Scalability - Interpretability (transparency; ease of understanding)
	Accuracy of Classification Models	Confusion matrix
	Accuracy of Classification Models
	True Positive Rate - Accuracy of Classification Models
	True Negative Rate - Accuracy of Classification Models
	Recall - Accuracy of Classification Models
	Precision - Accuracy of Classification Models
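The accuracy cards above all derive from the four cells of a two-class confusion matrix. As a minimal sketch, assuming the standard textbook definitions (the cards themselves do not spell out the formulas), the function name and the example counts below are illustrative only:

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common rates from the four cells of a 2x2 confusion matrix.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives, and false negatives. Assumes no denominator is zero.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,          # share of all cases classified correctly
        "true_positive_rate": tp / (tp + fn),   # same quantity as recall
        "true_negative_rate": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
m = classification_metrics(tp=40, fp=10, tn=45, fn=5)
print(m)
```

Note that recall and the true positive rate are the same ratio under these definitions; they differ only in name.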
	Estimation Methodologies for Classification	- Simple split: split the data into 2 mutually exclusive sets, training (~70%) and testing (~30%) - For ANN: the data is split into three subsets: training (~60%), validation (~20%), testing (~20%)
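The simple split described on this card can be sketched with the standard library alone. The helper name `split_indices`, the fixed seed, and the data size of 100 are assumptions made for illustration:

```python
import random


def split_indices(n, fractions, seed=0):
    """Shuffle range(n) and cut it into consecutive, mutually exclusive parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded so the split is reproducible
    parts, start = [], 0
    for frac in fractions[:-1]:
        end = start + round(n * frac)
        parts.append(idx[start:end])
        start = end
    parts.append(idx[start:])  # the last part takes whatever remains
    return parts


# Two-way split: roughly 70% training, 30% testing.
train, test = split_indices(100, [0.7, 0.3])

# Three-way split for ANN-style training: 60% / 20% / 20%.
train3, val3, test3 = split_indices(100, [0.6, 0.2, 0.2])
```

Because every index lands in exactly one part, the sets are mutually exclusive by construction.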
	Estimation Methodologies for Classification	- k-fold cross-validation (rotation estimation) - Leave-one-out - Bootstrapping - Jackknifing
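As a rough illustration of k-fold cross-validation, the generator below rotates the test fold so that each record is tested exactly once. The name `k_fold_indices` is hypothetical, and the sketch skips shuffling, which a real run would normally do first:

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists; each index appears in exactly one test fold."""
    # Spread any remainder over the first n % k folds so sizes differ by at most 1.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


folds = list(k_fold_indices(10, 5))

# Leave-one-out is the special case k == n: every test fold holds one record.
loo = list(k_fold_indices(4, 4))
```

Each model is trained on k-1 folds and evaluated on the remaining one; the k results are then averaged.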
	Classification Techniques	- Decision tree analysis - Statistical analysis - Neural networks - Support vector machines - Case-based reasoning - Bayesian classifiers - Genetic algorithms - Rough sets
	Decision Trees	- Recursively divides a training set until each division consists of examples from one class - DT algorithms differ in their splitting, pruning, and stopping criteria - Examples: ID3, C4.5, C5; CART; CHAID; M5
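The card does not say which splitting criterion to use. One common choice, used by ID3, is information gain based on entropy; the sketch below assumes that criterion, and the function names and toy labels are illustrative:

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(labels, groups):
    """Entropy of the parent minus the size-weighted entropy of the child groups."""
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups)


parent = ["yes", "yes", "no", "no"]

# A split that separates the classes completely removes all uncertainty.
perfect = information_gain(parent, [["yes", "yes"], ["no", "no"]])

# A split that leaves each child as mixed as the parent gains nothing.
useless = information_gain(parent, [["yes", "no"], ["yes", "no"]])
```

A tree builder would evaluate this gain for every candidate split and recurse on the best one until a stopping criterion is met.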
	Cluster Analysis for Data Mining	- Used for automatic identification of natural groupings of things - Part of the machine-learning family - Employs unsupervised learning - “Learns the clusters of things” from past data, then assigns new instances - There is no output variable - Also known as segmentation
	Applications of Cluster Analysis	- Identify natural groupings of customers - Identify rules for assigning new cases to classes for targeting/diagnostic purposes - Provide characterization, definition, labeling of populations - Decrease the size and complexity of problems for other data mining methods - Identify outliers in a specific domain
	Clustering Methods	- Statistical methods, such as k-means and k-modes - Neural networks: adaptive resonance theory (ART), self-organizing map (SOM) - Fuzzy logic, e.g., the fuzzy c-means algorithm - Genetic algorithms - Divisive versus agglomerative methods
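Of the methods listed, k-means is the easiest to sketch. The version below is a bare-bones take on the usual assign-then-update loop for 2-D points; the starting centroids, the fixed iteration count, and the sample points are all assumptions for illustration:

```python
def k_means(points, centroids, iterations=10):
    """Basic k-means on 2-D points with caller-supplied starting centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign the point to its nearest centroid (squared Euclidean distance).
            nearest = min(
                range(len(centroids)),
                key=lambda i: (p[0] - centroids[i][0]) ** 2 + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else old
            for c, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical data: two visibly separate groups of three points each.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, centroids=[(0, 0), (10, 10)])
```

No output variable is involved: the groups emerge from distances alone, which is what makes this unsupervised.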
	Association Rule Mining	- A very popular DM method in business - Finds interesting relationships (affinities) between variables (items or events) - Part of the machine-learning family - Employs unsupervised learning - There is no output variable - Also known as market basket analysis - Often used as an example to describe DM to non-data-mining people, such as the “relationship between diapers & beer!”
	Association Rule Results for Business Use	- Put the items next to each other for ease of finding - Promote items as a package: do not put one on sale if the others are on sale - Place items far apart from each other so that the customer has to walk the aisles to search for them, and by doing so potentially sees and buys other items - On an e-commerce site, promote “Customers
	Applications of Association Rule Mining	- In business: cross-marketing, cross-selling, store design, catalog design, e-commerce site design, optimization of online advertising, product pricing, and sales/promotion configuration - In medicine: relationships between symptoms and illnesses;
	Algorithms for Association	- Apriori - Eclat - FP-Growth - Derivatives and hybrids of the three
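The cards do not define how a rule's strength is measured. Support, confidence, and lift are the measures usually paired with algorithms such as Apriori, so the sketch below assumes them; the baskets are invented, with a nod to the diapers-and-beer example:

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, lhs, rhs):
    """Of the transactions containing lhs, the share that also contain rhs."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)


def lift(transactions, lhs, rhs):
    """Confidence relative to how often rhs occurs anyway; above 1 suggests affinity."""
    return confidence(transactions, lhs, rhs) / support(transactions, rhs)


# Hypothetical market baskets, one set of items per transaction.
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]

s = support(baskets, {"diapers", "beer"})
c = confidence(baskets, {"diapers"}, {"beer"})
l = lift(baskets, {"diapers"}, {"beer"})
```

Apriori, Eclat, and FP-Growth differ mainly in how efficiently they find the itemsets whose support clears a chosen threshold; the measures themselves stay the same.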
	Artificial Neural Networks for DM	- An artificial neural network (ANN or NN) is a brain metaphor for information processing - a.k.a. neural computing - Very good at capturing highly complex non-linear functions! - Many uses: prediction (regression, classification), clustering/segmentation - Many application areas: finance, medicine, marketing, manufacturing, service operations, information systems
	Elements/Concepts of ANN	- Processing element (PE) - Information processing - Network structure - Learning parameters - ANN software
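The card names the processing element without describing it. In the usual formulation a PE takes a weighted sum of its inputs and passes it through a transfer function; the sketch below assumes that formulation with a sigmoid transfer function, and the weights are arbitrary illustrative values:

```python
from math import exp


def processing_element(inputs, weights, bias=0.0):
    """One artificial neuron: weighted sum of inputs, then a sigmoid transfer function."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + exp(-total))  # squashes any real number into (0, 1)


# With zero input the weighted sum is 0, so the sigmoid sits at its midpoint.
neutral = processing_element([0.0, 0.0], [1.0, 1.0])

# Positive weighted evidence pushes the output toward 1.
active = processing_element([1.0, 1.0], [2.0, 1.5])
```

A network structure is many such elements wired in layers, and the learning parameters govern how the weights are adjusted during training.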
	Data Mining Myths	Data mining: - provides instant solutions/predictions. - is not yet viable for business applications. - requires a separate, dedicated database. - can only be done by those with advanced degrees. - is only for large firms that have lots of customer data. - is another name for good old statistics.
	Common DM Blunders	1. Selecting the wrong problem for DM 2. Ignoring what your sponsor thinks DM is, and what it really can do 3. Not leaving sufficient time for data acquisition, selection, and preparation 4. Looking only at aggregated results and not at individual records/predictions 5. Being sloppy about keeping track of the data mining procedure and results 6. Ignoring suspicious (good or bad) findings and quickly moving on 7. Running algorithms repeatedly and blindly, without thinking about the next stage 8. Naively believing everything you are told about the data 9. Naively believing everything you are told about your own DM analysis 10. Measuring results differently from the way your sponsor measures them
