70 Cards in this Set

What does the term data mining mean?
Data mining enables you to deduce hidden knowledge by examining or training the data. The knowledge you find is expressed as patterns and rules.
What is the unit of examination called?
A case, which can be interpreted as one appearance of an entity, or as a row in a table
What does a data mining model store?
Information about the variables you use, the algorithms you implement on the data, the parameters of the selected algorithms, and (after training is complete) the extracted knowledge
What are the two main classes of techniques in data mining?
Directed approach and undirected approach
What are the four parts of the Data Mining Life Cycle?
1. Identifying the business problem
2. Using data mining techniques to transform the data into actionable information
3. Acting on the information
4. Measuring the results
What are the four stages in the transform phase of the Data Mining Life Cycle?
Prepare the data
Create the models
Examine and evaluate the models
Deploy selected model
What are the two most important factors in the success of a data mining project?
Data preparation and understanding
What are four kinds of variables you can use to measure values?
Categorical or nominal attributes
Ranks
Intervals
True numeric variables
What is the difference between a simple and a complex case?
A simple case has no nested tables. Complex cases have nested tables in one or more columns; you need to flatten or normalize these into standard rowsets, where you perform joins between parent and child tables
What two kinds of data do you need to decide how to handle?
Outliers and missing data
What is the key to properly preparing the training set and the test set?
Split the data randomly so that both sets are statistically similar (you can use the Row Sampling and Percentage Sampling transformations in SSIS)
How do you verify that your data split is random?
Compare the first four moments (mean, standard deviation, skewness, and kurtosis) of the training and test sets
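As a rough illustration of this check, the Python sketch below computes the four moments for a random 70/30 split of hypothetical data. It uses population formulas and reports excess kurtosis (0 for a normal distribution); both are assumptions, since tools differ in which convention they use. If the split is random, the training and test rows should be close.

```python
import math
import random

def moments(values):
    """Return (mean, standard deviation, skewness, excess kurtosis) of a sample."""
    n = len(values)
    mean = sum(values) / n
    # Population-style central moments (divide by n)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    sd = math.sqrt(m2)
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2 - 3.0  # excess kurtosis: 0 for a normal distribution
    return mean, sd, skewness, kurtosis

# Hypothetical data: 1,000 values shuffled and split 70/30
random.seed(42)
data = [random.gauss(50, 10) for _ in range(1000)]
random.shuffle(data)
cut = len(data) * 7 // 10
training, test = data[:cut], data[cut:]

for name, part in (("training", training), ("test", test)):
    mean, sd, skew, kurt = moments(part)
    print(f"{name:8} mean={mean:6.2f} sd={sd:5.2f} skew={skew:5.2f} kurt={kurt:5.2f}")
```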
What is kurtosis?
Kurtosis measures the peakedness of a probability distribution, showing whether the distribution is narrow and high around the center or lower and flatter near the center
What do you use to create your models?
The Analysis Services Project template in BIDS. You define the data source and data source view (DSV) objects in the same way you create them for UDM dimensions and cubes
What is a data mining structure?
A structure that defines the domain from which you build your mining models. It specifies the source data through a DSV, the columns, and the partitioning into training and test sets. It can contain multiple mining models
How many models should you make?
Multiple. Evaluate them all, see if they agree, and then deploy the one that works the best.
What are the four options in the mining model for defining the use of columns?
Input
Predictable
Input and predictable
Ignored
What are the nine data mining algorithms included in SSAS?
Association Rules
Clustering
Decision Trees
Linear Regression
Logistic Regression
Naive Bayes
Neural Network
Sequence Clustering
Time Series
What do you need to use in order to analyze texts such as articles in magazines?
Text mining, which is not part of SSAS. Instead, use the two SSIS transformations for text mining: Term Extraction and Term Lookup
Which algorithm do advanced e-mail spam filters use?
Naive Bayes
What is the Association Rules algorithm used for?
Market basket analysis. Used to find cross-selling opportunities
What is the Clustering Algorithm used for?
Groups cases from a dataset into clusters with similar characteristics. Used for grouping customers for a CRM application, and also for searching for anomalies in data, as in fraud detection
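The SSAS Clustering algorithm can use either Expectation Maximization (EM) or K-means. As a simplified illustration, the sketch below implements plain K-means on hypothetical two-column customer data; the function name and data are assumptions, and this is not the SSAS implementation.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain K-means: returns (centroids, cluster index for each point)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments

# Hypothetical customers: (age, purchases per year)
customers = [(22, 3), (25, 4), (23, 2), (51, 20), (55, 22), (49, 19)]
centroids, labels = kmeans(customers, k=2)
print(labels)
```

A case far from every centroid is a candidate anomaly, which is the idea behind using clustering for fraud detection.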
What is the Decision Trees Algorithm used for?
The most popular data mining algorithm, and easy to understand. Used to predict discrete and continuous variables. A tree that predicts continuous variables is a regression tree
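A decision tree grows by repeatedly splitting on the attribute that best separates the states of the predictable column. The sketch below scores candidate splits with entropy-based information gain, one common scoring method; treat it as an assumption rather than the exact SSAS scoring. The case data and column names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Reduction in target entropy from splitting rows (dicts) on one attribute."""
    before = entropy([r[target] for r in rows])
    after = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        after += len(subset) / len(rows) * entropy(subset)  # weighted child entropy
    return before - after

# Hypothetical cases: does a customer buy a bike?
cases = [
    {"commute": "short", "age": "young", "buys": "yes"},
    {"commute": "short", "age": "old", "buys": "yes"},
    {"commute": "long", "age": "young", "buys": "no"},
    {"commute": "long", "age": "old", "buys": "no"},
]
for attr in ("commute", "age"):
    print(attr, information_gain(cases, attr, "buys"))
```

Here splitting on commute separates buyers perfectly, while age tells nothing, so a tree would split on commute first.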
What is the Linear Regression algorithm?
Predicts continuous variables using a single multiple linear regression formula. It is a regression tree with no splits.
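An ordinary least squares fit shows the kind of formula this algorithm produces. For brevity the sketch uses a single input column (the algorithm itself can use several), and the function name and data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x with one input."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through the means
    return intercept, slope

# Hypothetical data: income (thousands) vs. yearly spend (hundreds)
income = [20, 40, 60, 80]
spend = [5, 9, 13, 17]
intercept, slope = fit_line(income, spend)
print(f"spend = {intercept:.2f} + {slope:.2f} * income")
```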
What is the Logistic Regression algorithm?
The Logistic Regression algorithm is a Neural Network without any hidden layers.
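To make the "no hidden layers" point concrete, the sketch below is a single output neuron: a sigmoid applied to a weighted sum of the inputs, trained with plain gradient descent on log loss. The training method, learning rate, function names, and data are assumptions for illustration, not how SSAS trains the model.

```python
import math

def logistic_predict(inputs, weights, bias):
    """One output neuron with no hidden layer: sigmoid of a weighted sum."""
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(cases, labels, learning_rate=0.1, epochs=1000):
    """Fit weights and bias with per-case gradient descent on log loss."""
    weights = [0.0] * len(cases[0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, label in zip(cases, labels):
            # For log loss, the gradient with respect to z is (prediction - label)
            error = logistic_predict(inputs, weights, bias) - label
            weights = [w - learning_rate * error * x for w, x in zip(weights, inputs)]
            bias -= learning_rate * error
    return weights, bias

# Hypothetical data: product page visits (input) and whether the customer bought (0/1)
visits = [[0.0], [1.0], [2.0], [3.0]]
bought = [0, 0, 1, 1]
weights, bias = train_logistic(visits, bought)
print(logistic_predict([0.0], weights, bias), logistic_predict([3.0], weights, bias))
```

Adding a hidden layer of such neurons between the inputs and the output would turn this into a neural network.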
What is the Naive Bayes algorithm?
Calculates probabilities for each possible state of the input attributes, given each state of the predictable attribute. Fast and a good starting point. Doesn't support continuous attributes (they must be discretized first)
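The sketch below is a tiny Naive Bayes spam classifier in the spirit of the earlier spam-filter card. Each word is treated as a discrete present/absent attribute, with Laplace (add-one) smoothing and log probabilities to avoid underflow. The function names and messages are hypothetical; it illustrates the idea rather than the SSAS implementation.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(messages, labels):
    """Count class frequencies and, per class, how many messages contain each word."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> Counter of words
    vocabulary = set()
    for words, label in zip(messages, labels):
        unique = set(words)  # each word is a present/absent (discrete) attribute
        word_counts[label].update(unique)
        vocabulary |= unique
    return class_counts, word_counts, vocabulary

def classify(words, class_counts, word_counts, vocabulary):
    """Return the label with the highest log posterior score."""
    total = sum(class_counts.values())
    present = set(words)
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # prior P(label)
        for word in vocabulary:
            p = (word_counts[label][word] + 1) / (count + 2)  # P(word present | label)
            score += math.log(p if word in present else 1 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training messages
messages = [["win", "money", "now"], ["cheap", "money"],
            ["meeting", "tomorrow"], ["project", "meeting", "notes"]]
labels = ["spam", "spam", "ham", "ham"]
model = train_naive_bayes(messages, labels)
print(classify(["win", "cheap", "money"], *model))
```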
What is the Neural Network algorithm?
Searches for nonlinear functional dependencies. Harder to interpret than algorithms such as Decision Trees, so it is not often used in business.
What is the sequence clustering algorithm?
Searches for clusters based on a model rather than on similarity of cases. Builds Markov chains with combinations of all possible states and assigns probabilities of moving from one state to another. Used for analyzing navigation (clickstreams) on websites.
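The Markov chain part of this description can be sketched by counting page-to-page moves in hypothetical clickstream sessions and turning the counts into transition probabilities. This covers only a first-order transition matrix; the clustering of sequences that the algorithm also performs is not shown, and the function name is hypothetical.

```python
from collections import Counter, defaultdict

def transition_probabilities(sessions):
    """First-order Markov chain: P(next page | current page) from click sequences."""
    counts = defaultdict(Counter)
    for pages in sessions:
        for current, nxt in zip(pages, pages[1:]):  # consecutive page pairs
            counts[current][nxt] += 1
    return {
        page: {nxt: n / sum(following.values()) for nxt, n in following.items()}
        for page, following in counts.items()
    }

# Hypothetical website sessions
sessions = [
    ["home", "products", "cart"],
    ["home", "products", "support"],
    ["home", "support"],
]
probs = transition_probabilities(sessions)
print(probs["home"])
```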
What is the Time Series algorithm?
Created for forecasting continuous variables using the ART (Auto-Regression Trees) and ARIMA (Auto-Regressive Integrated Moving Average) algorithms
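The auto-regressive idea shared by both algorithms is that each value is predicted from earlier values of the series. The sketch below fits the simplest case, a first-order model x[t] = c + phi * x[t-1], by least squares and forecasts forward. It is neither ART nor ARIMA, only an assumed minimal illustration with hypothetical names and data.

```python
def fit_ar1(series):
    """Fit x[t] = c + phi * x[t-1] by least squares over consecutive pairs."""
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mean_p = sum(prev) / n
    mean_c = sum(curr) / n
    cov = sum((x0 - mean_p) * (x1 - mean_c) for x0, x1 in zip(prev, curr))
    var = sum((x0 - mean_p) ** 2 for x0 in prev)
    phi = cov / var
    c = mean_c - phi * mean_p
    return c, phi

def forecast(series, steps):
    """Roll the fitted model forward, feeding each prediction back in."""
    c, phi = fit_ar1(series)
    out, last = [], series[-1]
    for _ in range(steps):
        last = c + phi * last
        out.append(last)
    return out

# Hypothetical monthly values
series = [10.0, 7.0, 5.5, 4.75, 4.375, 4.1875]
print(fit_ar1(series), forecast(series, 2))
```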
What three main tools are included in BIDS for creating mining models?
Data Mining Wizard
Data Mining Designer
Data Mining Viewers
What three things do you do with the Data Mining Wizard?
Define the DSV and the tables and columns from the DSV that you want to use
Add an initial model to the structure
Partition the data into training and test sets
What five tasks can you perform in the Data Mining Designer?
Modify the mining structure
Add additional mining models to the structure
Process the structure and browse the models using Data Mining Viewers
Check the accuracy of the models using a lift chart and other techniques
Create DMX prediction queries using your models
What are the two discretization methods in SSAS 2008?
EqualAreas and Clusters
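EqualAreas divides values into buckets that each contain roughly the same number of cases. The sketch below is an assumed, simplified version of that idea (equal-frequency binning on hypothetical ages, with hypothetical function names); it does not reproduce the Clusters method or the exact boundaries SSAS would choose, and ties in the data can make bucket sizes uneven.

```python
def equal_areas_boundaries(values, buckets):
    """Cut points that put roughly the same number of sorted values in each bucket."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[i * n // buckets] for i in range(1, buckets)]

def discretize(value, boundaries):
    """Bucket index = how many cut points the value reaches or exceeds."""
    return sum(value >= b for b in boundaries)

# Hypothetical customer ages, discretized into 3 buckets
ages = [18, 21, 25, 30, 34, 40, 45, 52, 60, 71, 75, 80]
cuts = equal_areas_boundaries(ages, 3)
print(cuts, [discretize(a, cuts) for a in ages])
```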
What are the different components that make up the SQL Server BI Suite?
SSAS cubes
SSAS data mining
SSRS
SSIS
What are three ways to prepare training sets and test sets?
Use the Data Mining Wizard and Data Mining Designer in BIDS to specify the percentage of holdout data for the test set
Use the TABLESAMPLE option of the T-SQL SELECT statement
Use the SSIS Row Sampling Transformation and Percentage Sampling Transformation
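All three approaches come down to randomly assigning a percentage of the cases to the test set. The sketch below shows that idea in Python; the function name and parameters are hypothetical and are not part of BIDS, T-SQL, or SSIS.

```python
import random

def holdout_split(cases, test_percent, seed=None):
    """Shuffle cases, then put round(test_percent% of them) into the test set."""
    rng = random.Random(seed)
    shuffled = list(cases)
    rng.shuffle(shuffled)
    test_size = round(len(shuffled) * test_percent / 100)
    return shuffled[test_size:], shuffled[:test_size]  # (training, test)

training, test = holdout_split(range(1, 101), test_percent=30, seed=7)
print(len(training), len(test))
```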
What are the five supported content types for columns?
Discrete
Continuous
Discretized
Ordered
Cyclical
What are the three key column types?
Key
Key Sequence
Key Time
What algorithm did Microsoft develop for clickstream analysis?
Sequence Clustering
What are your case table and case-level columns when you mine an OLAP cube?
A dimension is the case table, and any measure group or fact table connected with the selected dimension can be used as a case-level column
Why can't you use a mining model as a dimension in the same cube in which you used it as the source for the model?
You would get a circular reference, and processing would never stop
Which algorithm would you use to find the best way to arrange products on shelves in a retail store?
Association Rules
What does a lift chart show?
It compares the performance of models when predicting a specific value
It can also show the quality of global predictions
What does a classification matrix show?
Compares the actual values to the predicted values
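The sketch below builds a classification matrix from hypothetical test-set results, with predicted states as rows and actual states as columns (an assumed layout); the diagonal holds the correct predictions. The function name is hypothetical.

```python
from collections import Counter

def classification_matrix(actual, predicted):
    """Count how often each (predicted, actual) pair of states occurs."""
    return Counter(zip(predicted, actual))

# Hypothetical test-set results for a yes/no predictable attribute
actual = ["yes", "yes", "no", "no", "yes", "no"]
predicted = ["yes", "no", "no", "yes", "yes", "no"]
matrix = classification_matrix(actual, predicted)

states = sorted(set(actual) | set(predicted))
print("predicted/actual", *states)
for p in states:
    # One row per predicted state, one column per actual state
    print(f"{p:>16}", *(matrix[(p, a)] for a in states))

correct = sum(matrix[(s, s)] for s in states)  # diagonal = correct predictions
print("accuracy:", correct / len(actual))
```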
What are the five settings you can define for cross-validation?
Fold Count (how many partitions to create in the training data)
Max Cases
Target Attribute
Target State
Target Threshold (minimum accuracy needed for a prediction to be counted as correct)
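The Fold Count setting can be illustrated by randomly dealing cases into partitions and letting each partition serve as the test set once while the remaining partitions train the model. The sketch below is hypothetical (function names included) and ignores the other settings such as Max Cases and the target settings.

```python
import random

def fold_assignments(case_count, fold_count, seed=0):
    """Randomly deal case indices into fold_count partitions of near-equal size."""
    indices = list(range(case_count))
    random.Random(seed).shuffle(indices)
    folds = [[] for _ in range(fold_count)]
    for position, index in enumerate(indices):
        folds[position % fold_count].append(index)
    return folds

def cross_validation_rounds(case_count, fold_count):
    """Yield (training, test) index lists; each fold is the test set exactly once."""
    folds = fold_assignments(case_count, fold_count)
    for i, test in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, test

for round_no, (training, test) in enumerate(cross_validation_rounds(10, 3), start=1):
    print(f"round {round_no}: train on {len(training)} cases, test on {len(test)}")
```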
On a real data mining project, which two tasks will take most of the time?
Data preparation and then validation of predictive models
What are the three measures that give you information about the quality of the rules that the Association Rules algorithm finds?
Support (how many times the items were found together)
Probability (directional: the probability of A --> B is not the same as that of B --> A)
Importance (score of the rule, showing how correlated the items are)
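The three measures can be computed for a single rule {A} --> {B} over hypothetical shopping baskets. Support and probability follow the definitions above; for importance, the sketch assumes a log-ratio form, log10(P(B | A) / P(B | not A)), which is how SSAS importance is commonly described but should be checked against the product documentation. The function name is hypothetical.

```python
import math

def rule_measures(transactions, a, b):
    """Support, probability, and (assumed log-ratio) importance for {a} --> {b}."""
    with_a = [t for t in transactions if a in t]
    without_a = [t for t in transactions if a not in t]
    support = sum(1 for t in with_a if b in t)  # cases containing both a and b
    probability = support / len(with_a)  # P(b | a)
    p_b_without_a = sum(1 for t in without_a if b in t) / len(without_a)
    if probability == 0:
        importance = -math.inf  # b never occurs together with a
    elif p_b_without_a == 0:
        importance = math.inf  # b never occurs without a
    else:
        importance = math.log10(probability / p_b_without_a)
    return support, probability, importance

# Hypothetical shopping baskets
baskets = [{"bike", "helmet"}, {"bike", "helmet", "lock"}, {"bike", "lock"},
           {"helmet"}, {"lock"}, {"bike", "helmet"}]
for a, b in (("bike", "helmet"), ("lock", "bike"), ("bike", "lock")):
    print(f"{a} --> {b}:", rule_measures(baskets, a, b))
```

With this data, lock --> bike has probability 2/3 but importance 0, because bikes are just as common in baskets without a lock; bike --> lock shows that probability differs by direction.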
What two parameters can be used to control the creation of historical models?
HISTORICAL_MODEL_COUNT (number of historical models built)
HISTORICAL_MODEL_GAP (number of time slices between historical models)
What are the two kinds of DMX statements?
DDL (Data Definition Language)
DML (Data Manipulation Language)
What are eight DMX DDL statements?
Create mining structure
Alter mining structure
Create mining model
Export
Import
Select into
Drop mining model
Drop mining structure
What are four DMX DML statements?
Insert into (which trains the model)
Select
Update
Delete
Do dataset tables support nested tables?
No
What are the three types of charts you can use to evaluate predictive models?
Lift chart for global statistics
Lift chart for a single value
Profit chart
How do you evaluate a Time Series model?
You can make historical predictions to evaluate a time series model
How do you evaluate a clustering model?
You should evaluate clustering models from a business perspective
Using DMX, can you add a mining model to an existing structure so that you can share the structure with other models?
You can use the ALTER MINING STRUCTURE DMX statement to add a mining model to an existing structure so it can be shared with other models
Can you use DMX to drill through to the sample cases you used for training a mining model?
Yes, you can use the DMX SELECT FROM <model>.CASES syntax to drill through to the sample cases you used to train a mining model
What are the four SSAS general data mining properties?
AllowSessionMiningModels
AllowAdHocOpenRowsetQueries
AllowedProvidersInOpenRowset
MaxConcurrentPredictionQueries
If you want to let applications use the SSAS data mining features, which data mining property do you need to set as "true"?
AllowSessionMiningModels
What are your four options for impersonation information in a data source?
Use a specific username and password
Use the service account
Use the credentials of the current user
Inherit (Impersonates current users)
What permission must a user have to connect to an SSAS database through SSMS or BIDS?
Read Definition permission for a complete SSAS database
What is a data mining structure?
A blueprint of the database schema that is shared by all mining models inside the structure
What is the default CacheMode property setting and what does it allow?
The default CacheMode property setting is KeepTrainingCases, which caches the mining structure's training data so that users can issue drill-through queries to see the source data. You can set it to ClearAfterProcessing to avoid keeping large data volumes in the cache
What is another phrase for training the model?
Model processing
What are the four steps to processing a mining structure?
Save changes in BIDS
On the Mining Structure tab click the Process the Mining Structure button
In the Process dialog box, select the desired processing option, then click Run
Watch the Process Progress dialog box
Can you use SQL Server logins for SSAS authentication?
No. SSAS supports Windows authentication only
Do end users need the Process permission on a mining structure?
No
As an administrator, how would you prevent usage of the clustering data mining algorithm?
Use the Analysis Services Properties dialog box in SSMS
What processing option deletes the training data in a mining structure without affecting its mining models?
Use the Process Clear Structure option to purge the structure data without affecting the models inside the structure
Can an SSRS report use a mining model as its source?
Yes
How do you browse mining models?
The DMX language. You can also use the Prediction Query Builder in SSMS and BIDS to create prediction DMX queries