MCTS 70-448 Chap 9 Working with SSAS Data Mining

by bailey.aaronr.junk9, Aug 2009

Subjects: 70-448 9 chapter data mcts mining ssas with working

70 Cards in this Set

Front
Back

	What does the term data mining mean?	Data mining enables you to deduce hidden knowledge by examining, or training, the data. The knowledge you find is expressed as patterns and rules.
	What is the unit of examination called?	A case, which can be interpreted as one appearance of an entity or a row in a table
	What does a data mining model store?	Information about the variables you use, the algorithms you implement on the data, the parameters of the selected algorithms, and (after training is complete) the extracted knowledge
	What are the two main classes of techniques in data mining?	Directed approach and undirected approach
	What are the four parts of the Data Mining Life Cycle?	1. Identifying the business problem 2. Using data mining techniques to transform the data into actionable information 3. Acting on the information 4. Measuring the results
	What are the four stages in the transform phase of the Data Mining Life Cycle?	1. Prepare the data 2. Create the models 3. Examine and evaluate the models 4. Deploy the selected model
	What are the two most important factors in the success of a data mining project?	Data preparation and understanding
	What are four kinds of variables you can use to measure values?	Categorical or nominal attributes; ranks; intervals; true numeric variables
	What is the difference between a simple and a complex case?	Complex cases have nested tables in one or more columns. These need to be flattened or normalized into standard rowsets where you perform joins between parent and child tables
	What two kinds of data do you need to decide how to handle?	Outliers and missing data
	What is the key to properly preparing the training set and the test set?	Statistically split the data randomly (you can use Row Sampling and Percentage Sampling Transformations in SSIS)
	How do you verify that your data split is random?	Compare the first four moments (mean, standard deviation, skewness, and kurtosis) of the variables in the training set and the test set
	What is kurtosis?	Kurtosis measures the peakedness of a probability distribution, showing whether the distribution is narrow and high around the center or flatter and lower near the center
	What do you use to create your models?	The Analysis Services Project template in BIDS. You define the data source and DSV objects in the same way you create them for UDM dimensions and cubes
	What is a data mining structure?	A data structure that defines the domain from which you build your mining models. It specifies the source data through a DSV, the columns, and the partitioning into training and test sets. It can contain multiple mining models
	How many models should you make?	Multiple. Evaluate them all, see if they agree, and then deploy the one that works the best.
	What are the four options in the mining model for defining the use of columns?	Input; Predictable; Input and predictable; Ignored
	What are the nine data mining algorithms included in SSAS?	Association Rules; Clustering; Decision Trees; Linear Regression; Logistic Regression; Naive Bayes; Neural Network; Sequence Clustering; Time Series
	What do you need to use in order to analyze texts such as articles in magazines?	Text mining, which is not part of SSAS. Instead, use the two SSIS transformations for text mining: Term Extraction and Term Lookup
	Which algorithm do advanced e-mail SPAM filters use?	Naive Bayes
	What is the Association Rules algorithm used for?	Market basket analysis. Used to find cross-selling opportunities
	What is the Clustering Algorithm used for?	Groups cases from a dataset into clusters of similar characteristics. Used for grouping customers for a CRM application. Also used for searching for anomalies in data, as in fraud detection
	What is the Decision Trees Algorithm used for?	The most popular data mining algorithm. Easy to understand. Used to predict discrete and continuous variables. A tree that predicts continuous variables is a regression tree
	What is the Linear Regression algorithm?	Predicts continuous variables using a single multiple linear regression formula. It is a regression tree with no splits.
	What is the Logistic Regression algorithm?	A Logistic Regression algorithm is a Neural Network without any hidden layers.
	What is the Naive Bayes algorithm?	Calculates probabilities for each possible state of the input attribute, given each state of the predictable attribute. Fast and a good starting point. Doesn't support continuous attributes
	What is the Neural Network algorithm?	Searches for nonlinear functional dependencies. Harder to predict than linear algorithms such as decision trees and not often used for business.
	What is the sequence clustering algorithm?	Searches for clusters based on a model rather than similarity of cases. Builds Markov chains with combinations of all possible states and assigns probabilities of moving from one state to another. Used for analyzing web sites.
	What is the time series algorithm?	Created for forecasting continuous variables using the ART (Auto-Regression Trees) and ARIMA (Auto-Regressive Integrated Moving Average) algorithms
	What three main tools are included in BIDS for creating mining models?	Data Mining Wizard; Data Mining Designer; Data Mining Viewers
	What three things do you do with the Data Mining Wizard?	Define the DSV and the tables and columns from the DSV that you want to use; add an initial model to the structure; partition the data into training and test sets
	What five tasks can you perform in the Data Mining Designer?	Modify the mining structure; add additional mining models to the structure; process the structure and browse the models using the Data Mining Viewers; check the accuracy of the models using a lift chart and other techniques; create DMX prediction queries using your models
	What are the two discretization methods in SSAS 2008?	EqualAreas and Clusters
	What are the different components that make up the SQL Server BI Suite?	SSAS cubes; SSAS data mining; SSRS; SSIS
	What are three ways to prepare training sets and test sets?	Use the Data Mining Wizard and Data Mining Designer in BIDS to specify the percentage of holdout data for the test set; use the TABLESAMPLE option of the T-SQL SELECT statement; use the SSIS Row Sampling Transformation and Percentage Sampling Transformation
	What are the five supported content types for columns?	Discrete; Continuous; Discretized; Ordered; Cyclical
	What are the three key column types?	Key (primary key); Key Sequence; Key Time
	What algorithm did Microsoft develop for clickstream analysis?	Sequence Clustering
	What are your case table and case-level columns when you mine an OLAP cube?	A dimension is the case table and any measure group or fact table connected with the selected dimension can be used as a case-level column
	Why can't you use a mining model as a dimension in the same cube in which you used it as the source for the model?	You would get a circular reference and never stop processing
	Which algorithm would you use to find the best way to arrange products on shelves in a retail store?	Association Rules
	What does a lift chart show?	It compares the performance of models when predicting a value, or shows the quality of global predictions
	What does a classification matrix show?	It compares the actual values with the predicted values
	What are the five settings you can define for cross-validation?	Fold Count (how many partitions are created in the training data); Max Cases; Target Attribute; Target State; Target Threshold (minimum accuracy needed for a prediction to be counted as correct)
	On a real data mining project, which two tasks will take most of the time?	Data preparation and then validation of predictive models
	What are the three measures that give you information about the quality of the rules that the Association Rules algorithm finds?	Support (how many times the items were found together); Probability (the likelihood of the rule in one direction: A-->B, not B-->A); Importance (score of the rule, how correlated the items are)
	What two parameters can be used to control the creation of historical models?	HISTORICAL_MODEL_COUNT (number of historical models built); HISTORICAL_MODEL_GAP (number of time slices between historical models)
	What are the two kinds of DMX statements?	DDL (Data Definition Language); DML (Data Manipulation Language)
	What are eight DMX DDL statements?	CREATE MINING STRUCTURE; ALTER MINING STRUCTURE; CREATE MINING MODEL; EXPORT; IMPORT; SELECT INTO; DROP MINING MODEL; DROP MINING STRUCTURE
	What are four DMX DML statements?	INSERT INTO (which trains the model); SELECT; UPDATE; DELETE
	Do dataset tables support nested tables?	No
	What are the three types of charts you can use to evaluate predictive models?	Lift chart for global statistics; lift chart for a single value; profit chart
	How do you evaluate a Time Series model?	You can make historical predictions to evaluate a time series model
	How do you evaluate a clustering model?	You should evaluate clustering models from a business perspective
	Using DMX, can you add a mining model to an existing structure so that you can share the structure with other models?	You can use the ALTER MINING STRUCTURE DMX statement to add a mining model to an existing structure so it can be shared with other models
	Can you use DMX to drill through to the sample cases you used for training a mining model?	Yes, you can use the DMX SELECT FROM <model>.CASES syntax to drill through to the sample cases you used to train a mining model
	What are the four SSAS general data mining properties?	AllowSessionMiningModels; AllowAdHocOpenRowsetQueries; AllowedProvidersInOpenRowset; MaxConcurrentPredictionQueries
	If you want to let applications use the SSAS data mining features, which data mining property do you need to set as "true"?	AllowSessionMiningModels
	What are your four options for impersonation information in a data source?	Use a specific username and password; use the service account; use the credentials of the current user; Inherit (impersonates current users)
	What permission must a user have to connect to an SSAS database through SSMS or BIDS?	Read Definition permission for a complete SSAS database
	What is a data mining structure?	A blueprint of the database schema that is shared by all mining models inside the structure
	What is the default CacheMode setting and what does it allow?	The default CacheMode setting is KeepTrainingCases, which caches the data mining model training data so that the user can issue drill-through queries to see the source data. You can set it to ClearAfterProcessing to avoid keeping large data volumes in the cache
	What is another phrase for training the model?	Model processing
	What are the four steps to processing a mining structure?	1. Save changes in BIDS 2. On the Mining Structure tab, click the Process the Mining Structure button 3. In the process dialog box, select the desired processing option, then click Run 4. Watch the Process Progress dialog box
	Can you use SQL server logins for SSAS authentication?	No. SSAS supports Windows authentication only
	Do end users need the Process permission on a mining structure?	No
	As an administrator, how would you prevent usage of the clustering data mining algorithm?	Use the Analysis Services Properties dialog box in SSMS
	What processing option deletes the training data in a mining structure without affecting its mining models?	Use the Process Clear Structure option to purge the structure data without affecting the models inside the structure
	Can an SSRS report use a mining model as its source?	Yes
	How do you browse mining models?	With the DMX language. You can also use the Prediction Query Builder in SSMS and BIDS to create prediction DMX queries
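
The cards on splitting data randomly and checking the first four moments can be illustrated outside SSAS. The following is a minimal Python sketch, not the SSIS sampling transformations themselves: the function names `four_moments` and `random_split`, the synthetic "ages" data, and the 30 percent holdout are all assumptions made for illustration. Kurtosis is reported here as excess kurtosis (a normal distribution scores about 0).

```python
import random
import statistics

def four_moments(values):
    """Return (mean, standard deviation, skewness, excess kurtosis) of a sample."""
    n = len(values)
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    # Skewness and kurtosis are the standardized third and fourth central moments.
    skew = sum(((v - mean) / std) ** 3 for v in values) / n
    kurt = sum(((v - mean) / std) ** 4 for v in values) / n - 3.0
    return mean, std, skew, kurt

def random_split(rows, test_fraction, seed):
    """Shuffle a copy of rows and hold out test_fraction of them as the test set."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]

# Hypothetical data: 10,000 synthetic customer ages (assumption, not from the cards).
data_rng = random.Random(0)
ages = [data_rng.gauss(40, 12) for _ in range(10_000)]
train, test = random_split(ages, test_fraction=0.3, seed=1)

for name, sample in (("train", train), ("test", test)):
    mean, std, skew, kurt = four_moments(sample)
    print(f"{name}: mean={mean:.2f} std={std:.2f} skew={skew:.2f} kurtosis={kurt:.2f}")
```

If the split is random, the four values printed for the training set and the test set should be close to each other; a large gap in any of them suggests the two sets do not describe the same population.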

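The Support and Probability measures from the Association Rules card can be worked through by hand on a tiny example. This is a rough Python sketch under stated assumptions: the `support` and `probability` helpers and the five shopping baskets are hypothetical, and the code does not reproduce how SSAS computes its Importance score, which is left out.

```python
def support(transactions, items):
    """Count the transactions that contain every item in `items`."""
    wanted = set(items)
    return sum(1 for t in transactions if wanted <= set(t))

def probability(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) for the rule antecedent --> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base

# Hypothetical shopping baskets (assumption, not from the cards).
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
    {"bread", "milk"},
]

print(support(baskets, {"bread", "butter"}))        # 2
print(probability(baskets, {"bread"}, {"butter"}))  # 0.5
print(probability(baskets, {"butter"}, {"bread"}))  # 1.0
```

The last two lines show why the direction of a rule matters: bread and butter have the same support in both directions, but butter --> bread holds in every basket that contains butter, while bread --> butter holds in only half of the baskets that contain bread.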