100 Cards in this Set
- Front
- Back
OLAP definition
|
An approach to quickly answer multi-dimensional analytical queries.
|
|
OLAP data
|
Organized hierarchically and stored in cubes instead of tables
|
|
OLAP function
|
Slicing and dicing of data
|
|
OLAP definition
|
A category of applications and technologies for collecting, managing, processing, and presenting multidimensional data for analysis and management purposes.
|
|
What is BI?
|
Computer-based methods for identifying and extracting useful information from business data. It encompasses OLAP, relational reporting, and data mining.
|
|
Excel can represent multi-dimensional data. T or F?
|
True.
|
|
What feature in Excel provides OLAP capabilities?
|
Pivot Tables
|
|
Measure
|
A summarized numerical value that you use to monitor how well the business is doing.
e.g., Units sold, revenue, defects, number of people who responded to an ad. |
|
Aggregation techniques used when presenting measures
|
Sum, Average, Max, Min
|
|
How do you expand a measure (e.g., spread total sales across a time interval, region, product, or salesperson)?
|
Add a dimension (e.g., time, region, product, or salesperson)
|
|
Dimensions
|
The different characteristics by which the measure values may be presented to the user
|
|
Cube
|
A measure and its associated dimensions.
A subset of highly interrelated data that is organized to allow users to combine any attributes in a cube (e.g., stores, products, customers, suppliers) with any metrics/measures in the cube (sales, profit, units, age) to create various views. |
|
Data cube
|
A two-dimensional, three-dimensional, or higher dimensional object in which each element of the data represents a measure of interest.
|
|
slice
|
A two-dimensional view that is a subset of highly interrelated data in a multi-dimensional cube.
|
|
Fact Table
|
Stores the detailed values for measures in un-normalized form
|
|
Dimension Table
|
A table that houses the names and attributes of the different characteristics by which the measure values may be presented to the user. It is linked by foreign keys to a fact table. Dimension tables contain classification and aggregate information about the central fact table rows.
|
|
A ______________ is a column in a dimension table.
|
attribute
|
|
Star Schema
|
The simplest DW design: all dimension tables are directly related to the fact table by foreign keys. It is denormalized and takes more space.
|
|
grain
|
The highest level of detail that is supported in a data warehouse.
|
|
Function of a dimension table
|
Defines how data will be sliced and diced
|
|
Snowflake schema
|
A DW design where dimension tables are layered and not all directly related to the fact table. It is normalized, requiring more complicated queries and more processing time for the joins.
|
|
Data Mining
|
The process through which previously undiscovered patterns in data are identified leading to knowledge discovery
|
|
Techniques data mining uses to extract and identify new knowledge remaining untapped in large databases
|
statistical, mathematical, artificial intelligence and machine learning techniques
|
|
Types of new knowledge that data mining creates
|
rules, affinities, correlations, trends or prediction models
|
|
Data mining extracts data from ___________________ data sources.
|
disparate
|
|
Sensitivity analysis
|
the study of how the uncertainty in the output of a mathematical model or system (numerical or otherwise) can be apportioned to different sources of uncertainty in its inputs.
|
|
Data
|
refers to a collection of facts, usually obtained as the result of experience, observations, or experiments.
|
|
Patterns DM tries to identify
|
Classification
Clustering
Association/Sequence discovery
Prediction/Forecasting |
|
Classification
|
HISTORICAL BEHAVIOR TO PREDICT - Analyzing the historical behavior of groups of entities to predict the future behavior of a new entity from its similarity to those groups. It is the most frequently used data mining method for real world problems
|
|
Another name for Classification
|
Supervised induction
|
|
Common classification tools
|
Neural networks, decision trees, logistic regression, and discriminant analysis, plus emerging tools such as rough sets, support vector machines, and genetic algorithms
|
|
How Classification works
|
learns patterns from past data in order to place new instances (with unknown labels) into their respective groups
|
|
Examples of classification data mining method
|
weather prediction, credit approval, store location, targeted marketing, and fraud detection.
|
|
Supervised learning
|
A method of training artificial neural networks in which sample cases are shown to the networks as input, and the weights are adjusted to minimize the error in the outputs.
|
|
The approach/algorithm used in the classification data mining method
|
decision trees
|
|
Which data mining methods employ supervised learning methods.
|
Classification and Prediction/Forecasting
|
|
Which data mining methods employ unsupervised learning methods.
|
Clustering and Association/Sequence discovery
|
|
Classification method objective
|
Analyzing the historical behavior of groups of entities to predict the future behavior of a new entity from its similarity to those groups.
|
|
Classification
|
This induced model consists of generalizations over the records of a training dataset, which help distinguish pre-defined classes.
|
|
Clustering
|
PARTITIONING - A natural partitioning of data into groups of entities with similar characteristics
|
|
Uses of Clustering methods
|
grouping students according to grades, perform market segmentation,
|
|
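The "grouping students according to grades" example above can be sketched with a minimal k-means-style clustering routine. This is an illustrative sketch only, on 1-D data with hand-picked starting centers; `kmeans_1d` is a hypothetical helper name, not from the course material.

```python
def kmeans_1d(points, centers, iters=10):
    """A minimal k-means sketch on 1-D data: assign each point to its
    nearest center, then move each center to the mean of its group."""
    for _ in range(iters):
        groups = {c: [] for c in centers}
        for p in points:
            nearest = min(centers, key=lambda c: abs(c - p))
            groups[nearest].append(p)
        # Recompute each center; keep the old center if its group is empty
        centers = [sum(g) / len(g) if g else c for c, g in groups.items()]
    return sorted(centers)

grades = [55, 58, 60, 85, 88, 90]
print(kmeans_1d(grades, centers=[50, 100]))  # two natural grade clusters
```

No labels are supplied, which is what makes clustering an unsupervised method: the partition emerges from the data alone.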
Association/Sequence discovery
|
SIMULTANEOUS RELATIONSHIPS TIME ORDER - establishing relationships among items that occur together or in a time order (basket analysis).
|
|
Association/Sequence discovery
|
a popular and well-researched technique for discovering interesting relationships among variables in large databases.
|
|
The approach/algorithm used in the Association/Sequence discovery data mining method
|
Apriori
|
|
Define Decision Tree Attributes
|
The input variables that may have an impact on the classification of different patterns.
|
|
Predictive accuracy
|
The model’s ability to correctly predict the class label of new or previously unseen data. It is the percentage of test dataset samples correctly classified by the model.
|
|
Speed
|
The computational costs involved in generating and using the model, where faster is deemed to be better.
|
|
Robustness
|
The model’s ability to make reasonably accurate predictions, given noisy data or data with missing and erroneous values.
|
|
Scalability
|
The ability to construct a prediction model efficiently given a rather large amount of data.
|
|
Interpretability
|
The level of understanding and insight provided by the model (e.g., how and/or what the model concludes on certain predictions)
|
|
Gini Index
|
Used to determine the purity of a specific class as a result of a decision to branch along a particular attribute or variable
|
|
Gini index
|
Measures the homogeneity/diversity of data in a sample set.
|
|
Gini index=0
|
Gini index rating if the data is homogeneous
|
|
Gini index >0
|
Gini index rating which indicates diversity in the data
|
|
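The Gini index definitions above can be sketched in a few lines: the index is 1 minus the sum of squared class proportions, so a homogeneous sample scores 0 and any mix scores above 0. The function name `gini_index` is illustrative.

```python
from collections import Counter

def gini_index(labels):
    """Gini index of a sample of class labels:
    0 for a homogeneous sample, >0 when classes are mixed."""
    n = len(labels)
    counts = Counter(labels)
    # 1 minus the sum of squared class proportions
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

print(gini_index(["yes", "yes", "yes"]))       # homogeneous -> 0.0
print(gini_index(["yes", "yes", "no", "no"]))  # maximally mixed -> 0.5
```

In decision-tree induction, the attribute whose split yields the purest (lowest-Gini) branches is chosen at each node.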
The Apriori algorithm
|
The most commonly used algorithm to discover association rules.
|
|
What is a data warehouse?
|
A subject-oriented, integrated, time-variant, nonvolatile collection of data, produced to support decision making; it is also a repository of current and historical data, usually structured in a form ready for analytical processing.
|
|
What are the major components of the data warehousing process?
|
Transaction data systems
ETL Process - Extract, Transform, Load
Data Warehouse - comprehensive database
Data Marts
Middleware/Analytical Tools - SQL, cubes
BI Applications (Visualization) - OLAP, Dashboard, Web |
|
What Data marts provide
|
different views of the data warehouse
|
|
Neural computing
|
a pattern-recognition methodology for machine learning
|
|
Artificial Neural Network (ANN)
|
The resulting model from pattern-recognition methodology for machine learning
|
|
What have neural networks been used for?
|
Used in many business applications for pattern recognition, forecasting, prediction and classification. It is the key component of any data mining tool.
|
|
Connection weights
|
The key element of an ANN, they express the relative strength of the input data (always a single attribute), crucial in that they store learned patterns of information.
|
|
Summation Function
|
computes the weighted sums of all the input elements.
|
|
Transfer function (e.g., Sigmoid function)
|
A popular and useful non-linear function, it is an S-shaped transfer function in the range of 0 to 1. It transforms the summed inputs to a node into the response out from the node.
|
|
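The summation and sigmoid transfer function cards above combine into a single artificial neuron, sketched below. The helper name `neuron_output` and the sample weights are illustrative, not from the course material.

```python
import math

def neuron_output(inputs, weights, bias=0.0):
    """One artificial neuron: summation function followed by a sigmoid transfer."""
    # Summation function: weighted sum of all input elements, plus a bias term
    s = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Sigmoid transfer function: S-shaped, squashes the sum into (0, 1)
    return 1.0 / (1.0 + math.exp(-s))

print(neuron_output([1.0, 0.5], [0.2, -0.4]))  # weighted sum 0.0 -> sigmoid 0.5
```

During training, a learning algorithm adjusts the weights so the output moves toward the desired target.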
Supervised learning
|
Sets of input are iteratively presented to the neural network and compared to the desired output
|
|
Learning algorithm
|
Determines how the neural interconnection weights are corrected based on differences between the actual and desired output for a member of the training set.
|
|
Unsupervised learning
|
The network learns a pattern through repeated exposures, it is not compared to a target answer, self-organizing or clustering
|
|
Learning Rate (alpha)
|
A parameter in neural networks; it affects the speed at which the ANN arrives at the solution; it determines the portion of the existing discrepancy that must be offset.
|
|
Momentum
|
A parameter in back-propagation neural networks, it slows, smoothens and stabilizes the learning process; reduces over-correcting of weights.
|
|
Back propagation learning
|
The best-known learning algorithm in neural computing where the learning is done by comparing computed outputs to desired outputs of training cases.
|
|
Text mining
|
The semi-automated process of extracting patterns (useful information and knowledge) from large amounts of unstructured data sources.
|
|
corpus
|
A large and structured set of texts (now usually stored and processed electronically) prepared for the purpose of conducting knowledge discovery.
|
|
Term
|
A term is a single word or a multiword phrase extracted directly from the corpus of a specific domain by means of natural language processing methods.
|
|
Concepts
|
The underlying meaning; features generated from a collection of documents by a categorization methodology. Compared to terms, they are a higher level of abstraction.
|
|
Stemming
|
The process of reducing inflected words to their base or root form.
|
|
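A toy illustration of stemming, assuming a naive suffix-stripping rule; real stemmers (e.g., Porter's algorithm) apply ordered rule sets with many more conditions. The name `naive_stem` and the suffix list are illustrative.

```python
def naive_stem(word, suffixes=("ing", "edly", "ed", "es", "s")):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suf in suffixes:
        if word.endswith(suf) and len(word) - len(suf) >= 3:
            return word[: -len(suf)]
    return word

# Inflected forms of the same root collapse to one stem
print([naive_stem(w) for w in ["mining", "mined", "mines", "mine"]])
```

Note the last case: "mine" is left untouched, so naive rules do not always map every inflected form to the same stem, which is why production stemmers are more elaborate.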
stop words (noise words)
|
Words that are filtered out prior to or after processing of natural language text
|
|
polysemes
|
homonyms, syntactically identical words (same spelling) with different meanings
|
|
Tokenizing
|
The process of breaking a stream of text into words, phrases, symbols, or other meaningful elements (tokens), which become input for further processing. A token is a categorized block of text in a sentence, categorized according to the function it performs.
|
|
Index word
|
Any word appearing in two or more documents that is not a stop word
|
|
Term-By-Document Matrix (Occurrence matrix)
|
A representation of the frequency-based relationship between the terms and documents in tabular format. Terms are listed in rows, documents in columns, and the frequency listed in cells
|
|
Term Frequency-Inverse Document Frequency (TF-IDF)
|
A statistical measure to evaluate how important a word is to a document in a collection
|
|
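The term-by-document and TF-IDF cards above can be illustrated with a short sketch: term frequency measures how prominent a word is within one document, and inverse document frequency discounts words that appear in many documents. The function name `tf_idf` and the sample corpus are illustrative.

```python
import math

# A tiny corpus: each document is a list of tokens
docs = [["data", "mining", "text"], ["text", "mining"], ["data", "warehouse"]]

def tf_idf(term, doc, docs):
    """TF: share of the document's tokens that are this term;
    IDF: log of (total documents / documents containing the term)."""
    tf = doc.count(term) / len(doc)
    df = sum(term in d for d in docs)
    return tf * math.log(len(docs) / df)

print(tf_idf("warehouse", docs[2], docs))  # rare term -> higher weight
print(tf_idf("mining", docs[1], docs))     # common term -> lower weight
```

In a term-by-document matrix, each cell would hold such a weight (or a raw frequency), with terms in rows and documents in columns.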
Natural Language Processing (NLP)
|
An important component of text mining, it is a sub-field of AI and computational linguistics.
|
|
Singular Value Decomposition (SVD)
|
A matrix operation in linear algebra, that splits a given matrix of data into three parts.
It is a dimensionality reduction method, used to transform a Term-by-Document matrix to a manageable size; similar to Principal Component Analysis |
|
Principal component analysis
|
A mathematical procedure that transforms a number of possibly correlated variables into a smaller number of uncorrelated variables
|
|
The first principal component
|
In PCA, the component that accounts for as much of the variability in the data as possible
|
|
Each succeeding principal component
|
In PCA, the component that accounts for as much of the remaining variability as possible
|
|
The text mining process
|
1. Establish the Corpus - collect and organize
2. Create the Term-Document Matrix - introduce structure to the Corpus, reduce dimensionality, SVD
3. Extract Knowledge - discover patterns from the TD Matrix: classification, clustering, association, trend analysis |
|
Commercial Software Text mining tools
|
SPSS PASW Text Miner
SAS Enterprise Miner
Statistica Data Miner
ClearForest |
|
Free text mining software tools
|
RapidMiner
GATE
Spy-EM |
|
Web mining
|
the process of discovering intrinsic relationships from Web data (textual, linkage, or usage)
|
|
The main areas of Web mining
|
Content Mining
Structure Mining
Usage Mining |
|
Web Content Mining
|
Uses unstructured textual content of the Web pages as a data source
|
|
Web Structure Mining
|
Uses URL links contained in the web pages as a data source
|
|
Web Usage Mining
|
Uses the detailed description of a Web site's visits (click streams) as a data source
|
|
Web Content and Structure mining tool
|
Data collection via Web crawlers
|
|
Authoritative pages
|
Links included on a web page can help to infer "authority", like citations in journal articles. There are differences: web links may be paid ads, they may exclude commercial rivals, and they may not be descriptive.
|
|
Hubs
|
One or more web pages that provide a collection of links to authoritative pages, they provide links to a collection of prominent sites on a specific topic of interest.
|
|
Hyperlink-Induced Topic Search algorithm (HITS)
|
The most popular known and referenced algorithm to calculate hubs and authorities. It is a link analysis algorithm that rates web pages using the hyperlink information contained within them.
|
|
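The hub and authority scores that HITS computes can be sketched with a short power-iteration loop: each round, authority scores are recomputed from the hubs that point at a page, hub scores from the authorities a page points at, and both are normalized (here by their sum, for simplicity; the original algorithm normalizes by the sum of squares). The function name `hits` and the toy link graph are illustrative.

```python
def hits(links, iters=20):
    """Minimal HITS sketch. `links` maps each page to the pages it links to."""
    pages = set(links) | {p for tgts in links.values() for p in tgts}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iters):
        # Authority score: sum of hub scores of pages linking to it
        auth = {p: sum(hub[q] for q, tgts in links.items() if p in tgts) for p in pages}
        norm = sum(auth.values()) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # Hub score: sum of authority scores of the pages it links to
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        norm = sum(hub.values()) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth

# A links to B and C; D links to C: C should emerge as the top authority.
hub, auth = hits({"A": ["B", "C"], "D": ["C"]})
print(max(auth, key=auth.get))  # -> C
```

The page linked to by the most (and best) hubs ends up with the highest authority score, matching the hub/authority cards above.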
Web usage mining tools and methods
|
Data stored in server access logs, referrer logs, agent logs, and client-side cookies
User characteristics and usage profiles
Metadata, such as page attributes, content attributes, and usage data |
|
ETL steps are performed by ________________ in SQL Server
|
The Integration Services tool (SSIS)
|