43 Cards in this Set
- Front
- Back
Data Warehouse |
- A logical collection of information, gathered from many different operational databases, that supports business analysis activities and decision-making tasks. - The primary purpose of a data warehouse is to aggregate information throughout an organization into a single repository for decision-making purposes. |
|
Data Warehouse Fundamentals |
- Extraction, transformation, and loading (ETL): a process that extracts information from internal and external databases, transforms the information using a common set of enterprise definitions, and loads the information into a data warehouse. - Data mart: a subset of a data warehouse particular to the needs of a given business unit (finance, marketing, accounting, etc.) |
|
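The three ETL steps on the card above can be sketched in a few lines. This is only an illustration under assumptions: the two source systems, their field names, the region codes, and the `fact_sales` table are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical rows from two operational systems that use different conventions.
sales_system = [{"cust": "Ann", "amount_usd": "120.50", "region": "north"}]
web_orders = [{"customer_name": "BOB", "total_cents": 9900, "region": "N"}]

# Assumed enterprise-wide definition of region codes.
REGION_CODES = {"north": "N", "n": "N", "south": "S", "s": "S"}


def extract():
    """Extract: pull raw rows from each source, tagged with their origin."""
    return [("sales", r) for r in sales_system] + [("web", r) for r in web_orders]


def transform(tagged_rows):
    """Transform: map every row onto one shared shape (name, dollars, region code)."""
    out = []
    for source, row in tagged_rows:
        if source == "sales":
            name, dollars = row["cust"], float(row["amount_usd"])
        else:
            name, dollars = row["customer_name"], row["total_cents"] / 100
        region = REGION_CODES[row["region"].lower()]
        out.append((name.title(), round(dollars, 2), region))
    return out


def load(rows, conn):
    """Load: write the unified rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales (customer TEXT, amount REAL, region TEXT)"
    )
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
warehouse_rows = conn.execute(
    "SELECT customer, amount, region FROM fact_sales ORDER BY customer"
).fetchall()
print(warehouse_rows)
```

A data mart could then be modeled as a filtered view of `fact_sales` for one business unit.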
Information and Cleansing |
Information cleansing or scrubbing: a process that weeds out and fixes or discards inconsistent, incorrect, or incomplete information. • Information cleansing activities |
|
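A minimal sketch of cleansing, assuming hypothetical customer records with `name` and `email` fields: inconsistent values are fixed (whitespace, letter case), incomplete or clearly incorrect rows are discarded, and duplicates are dropped. The rules here are illustrative, not a standard.

```python
def cleanse(records):
    """Fix what can be fixed, discard unusable rows, and drop duplicates."""
    cleaned, seen_emails = [], set()
    for rec in records:
        # Fix inconsistent formatting.
        name = (rec.get("name") or "").strip().title()
        email = (rec.get("email") or "").strip().lower()
        # Discard incomplete or obviously incorrect rows.
        if not name or "@" not in email:
            continue
        # Discard duplicates, keeping the first occurrence.
        if email in seen_emails:
            continue
        seen_emails.add(email)
        cleaned.append({"name": name, "email": email})
    return cleaned


raw = [
    {"name": " alice smith ", "email": "Alice@Example.com"},
    {"name": "Alice Smith", "email": "alice@example.com "},  # duplicate after fixing
    {"name": "", "email": "nobody@example.com"},              # incomplete
    {"name": "Bob Jones", "email": "not-an-email"},           # incorrect
]
clean = cleanse(raw)
print(clean)
```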
Characteristics of High-Quality Information |
Accuracy, completeness, consistency, uniqueness, timeliness |
|
Multidimensional Analysis |
Databases contain information in a series of two-dimensional tables. • In a data warehouse and data mart, information is multidimensional: it contains layers of columns and rows. Dimension: a particular attribute of information |
|
Cube |
Common term for the representation of multidimensional information |
|
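One way to picture a cube is as totals keyed by several dimensions at once. The sketch below assumes three hypothetical dimensions (product, region, quarter) and made-up unit counts; `rollup` is an illustrative helper, not a standard API.

```python
from collections import defaultdict

# Hypothetical facts: (product, region, quarter, units sold).
facts = [
    ("laptop", "east", "Q1", 10),
    ("laptop", "west", "Q1", 7),
    ("phone", "east", "Q1", 20),
    ("laptop", "east", "Q2", 12),
    ("phone", "west", "Q2", 15),
]

# The cube: one cell per combination of the three dimensions.
cube = defaultdict(int)
for product, region, quarter, units in facts:
    cube[(product, region, quarter)] += units


def rollup(cube, keep):
    """Sum the cube over every dimension whose index is not listed in `keep`."""
    totals = defaultdict(int)
    for key, units in cube.items():
        totals[tuple(key[i] for i in keep)] += units
    return dict(totals)


by_product = rollup(cube, keep=[0])            # collapse region and quarter
by_region_quarter = rollup(cube, keep=[1, 2])  # collapse product
print(by_product)
print(by_region_quarter)
```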
Business Intelligence |
Applications and technologies used to gather, provide access to, and analyze data and information to support decision-making efforts |
|
Business Intelligence 3 categories |
Operational / Tactical / Strategic |
|
Data mining |
The process of analyzing data to extract information not offered by the raw data alone |
|
Data-mining tools |
Use a variety of techniques to find patterns and relationships in large volumes of information: classification, estimation, affinity grouping, clustering |
|
Data Mining Capabilities |
Cluster analysis, association detection, statistical analysis |
|
Cluster analysis |
A technique used to divide an information set into mutually exclusive groups |
|
Association detection |
Reveals the degree to which variables are related and the nature and frequency of these relationships in the information |
|
Statistical analysis |
Performs such functions as information correlations, distributions, calculations, and variance analysis |
|
Clustering Analysis 2 |
Figuring out groups in such a way that the members of one group are as close to each other as possible and those in different groups are as far apart as possible. • Segment customer information and identify behavioral traits |
|
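The "close within a group, far between groups" idea can be sketched with k-means, which is one common clustering technique (the card itself does not name an algorithm). The version below works on one-dimensional values for brevity; the spending figures and starting centers are hypothetical.

```python
def kmeans_1d(values, centers, iterations=10):
    """Repeatedly assign each value to its nearest center, then move each center to its group mean."""
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # An empty group keeps its previous center.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Hypothetical monthly spend per customer; two obvious segments.
spend = [12, 15, 14, 90, 95, 88, 13]
centers, groups = kmeans_1d(spend, centers=[0.0, 100.0])
print(centers)
print(groups)
```

Each customer lands in exactly one group, which matches the "mutually exclusive groups" wording used for cluster analysis.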
Association Detection (Market-Basket Analysis) |
Determines groups of products that customers tend to purchase together |
|
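A small sketch of market-basket analysis, assuming four hypothetical baskets. It counts how often each pair of products appears together and reports two commonly used measures, support and confidence; the function names are illustrative.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    # Sort so that each pair is always stored in the same order.
    pair_counts.update(combinations(sorted(basket), 2))


def support(item_a, item_b):
    """Share of all baskets that contain both items."""
    return pair_counts[tuple(sorted((item_a, item_b)))] / len(baskets)


def confidence(antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_counts[pair] / item_counts[antecedent]


top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count)
print(support("bread", "butter"), confidence("milk", "bread"))
```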
Association Detection (Market-Basket Analysis) |
Cross-selling, up-selling |
|
Cross-selling: |
The action of persuading customers to purchase an additional product or service that is related to the product they are already purchasing |
|
Up-selling: |
The action of persuading customers to purchase more expensive items, upgrades, or other add-ons in an attempt to make a more profitable sale. |
|
Statistical Analysis 2 |
Performs such functions as information correlations, distributions, calculations, and variance analysis |
|
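Two of the functions named on the card above, correlation and variance, can be computed with the standard library. The figures are hypothetical, and the Pearson coefficient is written out by hand as one common choice of correlation measure.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return covariance / spread


# Hypothetical data: revenue moves in lockstep with ad spend.
ad_spend = [1, 2, 3, 4, 5]
revenue = [2, 4, 6, 8, 10]

correlation = pearson(ad_spend, revenue)
spend_variance = pvariance(ad_spend)  # population variance
print(correlation, spend_variance)
```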
Forecast: |
Predictions made on the basis of time-series information |
|
Time-series information: |
Time-stamped information collected at a particular frequency |
|
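As a sketch of forecasting from time-series information, the example below uses a simple moving average, which is only one of many forecasting methods and is chosen here for brevity. The monthly figures are hypothetical.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(series) < window:
        raise ValueError("need at least `window` observations")
    return sum(series[-window:]) / window


# Time-stamped information collected at a monthly frequency (hypothetical).
monthly_sales = [("2024-01", 100), ("2024-02", 110), ("2024-03", 120), ("2024-04", 130)]
values = [amount for _, amount in monthly_sales]

forecast = moving_average_forecast(values, window=3)
print(forecast)
```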
3 V’s of Big Data (IBM) |
Volume: data quantity. Velocity: data speed. Variety: data types |
|
How to mine precious information? |
Examining large amounts of data. – Aggregation and statistics: data warehouse and OLAP. – Indexing, searching, and querying: keyword-based search, pattern matching. – Knowledge discovery: data mining, statistical modeling. – Identification of hidden patterns and unknown correlations. – Better business decisions: effective marketing, customer satisfaction, increased revenue |
|
Acquiring |
- Obtain data – Cleanse data – Organize and relate data – Catalog data |
|
How Do BI Information Systems Support BI Activities? |
- Analyzing: reporting and data mining |
|
RFM analysis |
- Recency: how recently a customer purchased an item. – Frequency: how frequently the customer purchases an item. – Monetary value: how much the customer spends each time |
|
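The three RFM measures can be computed directly from a purchase log. This is a minimal sketch with hypothetical customers and dates; monetary value is taken as the average amount per purchase, following the "how much she spends each time" wording, and no 1-to-5 scoring is applied.

```python
from datetime import date

# Hypothetical purchase log: (customer, purchase date, amount).
purchases = [
    ("ann", date(2024, 6, 1), 50.0),
    ("ann", date(2024, 6, 20), 70.0),
    ("bob", date(2024, 1, 15), 300.0),
]


def rfm(purchases, today):
    """Return recency (days since last purchase), frequency, and average spend per customer."""
    summary = {}
    for customer, day, amount in purchases:
        s = summary.setdefault(customer, {"last": day, "count": 0, "total": 0.0})
        s["last"] = max(s["last"], day)
        s["count"] += 1
        s["total"] += amount
    return {
        customer: {
            "recency_days": (today - s["last"]).days,
            "frequency": s["count"],
            "monetary": s["total"] / s["count"],
        }
        for customer, s in summary.items()
    }


scores = rfm(purchases, today=date(2024, 7, 1))
print(scores)
```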
Interactive |
Users are allowed to change both the analysis and the structure of the report. OLAP (online analytical processing): slice and dice, drill down |
|
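Slicing and drilling down can be illustrated on a handful of rows. The sketch assumes hypothetical sales records with `year`, `quarter`, and `region` dimensions; `slice_` and `total_by` are illustrative helpers rather than the API of any OLAP product.

```python
# Hypothetical sales records.
rows = [
    {"year": 2024, "quarter": "Q1", "region": "east", "sales": 100},
    {"year": 2024, "quarter": "Q2", "region": "east", "sales": 150},
    {"year": 2024, "quarter": "Q1", "region": "west", "sales": 80},
    {"year": 2023, "quarter": "Q4", "region": "west", "sales": 60},
]


def slice_(rows, **fixed):
    """Slice: keep only the rows where each named dimension has the given value."""
    return [r for r in rows if all(r[k] == v for k, v in fixed.items())]


def total_by(rows, *dims):
    """Total sales grouped by the listed dimensions."""
    totals = {}
    for r in rows:
        key = tuple(r[d] for d in dims)
        totals[key] = totals.get(key, 0) + r["sales"]
    return totals


by_year = total_by(rows, "year")                                   # summary level
drill_2024 = total_by(slice_(rows, year=2024), "year", "quarter")  # drill down into 2024
east_only = slice_(rows, region="east")                            # slice on one region
print(by_year)
print(drill_2024)
```

Passing several keyword arguments to `slice_` fixes more than one dimension at once, which is roughly what "dicing" refers to.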
How Do BI Information Systems Support BI Activities? |
Publishing – Visualization |
|
BI Success Factors |
• Right focus: there must be a clear business problem. • Right people: employees must be able to understand data and information. • Right technology: not all software tools are right for you. • Right culture: data-driven decision making should be accepted throughout the organization. |
|
Collaborative Filtering |
Predict what movies, books, etc. a person may be interested in on the basis of: past preferences of the person; other people with similar past preferences; the preferences of such people for a new movie or book |
|
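The three ingredients on the card above (a person's past preferences, similar people, and those people's ratings of a new item) map onto a user-based approach. The sketch below uses cosine similarity over shared items as one possible similarity measure; the users and ratings are hypothetical.

```python
from math import sqrt

# Hypothetical 1-to-5 ratings; "ann" has not rated "Jaws".
ratings = {
    "ann": {"Alien": 5, "Up": 1, "Heat": 4},
    "bob": {"Alien": 4, "Up": 2, "Heat": 5, "Jaws": 5},
    "cat": {"Alien": 1, "Up": 5, "Jaws": 1},
}


def cosine(a, b):
    """Cosine similarity of two users, computed over the items both have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def predict(user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    weighted_sum = weight_total = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        weight = cosine(ratings[user], their_ratings)
        weighted_sum += weight * their_ratings[item]
        weight_total += weight
    return weighted_sum / weight_total if weight_total else None


prediction = predict("ann", "Jaws")
print(prediction)
```

Because Ann's past ratings resemble Bob's more than Cat's, the prediction is pulled toward Bob's high rating.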
Artificial Intelligence (AI) |
Simulates human thinking and behavior, such as the ability to reason and learn. Its ultimate goal is to build a system that can mimic human intelligence (e.g., AlphaGo). |
|
Categories of AI |
Expert systems, neural networks, genetic algorithms, intelligent agents |
|
Types of Data |
Structured, unstructured |
|
Structured Data |
Information stored in databases (mainframe, SQL Server, Oracle, Access, Excel). – All records have the same format as defined in the relational schema: rows and columns; data is consistent and uniform. – Data-mining friendly |
|
Unstructured Data |
Refers to information that doesn’t reside in a traditional row-column database (not structured into “cells”). – All those things that can’t be so readily classified. – Text and multimedia content, e-mail messages, graphic images, videos, streaming, web pages, PDF files, social media data (text, blogs, tweets, comments, tags), user-generated content reviews |
|
Text Mining |
Application of data mining to textual documents. – The process of deriving high-quality information from text. – Leveraging text to improve decisions and predictions |
|
Seven Types of Text Mining |
Search and Information Retrieval (IR), Document Clustering, Document Classification, Web Mining, Information Extraction (IE), Natural Language Processing (NLP) |
|
Web Mining |
Discover useful information or knowledge from the Web hyperlink structure, page content, and usage data |
|
Three types of web mining tasks |
– Web structure mining – Web content mining – Web usage mining |
|
Two main types of textual information |
– Facts and opinions |
|
Social network |
A social structure of people related (directly or indirectly) to each other through a common relation or interest |
|
Social network analysis (SNA) |
The study of social networks to understand their structure and behavior |
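A small sketch of two basic SNA measurements, assuming a hypothetical friendship list: the number of direct connections each person has (degree), and the number of steps separating two people who may be only indirectly related (found with breadth-first search).

```python
from collections import deque

# Hypothetical friendships (undirected).
friendships = [("ann", "bob"), ("bob", "cat"), ("cat", "dan"), ("bob", "dan"), ("eve", "fay")]

graph = {}
for a, b in friendships:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

# Structure: how many direct connections each person has.
degree = {person: len(friends) for person, friends in graph.items()}


def distance(start, goal):
    """Fewest friendship steps between two people, or None if they are not connected."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        person, steps = queue.popleft()
        if person == goal:
            return steps
        for friend in graph[person] - seen:
            seen.add(friend)
            queue.append((friend, steps + 1))
    return None


print(degree)
print(distance("ann", "dan"), distance("ann", "eve"))
```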