216 Cards in this Set

  • Front
  • Back

Give examples of which business functions BI can be used in

Marketing and sales


Supply Chain


Finance


IT

Give exemplary usage scenarios of BI in Marketing and Sales

- Customer Journey Analysis: analysis of customers' social, mobile and location data


- More accurately attribute sales to advertising campaigns -> prioritise marketing spending


- Analyse accuracy of salespeople's predictions


- Use smartphone and car location devices to monitor how salespeople actually spend their time

Give exemplary usage scenarios of BI in Supply Chain

- RFID, GPS and ILS (identification, location, condition) sensors can monitor the condition of goods in the supply chain: light, temperature, g-forces

Give exemplary usage scenarios of BI in Finance

- Quantify risks of investment decisions


- Identify buying or selling opportunities


- Detect fraud and money laundering

Give exemplary usage scenarios of BI in IT

- Monitor reliability of IT operations


- Predict where and from whom security threats will emerge

Describe the evolution of BI&A (year and name)

- 1960: MIS (Management Information Systems)


- 1970: DSS (Decision Support Systems)


- 1980: EIS (Executive Information Systems)


- 1990: DWH (Data Warehousing)


- 2000: BI&A (Business Intelligence & Analytics)


- 2010-: Big Data (Analytics)

Describe the characteristics of MIS.

- Efficient data processing


- Integrated systems


- Vision of automatic decision support

Describe the characteristics of DSS.

- Statistical algorithms


- What-if analysis


- Database centricity


- Hardwired

Describe the characteristics of EIS.

- Multi-dimensional modelling


- Dedicated systems separated from operational systems


- Focus on top management

Describe the characteristics of DWH.

- Integration of multiple and heterogeneous sources


- OLAP analysis


- Data history

Describe the characteristics of BI&A

- Reporting to the masses


- Advanced BI front ends


- Real-time access


- Planning capabilities


- Closed loop performance management

Describe the characteristics of Big Data (Analytics)

- Very large amounts of data


- Structured and unstructured data


- Advanced analytics

Give both definitions of Decision Making and their author.

- Decision Making is the process of choosing among two or more alternative courses of action for the purpose of attaining a goal or goals (Turban et al. 2008)


- Decision making is the process of sufficiently reducing uncertainty and doubt about alternatives to allow a reasonable choice to be made from among them (Harris, 2012)

Describe the decision making process.

1. Intelligence: search for conditions that call for decision.


2. Design: invent, develop and analyse possible alternative courses of action (solutions)


3. Choice: select a course of action from among those available


4. Implementation: adapt the selected course of action to the decision situation (i.e. problem solving or opportunity exploring)

What is the definition of a decision support system?

A decision support system (DSS) is a computer-based information system that supports decision-making activities.

Why use computerized decision support systems?

- Speedy computation


- Improved communication and collaboration


- Increased productivity of group members


- Improved data management


- Quality support


- Agility support


- Overcoming cognitive limits in processing and storing information

What is the definition of Business Intelligence and Analytics?

BI&A refers to the techniques, technologies, systems, practices, methodologies and applications that analyze critical business data to help an enterprise better understand its business and market and make timely business decisions.

What are the key characteristics of BI&A 1.0?

DBMS-based, structured content


- RDBMS & data warehousing


- ETL & OLAP


- Dashboards & scoreboards


- Data mining & statistical analysis

What are the key characteristics of BI&A 2.0?

Web-based, unstructured content


- Information retrieval and extraction


- Opinion mining


- Question answering


- Web analytics and web intelligence


- Social media analytics


- Social network analysis


- Spatial-temporal analysis

What are the key characteristics of BI&A 3.0?

Mobile and sensor-based content


- Location-aware analysis


- Person-centered analysis


- Context-relevant analysis


- Mobile visualisation & HCI

What are the Gartner BI Platforms Core Capabilities of BI&A 1.0?

- Ad hoc query & search-based BI


- Reporting, dashboards & scoreboards


- OLAP


- Interactive visualization


- Predictive modeling & data mining

What are the characteristics of Gartner Hype Cycle in BI&A 1.0?

- Column-based DBMS


- In-memory DBMS


- Real-time decision


- Data mining workbenches

What are the characteristics of Gartner Hype Cycle in BI&A 2.0?

- Information semantic services


- Natural language question answering


- Content & text analytics

What are the characteristics of Gartner Hype Cycle in BI&A 3.0?

- Mobile BI

Draw a high-level framework for BI&A

See lecture 1, slide 41.

Draw a matrix of the different users of BI&A

See lecture 1, slide 42.

For which tasks are BI&A platforms actually being used? (List the 8 most common in order)

1. Use parameterized reports / dashboards


2. View static reporting


3. Interactively exploring and analyzing data


4. Doing simple ad hoc analysis


5. Monitoring performance via a formal scoreboard


6. Using personalized dashboards


7. Executing moderately complex to complex ad hoc analysis and discovery


8. Using predictive analysis and / or data mining models

Comments about typical BI&A usage

- Strong focus on "traditional" business analysis via reports


- More users perform simple than complex analysis


- Only very few users conduct predictive analysis and/or use data mining techniques

What features are provided by state-of-the-art BI&A Platforms in terms of information delivery?

- Reporting


- Dashboards


- Ad hoc query


- Microsoft Office integration


- Search-based BI


- Mobile BI

What features are provided by state-of-the-art BI&A Platforms in terms of analysis?

- Online analytical processing (OLAP)


- Interactive visualization


- Predictive modeling and data mining

What features are provided by state-of-the-art BI&A Platforms in terms of integration?

- Scorecards


- Prescriptive modeling, simulation and optimization


- BI infrastructure


- Metadata management


- Development tools


- Collaboration

Describe reporting

Provides the ability to create formatted and interactive reports, with or without parameters, with highly scalable distribution and scheduling capabilities

Describe dashboards

Includes the ability to publish web-based or mobile reports with intuitive interactive displays that indicate the state of a performance metric compared with a goal or target value. Increasingly, dashboards are used to disseminate real-time data from operational applications, or in conjunction with a complex-event processing engine.

Describe ad hoc query

Enables users to ask their own questions to the data, without relying on IT to create a report. In particular, the tools must have a robust semantic layer to enable users to navigate available data sources

Describe Microsoft Office integration

Sometimes, Microsoft Office (particularly Excel) acts as the reporting or analytics client. In these cases, it is vital that the tool provides integration with Microsoft Office, including support for document and presentation formats, formulas, data "refreshes" and pivot tables. Advanced integration includes cell locking and write-back.

Describe search-based BI

Applies a search index to structured and unstructured data sources and maps them into a classification structure of dimensions and measures that users can easily navigate and explore using a search interface

Describe mobile BI

Enables organizations to deliver analytic content to mobile devices in a publishing and/or interactive mode and takes advantage of the mobile client's location awareness

Describe online analytical processing (OLAP)

Enables users to analyze data with fast query and calculation performance, enabling a style of analysis such as "slicing and dicing". Users are able to navigate multidimensional drill paths. They also have the ability to write back values to a proprietary database for planning and "what if" modeling purposes. This capability could span a variety of data architectures (such as relational or multidimensional) and storage architectures (such as disk-based or in-memory)
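The "slicing and dicing" style of analysis can be sketched as follows. This is only a minimal illustration, assuming a tiny in-memory cube with hypothetical dimensions (year, region, product) and a single revenue measure; the function names are made up for this example, and real OLAP engines use specialised storage and query engines.

```python
from collections import defaultdict

# Hypothetical fact cells: (year, region, product) -> revenue
cube = {
    (2023, "North", "Bikes"): 100,
    (2023, "South", "Bikes"): 80,
    (2023, "North", "Helmets"): 30,
    (2024, "North", "Bikes"): 120,
    (2024, "South", "Helmets"): 40,
}
DIMS = ("year", "region", "product")


def slice_cube(cube, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}


def dice_cube(cube, **allowed):
    """Dice: keep cells whose coordinates fall in the allowed value sets."""
    def keep(key):
        return all(key[DIMS.index(d)] in vals for d, vals in allowed.items())
    return {k: v for k, v in cube.items() if keep(k)}


def roll_up(cube, dim):
    """Roll up: aggregate the measure over all dimensions except one."""
    i = DIMS.index(dim)
    totals = defaultdict(int)
    for key, revenue in cube.items():
        totals[key[i]] += revenue
    return dict(totals)


revenue_2023 = slice_cube(cube, "year", 2023)
north_bikes = dice_cube(cube, region={"North"}, product={"Bikes"})
by_year = roll_up(cube, "year")
```

Navigating a drill path corresponds to calling `roll_up` on a coarser or finer dimension of the same cube.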

Describe interactive visualization

Gives users the ability to display numerous aspects of the data more efficiently by using interactive pictures and charts, instead of rows and columns

Describe predictive modeling and data mining

Enables organizations to classify categorical variables and to estimate continuous variables using mathematical algorithms.

Describe scorecards

These take the metrics displayed in a dashboard a step further by applying them to a strategy map that aligns key performance indicators (KPIs) with a strategic objective

Describe prescriptive modeling, simulation and optimization

Supports decision making by enabling organizations to select the correct value of a variable based on a set of constraints for deterministic processes and by modeling outcomes for stochastic processes

Describe BI infrastructure

All tools in the platform use the same security, metadata, administration, portal integration, object model and query engine and should share the same look and feel

Describe meta data management

Tools should leverage the same metadata and the tools should provide a robust way to search, capture, store, reuse and publish metadata objects, such as dimensions, hierarchies, measures, performance metrics and report layout objects

Describe development tools

The platform should provide a set of programmatic and visual tools, coupled with a software developer's kit for creating analytic applications, integrating them into a business process and/or embedding them in another application

Describe collaboration

Enables users to share and discuss information and analytic content and/or manage hierarchies and metrics via discussion threads, chat and annotations.

Name some complete BI-platforms

IBM


SAP


SAS


ORACLE


Microsoft

Name some platforms with focus on selected BI capabilities

tableau


QlikView


InfoZoom


Teradata

Describe the gartner magic quadrant for BI&A platform vendors

- The number of BI&A platform vendors increases continuously


- Most vendors are categorized as "niche players" or "leaders"


- There are only a few "challengers" and "visionaries"

Name the key take aways from lecture one

See last slide.

The first definition of data warehouse and its author

A data warehouse is a pool of data produced to support decision making ... Data are usually structured to be available in a form ready for analytical processing. (Turban, 2008)

The second definition of data warehouse and its author.

A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management’s decision-making process. (Inmon, 1996)

The third definition of data warehouse and its author.

A copy of transaction data specifically structured for query and analysis. (Kimball, 1996)

What are the characteristics of Data Warehousing?

- Subject oriented


- Integrated


- Time variant (time series, chronology)


- Nonvolatile (persistent)


- Relational/multidimensional


- Client/server


- Includes metadata

What are the differences between OLTP and Data Warehouse regarding data content, data organization and the nature of data? (OLTP vs Data Warehouse)

Data content: Current value vs historical data, summarized data, calculated data


Data organization: Application by application vs subject areas across enterprise


Nature of data: Dynamic vs Static until refreshed, based on frequency

What are the differences between OLTP and Data Warehouse regarding data manipulation and usage? (OLTP vs Data Warehouse)

Data manipulation: Updated on a field-by-field basis vs accessed and manipulated, usually without direct updates


Usage: Highly structured, repetitive processing (clerical user) vs highly unstructured, analytical processing (knowledge user)

What are the differences between OLTP and Data Warehouse regarding response time and updates vs reports? (OLTP vs Data Warehouse)

Response time: critical (sub-second to several seconds) vs several seconds to minutes


Updates&Reports: Real-time Updates, batch reporting vs Batch updates, real-time reporting

What are the direct benefits of a data warehouse

- Allows end users to perform extensive analysis


- Allows a consolidated view of corporate data


- Better and more timely information access


- Enhanced system performance


- Simplification of data access

What are the indirect benefits from end users using the direct benefits of Data warehouse?

- Enhance business knowledge


- Create competitive advantage


- Enhance customer service and satisfaction


- Facilitate decision making


- Help in optimizing business processes

What are the three simplified parts of data warehousing architecture?

- The data warehouse that contains the data and associated software


- Data acquisition (back-end) software that extracts data from internal (ERP) systems and external sources, consolidates and summarizes them and loads them into the data warehouse


- Client (front-end) software that allows users to access and analyze data from the warehouse

What are data sources?

Contains the data to be loaded in the Data Warehouse e.g. ERP systems, Relational databases, flat files, web services

What is enterprise data warehouse (EDW)?

A centralized repository for the entire enterprise

What is a data mart?

A departmental data warehouse that stores only relevant data.

What is a dependent data mart?

A subset that is created directly from a data warehouse. Consistent data (integrated)

What is an independent data mart?

A small data warehouse designed for a strategic business unit or a department

What is the main characteristic, pros and cons of data mart centric architecture?

Independent data marts


+ Easy to build organizationally


+ Easy to build technically


- Business enterprise view unavailable


- Redundant data costs


- High ETL costs


- High App costs


- High DBA and operational costs

What is the main characteristic, pros and cons of the virtual, distributed and federated architecture?

Leave data where it lies


+ No need for ETL


+ No need for separate platform


- Only viable for low volume


- Meta data issues


- Network bandwidth and join complexity issues


- Workload typically placed on a workstation


(this never works)

What is the main characteristic, pros and cons of the hub-and-spoke architecture?

Dependent data marts


+ Allows easier customizations of user interfaces and reports


- Business enterprise view challenging


- Redundant data costs


- High DBA and operational costs


- Data latency


(most common)

What is the main characteristic, pros and cons of enterprise data architecture?

Centralized integration data with direct access


+ Business enterprise view


+ Design consistency & data quality


+ Data reusability


- Requires corporate leadership and vision

Name the first 5 factors that potentially affect the architecture selection decision

1. Information interdependence between organizational units


2. Upper management's information needs


3. Urgency of need for a data warehouse


4. Nature of end-user tasks


5. Constraints on resources

Name the last 5 factors that potentially affect the architecture selection decision

6. Strategic view of the data warehouse prior to implementation


7. Compatibility with existing systems


8. Perceived ability of the in-house IT-staff


9. Technical issues


10. Social/political factors

Describe the Kimball model

Kimball views data warehousing as a constituency of data marts. Data marts are focused on delivering business objectives for departments in the organization, and the data warehouse is a conformed dimension of the data marts. Hence a unified view of the enterprise can be obtained from the dimension modeling on a local departmental level (Turban et al., 2007) (see also lect. 2, sl. 21)

Describe the Inmon model

Inmon believes in creating a data warehouse on a subject-by-subject area basis. Hence the development of the data warehouse can start with data from the online store. Other subject areas can be added to the data warehouse as their needs arise. Point-of-sale (POS) data can be added later if management decides it is necessary. (Turban et al., 2007) (see also lect. 2, sl. 22)

Describe the difference between data mart approach and EDW approach in terms of overall approach, complexity and development methodology.

- Overall approach: Bottom-up vs Top-down


- Complexity: High vs Low


- Development methodology: iterative vs step-wise

Describe the data mart approach in terms of architecture structure

- Data mart is subject-oriented (e.g. for single business processes) or department-oriented (e.g. only for sales)


- "Build one data mart at a time" -> the DW is developed sequentially


- DW = collection of data marts

Describe the EDW approach in terms of architecture structure

- One central EDW provides the consistent and comprehensive view of the enterprise


- Data marts are optional supplements for specific departments or subjects


- Data marts are based on the EDW. That means they get their data from the EDW.

Describe the difference between data mart approach and EDW approach in terms of scope, development time, cost, difficulty, size, freq of update and no. of users (there are more on lec.1 sl. 26)

- Scope: one subject area vs. several subject areas


- Time: months vs. years


- Cost: $10,000 to $100,000+ vs $1,000,000+


- Diff.: Low to medium vs. high


- Size: MB to several GB vs. GB to PB


- Freq. of upd.: hourly, daily, weekly vs daily, weekly


- No. of users: 10s vs 100s to 1000s

When modeling a data warehouse, what four perspectives are included?

1. The principal design approach for building the data warehouse; typically one distinguishes Kimball vs Inmon


2. The multidimensional data model as a foundation of data warehouse design


3. Relational data models containing data, e.g. operational and master data


4. Meta data describing the structure of all data warehouse data.

What are measurable business facts?

Facts are quantities with explanatory power for diagnosis, monitoring and coordination of a system


E.g. revenue, profit, costs (not derived facts, KPIs)


Facts include descriptive attributes, such as currency, unit, range of values

What are dimensions in the multidimensional model?

A business fact can be viewed and analyzed along different perspectives (e.g. time, space etc). Further hierarchical structures can also be added to dimensions (the time dimension can be structured in Year, Quarter, Month ...)

What different types of facts are there?

- Additive : additive aggregation along all dimensions possible


- Semi-additive: additive aggregation only for selected dimensions


- Non-additive (e.g. average values, percentages)
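The three fact types can be illustrated with a small sketch. The rows and column meanings below are hypothetical (monthly snapshots per store); the point is only which aggregations are meaningful for which kind of fact.

```python
# Hypothetical monthly snapshot rows: (month, store, revenue, cost, stock_level)
rows = [
    ("Jan", "A", 100.0, 60.0, 10),
    ("Jan", "B", 200.0, 150.0, 20),
    ("Feb", "A", 300.0, 150.0, 12),
    ("Feb", "B", 100.0, 90.0, 18),
]

# Additive: revenue may be summed over every dimension (month and store).
total_revenue = sum(r[2] for r in rows)

# Semi-additive: stock levels may be summed across stores for one month,
# but summing them across months would count the same goods repeatedly.
stock_feb = sum(r[4] for r in rows if r[0] == "Feb")

# Non-additive: a margin percentage must be recomputed from its additive
# components; averaging the per-row percentages gives a different number.
total_cost = sum(r[3] for r in rows)
true_margin = (total_revenue - total_cost) / total_revenue
naive_margin = sum((r[2] - r[3]) / r[2] for r in rows) / len(rows)
```

Here `true_margin` and `naive_margin` differ, which is why percentages and averages are treated as non-additive.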

What are the differences between star schema and snowflake in terms of table structure?

Both the star schema and the snowflake schema have one fact table and multiple dimension tables. However, the snowflake model additionally has several attribute tables.

What are the differences between star schema and snowflake in terms of dimension normalization, modeling effort and data compression?

- Dimension normalization: No dimension normalization vs Dimension normalization (3NF)


- Modelling effort: Low vs high


- Data compression: Low vs high

What are the differences between star schema and snowflake in terms of query performance with text filters?

Slow vs fast. e.g. "Sum of all sales from "Stuttgart" store"
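To make the structural difference concrete, the example query can be written against both designs. The following is a sketch using Python's built-in `sqlite3` module; all table and column names are hypothetical, and the toy data says nothing about real query performance.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Star schema: one denormalized store dimension
CREATE TABLE star_dim_store (store_id INTEGER PRIMARY KEY, store TEXT, city TEXT);
-- Snowflake schema: the city attribute is normalized into its own table
CREATE TABLE snow_dim_city  (city_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE snow_dim_store (store_id INTEGER PRIMARY KEY, store TEXT,
                             city_id INTEGER REFERENCES snow_dim_city(city_id));
-- The fact table is the same in both designs
CREATE TABLE fact_sales (store_id INTEGER, amount REAL);

INSERT INTO star_dim_store VALUES (1, 'Store 1', 'Stuttgart'), (2, 'Store 2', 'Berlin');
INSERT INTO snow_dim_city  VALUES (10, 'Stuttgart'), (20, 'Berlin');
INSERT INTO snow_dim_store VALUES (1, 'Store 1', 10), (2, 'Store 2', 20);
INSERT INTO fact_sales VALUES (1, 50.0), (1, 25.0), (2, 40.0);
""")

# Star: the text filter is applied on the wide dimension table (one join)
star_total = con.execute("""
    SELECT SUM(f.amount) FROM fact_sales f
    JOIN star_dim_store s ON s.store_id = f.store_id
    WHERE s.city = 'Stuttgart'
""").fetchone()[0]

# Snowflake: the text filter is applied on the small attribute table (two joins)
snow_total = con.execute("""
    SELECT SUM(f.amount) FROM fact_sales f
    JOIN snow_dim_store s ON s.store_id = f.store_id
    JOIN snow_dim_city  c ON c.city_id = s.city_id
    WHERE c.city = 'Stuttgart'
""").fetchone()[0]
```

Both queries return the same total; they differ only in how many tables are joined and where the text comparison happens.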

Explain the three steps of the data provision process.

ETL:


- extraction (reading data from a database)


- transformation (i.e. converting the extracted data from its previous form into the form in which it needs to be)


- and load (putting the data into the data warehouse)

Describe in what ways extraction can be differentiated.

- Synchronous / asynchronous access


- File based extraction / stream based extraction


- Full extraction / delta extraction


- Usage of filters / no filters


- Standard extractors /custom extractors
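The full versus delta distinction can be sketched as below, assuming the source rows carry a hypothetical `changed` timestamp and the ETL job remembers when it last ran. Other change-detection mechanisms exist; this is just one simple option.

```python
from datetime import datetime

# Hypothetical source rows with a last-modified timestamp
source = [
    {"id": 1, "amount": 10, "changed": datetime(2024, 1, 1)},
    {"id": 2, "amount": 20, "changed": datetime(2024, 1, 5)},
    {"id": 3, "amount": 30, "changed": datetime(2024, 1, 9)},
]


def full_extract(rows):
    """Full extraction: read every row on every run."""
    return list(rows)


def delta_extract(rows, last_run):
    """Delta extraction: read only rows changed since the previous run."""
    return [r for r in rows if r["changed"] > last_run]


everything = full_extract(source)
changes = delta_extract(source, last_run=datetime(2024, 1, 4))
```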

Into what four activities can transformation be subdivided?

- Filtering (e.g. filter all deleted orders)


- Harmonization (e.g. resolve master data inconsistencies)


- Enrichment (e.g. calculate new facts from existing ones)


- Aggregation (e.g. by minimizing a dimension)
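The four transformation activities can be sketched on a handful of rows. The order records, the `harmonize` rule and the field names are assumptions made up for this illustration.

```python
from collections import defaultdict

# Hypothetical extracted order rows
orders = [
    {"customer": "ACME Corp.", "qty": 2, "price": 10.0, "deleted": False},
    {"customer": "acme corp",  "qty": 1, "price": 10.0, "deleted": False},
    {"customer": "Globex",     "qty": 5, "price": 4.0,  "deleted": True},
    {"customer": "Globex",     "qty": 3, "price": 4.0,  "deleted": False},
]

# 1. Filtering: drop deleted orders
kept = [o for o in orders if not o["deleted"]]


# 2. Harmonization: map inconsistent spellings to one master-data key
def harmonize(name):
    return name.lower().rstrip(".").strip()


# 3. Enrichment: derive a new fact (revenue) from existing ones
enriched = [
    {"customer": harmonize(o["customer"]), "revenue": o["qty"] * o["price"]}
    for o in kept
]

# 4. Aggregation: sum revenue per customer, dropping order-level detail
revenue_by_customer = defaultdict(float)
for row in enriched:
    revenue_by_customer[row["customer"]] += row["revenue"]
revenue_by_customer = dict(revenue_by_customer)
```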

Describe the load phase

Data is updated in the final data storage (e.g. a cube)


- Full load vs Delta load


- Daily, Weekly, Monthly load


Load process has to be customized to the chosen data model (star vs. snowflake etc.)


Data quality mechanisms are often implemented (e.g. uniqueness, mandatory fields) to be triggered during the load

Describe ETL automation

- Usually not triggered manually, but run automated, trigger types are time or event trigger


- Can be programmed or modeled to be automated


- May be visualized to show sequence of steps


- Logging and monitoring functionality supports the Data Warehouse administrators in case of errors during the ETL process


- ETL automation is an important part of every BI project and can require a large portion of the project efforts

What is the main difference between metadata and master data?

Metadata = information about data


Master data = non-transactional business data


(Further info: slide 55 lecture 2)

What types of metadata are there and what are their goals?

Business: Explain what things mean


Technical: Technical description of data assets


Operational: Monitor job execution


(Further info: slide 57, lec 2)

Name potential benefits of metadata mgmt?

- Build common understanding of data


- Facilitate the quest for data quality


- Support discovery and reuse of data


- Analyse dependencies


- Facilitate future changes


- Monitor usage


(Details slide 57 and 28 lecture 2)

What four dimensions should be differentiated when speaking about big data?

- Volume - terabytes


- Variety - structured, unstructured, text & multimedia


- Velocity - streaming data


- Veracity - imprecise data types (uncertainty)


(the four v's)

Name three reasons why the big data trend probably will continue to accelerate in the coming years

1. IOT accelerates data generation with sensors


2. Media spectrum is broadened by image, audio and video


3. Faster computers produce more complex simulation results that need to be analyzed

What are the opportunities of Big Data?

- Simulations, sentiment analysis, network analysis

What are the threats of the Big Data trend?

- Increased dependency on systems


- Privacy, data security, and ethics

What technological changes have been made to enable big data?

Data processing:


- from 32-bit to 64-bit processing


- from single core sequential to multi core parallel processing


Data storage: from disk storage to in-memory


Data organization:


- from row-based to column-based databases


- from simple vectors to dictionary encoding

Describe column-based databases

Store data grouped by attribute (column) instead of by record ID (row). This allows much faster data access for typical BI operations, such as aggregating a single attribute.
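A minimal sketch of the two layouts, using plain Python lists and a hypothetical three-attribute sales table. Real column stores add compression and vectorised execution, which this toy example does not model.

```python
# Row-based layout: one tuple per record (id, city, amount)
row_store = [
    (1, "Stuttgart", 50.0),
    (2, "Berlin", 40.0),
    (3, "Stuttgart", 25.0),
]

# Column-based layout: one list per attribute
column_store = {
    "id": [1, 2, 3],
    "city": ["Stuttgart", "Berlin", "Stuttgart"],
    "amount": [50.0, 40.0, 25.0],
}

# Row layout: every full record is touched even though one field is needed
total_from_rows = sum(record[2] for record in row_store)

# Column layout: only the 'amount' column is read
total_from_columns = sum(column_store["amount"])
```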

Explain dictionary encoding

- In order to increase data compression and improve search performance, BI systems utilize indexing


- Column-based BI systems use single-attribute vectors.


- Instead of dimensions, simple dictionaries are applied. (see slide 23, lect. 3)
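Dictionary encoding of a single column can be sketched as follows. The helper names are hypothetical, and sorting the dictionary is just one possible design choice for this illustration.

```python
def dictionary_encode(values):
    """Return (dictionary, attribute vector) for one column."""
    dictionary = sorted(set(values))               # distinct values, stored once
    position = {v: i for i, v in enumerate(dictionary)}
    vector = [position[v] for v in values]         # one small integer per row
    return dictionary, vector


def decode(dictionary, vector):
    """Rebuild the original column from dictionary and attribute vector."""
    return [dictionary[i] for i in vector]


cities = ["Stuttgart", "Berlin", "Stuttgart", "Stuttgart", "Berlin"]
dictionary, vector = dictionary_encode(cities)

# A text filter becomes one dictionary lookup plus integer comparisons
code = dictionary.index("Stuttgart")
matching_rows = [row for row, c in enumerate(vector) if c == code]
```

Repeated strings are replaced by integers, which is where the compression and the faster search come from.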

What's the difference between traditional and big data approach?

Traditional: Business users determined what questions to ask -> IT structured data accordingly


Big data approach: IT delivers a platform to enable creative discovery -> business explores what questions could be asked


(however, both overlap each other)

What are the differences between traditional computing and stream computing?

Historical fact finding vs. current fact finding


Find and analyze information stored on disk vs. analyze data in motion - before it is stored


Batch paradigm, pull model vs. low latency paradigm, push model


Query-driven: submits queries to static data vs. data driven - bring the data to the query


Query->Data->Result vs. Data->Query->Result
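The contrast can be sketched with a running average. This is an illustration only: the sensor values are made up, and a Python generator stands in for a real stream-processing engine.

```python
def batch_average(stored):
    """Traditional: data is stored first, then a query pulls it all at once."""
    return sum(stored) / len(stored)


def streaming_average(readings):
    """Stream: each reading updates the result as it arrives; nothing is stored."""
    count, total = 0, 0.0
    for value in readings:
        count += 1
        total += value
        yield total / count


sensor_values = [20.0, 22.0, 27.0]
final_batch = batch_average(sensor_values)             # Query -> Data -> Result
running = list(streaming_average(iter(sensor_values)))  # Data -> Query -> Result
```

The streaming version yields an up-to-date answer after every reading, while the batch version answers only once all data is available.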

What is the motivation for stream computing?

- Data might be outdated before users are able to analyze it


- Data rates and volumes are too great for storing and subsequent analysis

What is the problem with analyzing data, the solution and the goal?

Time required for analyzing data is very long. Solution is parallel analysis of data.

Goal: Do not analyze a query on a single server! Instead, distribute the query across an entire network and process it in parallel!


What are the challenges of parallel computing?

- Difficult to decide how to split data and computation across network

- Risk of server failure


How does Hadoop address the challenges of parallel computing?

- Node failure: Store data redundantly on multiple nodes within the network -> if one node fails, 2 other nodes can be used instead

- Low bandwidth: Store data at nodes and remember where it is stored -> data does not need to be distributed to nodes anymore -> only the computation needs to be distributed


- Development of program code for parallel processing is difficult: provide a simple map-reduce algorithm to the developer to handle work distribution and management of nodes automatically


Describe how HDFS works

Split the data into partitions (blocks); each block is replicated three times across 3 nodes. A master node stores metadata (e.g. file names, locations, ...)
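The idea can be sketched in a few lines. This is a toy model, not HDFS itself: the round-robin placement, the tiny block size and all names are assumptions for illustration, and real HDFS placement policies are more involved.

```python
def split_into_blocks(data, block_size):
    """Cut the file content into fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes (simple round-robin)."""
    placement = {}
    for block_id in range(len(blocks)):
        placement[block_id] = [
            nodes[(block_id + r) % len(nodes)] for r in range(replication)
        ]
    return placement


nodes = ["node1", "node2", "node3", "node4"]
blocks = split_into_blocks(b"abcdefghij", block_size=4)

# The master node keeps only metadata: file name -> block locations
master_metadata = {"example.txt": place_blocks(blocks, nodes)}


def read_block(block_id, failed_nodes):
    """Return the first replica holder that is still alive."""
    holders = master_metadata["example.txt"][block_id]
    return next(n for n in holders if n not in failed_nodes)
```

If `node1` fails, `read_block(0, {"node1"})` falls back to another replica holder, which is the redundancy argument in miniature.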

Describe MapReduce algorithms

Map function:


- Takes a set of key-value pairs


- Creates a set of zero or more key-value pairs


- Input and output pairs are usually different


Reduce function:


- Executed for each key


- aggregates all values according to the key
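A word count is the usual illustration of the two functions. The sketch below runs in a single process and only mimics the programming model; the function names are hypothetical, and a real framework would distribute the map and reduce calls across nodes.

```python
from collections import defaultdict


def map_fn(_key, line):
    """Map: one (offset, line) pair in, zero or more (word, 1) pairs out."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_fn(word, counts):
    """Reduce: called once per key with all values collected for that key."""
    return word, sum(counts)


def map_reduce(records):
    grouped = defaultdict(list)
    for key, value in records:                    # map phase
        for out_key, out_value in map_fn(key, value):
            grouped[out_key].append(out_value)    # shuffle: group by key
    return dict(reduce_fn(k, v) for k, v in grouped.items())  # reduce phase


lines = [(0, "big data big insight"), (1, "data in motion")]
word_counts = map_reduce(lines)
```

Note that the input pairs (line offset, text) and the output pairs (word, count) differ, as stated above.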

Describe denormalized data models and when to use it

Store data redundantly. Also known as "embedded data models". Every entry stored in one data piece.


Use:


- If you have a "contains" relationship (i.e., 1:1 relationships)


- If you have 1:n relationship

What are the benefits and downsides of denormalized data models?

+ Request and retrieve related data in a single database operation


+ Better performance for read operations


+ Update related data in a single write operation


- Database records may grow after creation


- Database record growth can impact write performance


- Threat of data fragmentation

Describe normalized data models and when to use it.

Normalized data models describe relationships using references between data pieces. Use:


- If denormalized data models provide only little read performance advantages


- If modeling m:n relationships


- If modeling large hierarchical data sets
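The difference between embedding and referencing can be sketched with plain Python dictionaries. The customer and address records are hypothetical and stand in for documents in a document database.

```python
# Denormalized (embedded): a customer document contains its addresses
customer_embedded = {
    "_id": "c1",
    "name": "Ada",
    "addresses": [
        {"city": "Stuttgart", "type": "billing"},
        {"city": "Berlin", "type": "shipping"},
    ],
}

# Normalized (referenced): addresses live separately and point to the customer
customers = {"c1": {"name": "Ada"}}
addresses = [
    {"customer_id": "c1", "city": "Stuttgart", "type": "billing"},
    {"customer_id": "c1", "city": "Berlin", "type": "shipping"},
]

# Embedded: one lookup returns the customer together with the related data
cities_embedded = [a["city"] for a in customer_embedded["addresses"]]

# Referenced: a second step resolves the references (the equivalent of a join)
cities_referenced = [a["city"] for a in addresses if a["customer_id"] == "c1"]
```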

What is SQL?

- SQL is a standardized, powerful, high level programming language for querying databases


- Almost all relational databases support SQL


- Relations (=tables) are linked via references -> SQL primarily supports normalized data models!
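A short sketch of tables linked via references, using Python's built-in `sqlite3` module. The `customer` and `orders` tables and their contents are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(id),  -- reference to another table
    amount REAL
);
INSERT INTO customer VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO orders VALUES (1, 1, 30.0), (2, 1, 20.0), (3, 2, 15.0);
""")

# The query states what is wanted; the database decides how to execute it
totals = con.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customer c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
```

The join resolves the reference from `orders.customer_id` to `customer.id`, which is how normalized data is recombined at query time.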

What are the advantages and downsides of SQL?

+ Standardized and implemented by basically every relational database system


+ Powerful -> many operations exist


+ Many people (also from the business side) are able to write SQL code


+ High level code


+ ACID (atomicity, consistency, isolation, durability)


- Data structures (e.g. tables) need to be defined upfront


- SQL code always needs to be translated into low level code -> reduces performance


- Rather slow execution

What are relational data models mainly used for?

Transactions and analysis of transactional data

SQL vs NoSQL

SQL:


- The notion of SQL is typically used to refer to relational data models and/or normalized data models


NoSQL:


- The notion of NoSQL is typically used to refer to non-relational data models and/or denormalized data models

Give examples of non-relational data model approaches in NoSQL

Key-Value stores:


+ Very simple to program and implement


+ Can be easily distributed across multiple machines


Document-oriented store:


+ Still simple to program and implement


+ Can be easily distributed across multiple machines


+ More structure than key-value stores


Graph-oriented store:


+ Flexible extension of data model
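The three approaches can be sketched with plain Python data structures. These stand in for real NoSQL products only conceptually; all keys, records and helper names are hypothetical.

```python
# Key-value store: an opaque value per key; lookups by key only
key_value = {"user:1": '{"name": "Ada", "city": "Stuttgart"}'}

# Document store: the value has visible structure, so fields can be queried,
# and documents need not all share the same fields
documents = [
    {"_id": 1, "name": "Ada", "city": "Stuttgart"},
    {"_id": 2, "name": "Grace", "city": "Berlin", "tags": ["vip"]},
]
in_stuttgart = [d["name"] for d in documents if d.get("city") == "Stuttgart"]

# Graph store: nodes plus labelled edges; a new relationship type can be
# added without changing any schema
edges = [("Ada", "follows", "Grace"), ("Grace", "follows", "Linus")]
edges.append(("Ada", "works_with", "Linus"))


def neighbours(node, label):
    """Follow all edges with the given label starting at `node`."""
    return [dst for src, lbl, dst in edges if src == node and lbl == label]


followed_by_ada = neighbours("Ada", "follows")
```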

T/F: Big data platforms are to process unstructured data

F: Structured data is part of almost all engagements

T/F: Big data technology requires huge amounts of data

F: It is more about flexibility than pure volume

T/F: Big data = Apache Hadoop

F: Hadoop is a well-known platform, but depending on the use case other platforms are better suited

T/F: Big Data makes traditional BI / DWH platforms obsolete

F: Databases and BI will co-exist with Big Data technologies.

T/F: Big Data requires new skills

T: New job roles are needed, for instance, data scientists.

T/F: Big Data changes IT architecture

T: Sandboxes, deep data zones, queryable archives...

T/F: Big Data requires focus on data security and privacy

T

T/F: Big Data fits best with agile methods

T

T/F: Big Data affects only some industries

F: There is probably no industry without a use case for Big Data

T/F: Big data is a hype

T & F

What three target groups are there for BI consumption and what reports do they demand?

- Executive management - Performance Management, Dashboards, Scoreboards, KPIs


- Business analysts: ad-hoc queries, On-Line Analytical Processing (OLAP)


- Front Line employees - Operational or standard reports

Describe the dissemination of BI consumption

94%: Reporting => information delivery


5%: Data analysis: Data analysts & power users (controlling, purchasing, etc.)


1%: Data mining: small amount of expert users in dedicated analysis departments

Describe the BI consumption taxonomy

Visualization: Table vs Graph


Singularity: Single report vs Multiple reports


Interaction: no vs. low vs. high

Describe the taxonomy of business reporting

Table report: Table, Single, No/Low interaction


Graph report: Graph, Single, No/Low interaction


Dashboard: Graph, Multiple, No/Low interaction

Describe the taxonomy of Analytical reporting

OLAP report: Table vis., Single report, High inter.

What is the definition of Business Reporting?

All types of BI consumption covering:


- Efficient (visual) communication of data


- with limited interactivity


- and limited analytic capabilities

What kinds of table reports exist regarding creation, execution and delivery?

Creation:


- Standard reports (created by developer, consumed by user)


- Ad-Hoc Reports (created and consumed by user)


Execution and delivery:


- Manual execution


- Automatic execution (and delivery)

What characteristics do table reports have?

- Simplest form of BI-reports


- Two basic forms:


one-dimensional (list) and two dimensional (matrix)


- Often a small data scope (e.g. department)


- Used by all kinds of users


- Reports are usually parameterized



What report creation trigger types are there?

- Periodic report: created according to a certain schedule (e.g. monthly analysis of sales by customer)


- Special/exceptional report: created when something out of the ordinary happens (defined threshold has been exceeded)



In what four basic ways can exceptions be incorporated into (trigger) reports

- Prepare the report only when exceptions occur


- Highlight the exceptions


- Group the exceptions together (e.g. specific "deviation" column)


- Comparison of actual and planned figures. Basic statistical formulas are used (variance, median)
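
A minimal sketch of the four ideas above, assuming a relative plan/actual deviation and a 10% tolerance. The figures, the threshold and all names are hypothetical and not taken from the lecture:

```python
# Hypothetical monthly figures: (region, planned, actual). Illustrative data only.
FIGURES = [
    ("North", 100.0, 104.0),
    ("South", 120.0, 90.0),
    ("West", 80.0, 81.0),
]

THRESHOLD = 0.10  # assumed tolerance: flag deviations above 10% of plan


def deviation_rows(figures, threshold=THRESHOLD):
    """Return one row per region with a deviation column and an exception flag."""
    rows = []
    for region, planned, actual in figures:
        deviation = (actual - planned) / planned  # actual vs. planned, relative
        rows.append({
            "region": region,
            "planned": planned,
            "actual": actual,
            "deviation": deviation,  # the dedicated "deviation" column
            "exception": abs(deviation) > threshold,
        })
    return rows


def exception_report(figures, threshold=THRESHOLD):
    """Return only exceptional rows; an empty list means no report is prepared."""
    return [row for row in deviation_rows(figures, threshold) if row["exception"]]


def format_row(row):
    """Render one line, highlighting exceptions with a leading marker."""
    marker = "!!" if row["exception"] else "  "
    return f"{marker} {row['region']:<6} {row['deviation']:+.1%}"
```

With the sample data only South (25% below plan) is flagged, so `exception_report(FIGURES)` contains a single row.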

What are the three steps to visualize data?

1. Choose visual representation


2. Arrangement of visual elements


3. Selection and (de-)emphasis of interesting data

Describe what visual representation means

- Representation: mapping of available information to a visual format


- Data objects, their attributes, and the relationships among data objects are translated into graphical elements such as points, lines, shapes and colors

What three dimensions can visual representation be organized in? Give examples

Data type (one-/two-/multi-dimensional, text)


Graph type (Bar, Line, Pie, Radar)


Interaction & distortion technique (Standard, Projection, filtering, zoom)

Explain what different types of data there are

- One-dimensional data (time-variant data, e.g. stock prices)


- Two/three-dimensional data (e.g. geographical data)


- Multidimensional (typical for data mining tasks, no "obvious" mapping of multiple dim.)


- Special data types (textual, graph/network data)

Explain what different types of graph there are

- Bar graph (and variations)


- Line graph (and variations)


- Area graph


- Pie/ring graph


- Tree map


- Radar graph


- Gauges and meters


- Bullet graph


- Box/scatter plots

Describe the motivation for interaction techniques and important concepts

- Interaction allows for more dynamic analysis of data


- Helps to encourage exploration


Important concepts:


Direct manipulation strategies


Rapid, incremental and reversible actions


Selection by pointing (not typing)


Immediate and continuous feedback

Describe what interaction techniques there are

- Dynamic projection (dynamically change the visualization to explore multidimensional data sets)


- Interactive filtering


--browsing, can be difficult for big data sets.


--querying, need to specify a subset


- Zooming


- Distortion (some part of data in high detail)


- Brushing and linking (selection from one visualization is fed into another, selected instances highlighted in some way)

When is a bar graph good to use?

- For displaying fact(s) associated with nominal (e.g. region) or ordinal attributes (e.g. size)


- For comparing values with each other

What bar graph variations are there?

- Stacked bar graph


good to display multiple instances of a whole and its parts - focus on whole


- Grouped bar graph


good to display multiple instances of a whole and its parts - focus on parts

When is a line graph appropriate to use?

- Good to display fact(s) associated with interval attributes (e.g. time)


- Good to reveal shape of data (e.g. movements up and down), especially changes over time

What are sparklines?

A variation of line graph:


A very space-efficient representation which is often used to display changes of multiple data sets in a dashboard (invented by Edward Tufte)

When is a bar/line graph combination appropriate?

Good if you want to combine visualizations of data changes and data comparison in one graph, e.g. expenses (bar chart) and profits (line chart)

What are the problems with area graphs?

- Occlusion (some of the values are hidden by an overlaying area)


- Inaccurate interpretation

When is a pie graph good and what are the disadvantages?

+ Good to display a whole and its parts, but often this can be done better with a bar graph


- Difficult to assess size / value of parts


- Difficult to differentiate colors (if there are many)


- Constant eye movements between graph and legend

What are some characteristics of tree maps?

- The Treemap is a space-constrained visualization of hierarchical structures.


- Can show attributes of leaf nodes using size and color (brightness and hue).


- Easy to navigate into sub-trees (cf. interaction techniques)

What is a radar graph and when is it appropriate to use?

A radar graph is a circular graph that encodes values on separate axes that radiate from the center. Usually a bar graph is the better choice because it is easier to read (exceptions: HR competences, and cases where the axes can naturally be arranged as a circle, e.g. the hours of a day)

What are gauges and meters and what are their issues?

- They display the value of a single fact, sometimes compared to related values (e.g. targets) or color-encoded ranges


- Common in dashboards: take up a lot of space


- Color coding (red vs green) cannot be perceived by all people


(better to use bullet graph)

What are bullet graphs

- A variation of a bar graph developed by Stephen Few


- Can serve as a replacement for dashboard gauges and meters

What are box plots?

- A way of displaying the distribution of data


- Condensed visualization of core statistical parameters
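
The "core statistical parameters" behind a box plot are usually the five-number summary. A small sketch, assuming the inclusive quartile convention (other conventions give slightly different Q1/Q3); the function name is hypothetical:

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list with at least two values."""
    ordered = sorted(values)
    # method="inclusive" treats the data as the whole population; this is an
    # assumption, box plot tools differ in the quartile convention they use.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


def interquartile_range(values):
    """Height of the box: Q3 minus Q1."""
    _, q1, _, q3, _ = five_number_summary(values)
    return q3 - q1
```

For the values 1 to 9 this gives a minimum of 1, Q1 = 3, median = 5, Q3 = 7 and a maximum of 9.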

Name some characteristics of scatter plots

- Additional attributes can be displayed using size, shape and color of the markers


- Correlation structures can be recognized easily

Comments about visualizations

Helpful to:


- explore data


- confirm hypotheses


- communicate findings


(less is more)

What is the definition of dashboard

A dashboard is a visual display of the most important information needed to achieve one or more objectives, consolidated and arranged on a single screen so the information can be monitored at a glance

What does the business need from a dashboard

- High impact visualization of key metrics


- Easy to use and find information


- Intuitive user interface and navigation


- Ability to manage and monitor metrics effortlessly


- Actionable analyses


- Drillable metrics

Give some examples of good and bad dashboard qualities

+ Important values are color-coded


+ Very cautious use of color


+ Compact visualization of extensive amounts of information


+ Homogeneous, consistent usage of graph types


- Too many colors


- Bad scaling (hard to compare)


- Bad

What are the six gestalt principles

- Proximity


- Similarity


- Closure


- Enclosure


- Connection


- Continuity

Name some common mistakes in dashboard design

1. Exceeding the boundaries of a single screen


2. Supplying inadequate context for the data


3. Displaying excessive detail or precision


4. Choosing a deficient measure


5. Choosing inappropriate display media


6. Introducing meaningless variety


7. Using poorly designed display media


8. Encoding quantitative data inaccurately


9. Arranging the data poorly


10. Highlighting important data ineffectively or not at all


11. Cluttering the display with useless decoration


12. Misusing or overusing color


13. Designing an unattractive visual display

What is OLAP?

Technologies and tools that support (ad-hoc) analysis for multi-dimensionally aggregated data (Table, Single report, High Interaction)

What different types of OLAP are there?

MOLAP: (a lot of computing power, copy data)


- data resides in a multidimensional DBMS


- multidimensional engine (OLAP server) provides access


ROLAP:


- data resides in a relational DBMS


- OLAP server provides SQL queries


HOLAP:


- detailed data resides in a relational DBMS


- aggregated data resides in a multidimensional DBMS

What are some typical OLAP operations?

Roll up (drill-up): summarize data


- by climbing up hierarchy or by dimension reduction


Drill down (roll down): reverse of roll up


- from higher level summary to lower level summary or detailed data


Slice and dice:


- filter using one or more dimensions


Pivot (rotate):


- reorient the cube, visualization, 3D to series of 2D planes

Describe roll-up and drill-down

Drill-down (roll-down):


- From higher level summary to lower level summary or detailed data, or introducing new dimensions


Roll-up (drill-up):


- reverse of drill-down


- summarize data by climbing up the hierarchy or by dimension reduction
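
A sketch of roll-up as re-aggregation, assuming a tiny fact table with a city -> country hierarchy. Data, column names and the `aggregate` helper are hypothetical:

```python
from collections import defaultdict

# Hypothetical fact table: (country, city, quarter, sales). Illustrative data only.
FACTS = [
    ("DE", "Berlin", "Q1", 10),
    ("DE", "Berlin", "Q2", 12),
    ("DE", "Munich", "Q1", 7),
    ("SE", "Stockholm", "Q1", 5),
    ("SE", "Stockholm", "Q2", 6),
]

COLUMNS = ("country", "city", "quarter")


def aggregate(facts, dimensions):
    """Sum sales grouped by the given dimension names."""
    totals = defaultdict(int)
    for *dims, sales in facts:
        record = dict(zip(COLUMNS, dims))
        key = tuple(record[name] for name in dimensions)
        totals[key] += sales
    return dict(totals)


# Detailed level: city by quarter (what a drill-down would show).
by_city_quarter = aggregate(FACTS, ("city", "quarter"))
# Roll-up by climbing the hierarchy city -> country.
by_country_quarter = aggregate(FACTS, ("country", "quarter"))
# Roll-up by dimension reduction: the quarter dimension is dropped.
by_country = aggregate(FACTS, ("country",))
```

Drill-down is the same call in the other direction: going from `by_country` back to `by_city_quarter` re-introduces the finer level and the dropped dimension.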

Describe how slice and dice works.

The slice-operation represents "cutting out" one slice of an n dimensional cube by using a filter on one dimension (results in an (n-1) dimensional cube)


The dice operation represents "cutting out" a small cube from a big one by performing a filter. (results in a smaller cube, a dice)
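
A sketch of both operations on a cube stored as a dictionary keyed by coordinates. The cube contents and the function names are hypothetical:

```python
# Hypothetical 3-dimensional cube: (region, product, year) -> sales.
CUBE = {
    ("North", "A", 2020): 5, ("North", "A", 2021): 6,
    ("North", "B", 2020): 3, ("North", "B", 2021): 4,
    ("South", "A", 2020): 2, ("South", "A", 2021): 7,
    ("South", "B", 2020): 1, ("South", "B", 2021): 8,
}


def slice_cube(cube, axis, value):
    """Fix one dimension to a single value and drop it: n dims -> n-1 dims."""
    return {
        key[:axis] + key[axis + 1:]: cell
        for key, cell in cube.items()
        if key[axis] == value
    }


def dice_cube(cube, allowed):
    """Keep cells whose coordinates fall in the allowed sets; dims are preserved.

    `allowed` maps an axis index to the set of permitted values on that axis.
    """
    return {
        key: cell
        for key, cell in cube.items()
        if all(key[axis] in values for axis, values in allowed.items())
    }


sales_2021 = slice_cube(CUBE, axis=2, value=2021)     # 2-dimensional result
north_a = dice_cube(CUBE, {0: {"North"}, 1: {"A"}})   # smaller 3-dimensional cube
```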



Describe pivot/rotate operations

Swapping the rows and columns, or moving one of the row dimensions into the column dimension
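
A sketch of the simplest case, swapping rows and columns of a two-dimensional view held as a nested dictionary (an assumed representation, the `pivot` name is hypothetical):

```python
def pivot(table):
    """Swap rows and columns of a nested dict {row: {column: value}}."""
    rotated = {}
    for row, cells in table.items():
        for column, value in cells.items():
            rotated.setdefault(column, {})[row] = value
    return rotated


# Hypothetical view: regions as rows, quarters as columns.
by_region = {"North": {"Q1": 1, "Q2": 2}, "South": {"Q1": 3, "Q2": 4}}
by_quarter = pivot(by_region)
```

Pivoting twice returns the original orientation, which shows that the operation only changes the view and not the data.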

What is the motivation of big data analytics?

The amount of data to be analyzed is constantly growing. A sole concentration on manual / interactive analysis methods like table or OLAP is not sufficient anymore


Methods and tools that semi-automatically generate knowledge from large data sets and documents are needed

What 5 challenges are there of advanced analytics?

- Forecasting (how do historical sales translate..)


- Key influencers (what are the main influencers for success/failure)


- Trends (what are the trends: historical/emerging)


- Relationships (what are the correlations in the data)


- Anomalies (what anomalies..)

What is the definition of Knowledge Discovery (in Databases)?

Knowledge Discovery in Databases (KDD) is a non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data. (Fayyad et al. 1996)

Definition: Hypothesis vs discovery

-Hypothesis-driven approach


Begins with a proposition by the user, who then seeks to validate the truthfulness of the proposition


-Discovery-driven approach


Finds patterns, association, and relationships among the data in order to uncover facts that were previously unknown or not even contemplated by the organization

Definition: supervised learning vs unsupervised learning

Supervised: Goal: Predict data with unknown target attribute value with minimal error.


- Search for dependencies of a target attribute on the input data


Unsupervised: Goal: create a pattern of a more compact description of the data.


- No reference to target attribute, error not measurable

Knowledge Discovery vs. Statistical approaches

- It was expected that knowledge discovery would substitute classical statistical approaches


- There was the hope that knowledge discovery can be successfully applied without experience and knowledge about the methods


- Fact is: they complement each other and software tools have merged together

What is the definition of data mining?

"Data mining is a process that uses statistical, mathematical, artificial intelligence, and machine learning techniques to extract and identify useful information and subsequent knowledge from databases.


Data Mining is used for finding mathematical patterns from usually large sets of data. These patterns can be rules, affinities, correlations, trends, or prediction models." (Nemati and Barko, 2001)

Give some examples of supervised and unsupervised data mining techniques

Unsupervised: Association rules, K-means clustering, Hierarchical clustering (clustering and association rules)


Supervised: SVM, naive bayes, decision tree, neural networks (classification)

What are association rules and what two parameters are relevant?

- Association rules describe correlations between attributes appearing together in transactions.


- Confidence, strength of correlation


- Support, frequency of appearance
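
A sketch of the two parameters for a rule A -> C, using the common definitions support = share of transactions containing the itemset and confidence = support(A and C) / support(A). The baskets and function names are hypothetical:

```python
# Hypothetical market-basket transactions. Illustrative data only.
TRANSACTIONS = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Share of transactions with the antecedent that also hold the consequent.

    Assumes the antecedent occurs in at least one transaction.
    """
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)
```

Here the rule bread -> butter has support 0.5 (2 of 4 baskets) and confidence 2/3, while butter -> bread has the same support but confidence 1.0, so confidence depends on the direction of the rule.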

Briefly describe decision trees

- A decision tree is a set of logical rules.


- Decision tree is an intensional description of a given set of classes


- Important: decision trees are easier to read and understand than logical rules
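
A sketch of the "tree = set of logical rules" idea: each root-to-leaf path becomes one IF ... THEN rule. The credit example, the attribute names and the nested-dictionary representation are hypothetical:

```python
# Hypothetical tree for a credit decision. Inner nodes test one attribute,
# leaves carry the class label.
TREE = {
    "attribute": "income",
    "branches": {
        "high": "approve",
        "low": {
            "attribute": "has_guarantor",
            "branches": {"yes": "approve", "no": "reject"},
        },
    },
}


def classify(tree, record):
    """Follow the branch matching the record until a leaf (class label) is reached."""
    node = tree
    while isinstance(node, dict):
        node = node["branches"][record[node["attribute"]]]
    return node


def to_rules(tree, conditions=()):
    """Flatten the tree into one (conditions, label) rule per root-to-leaf path."""
    if not isinstance(tree, dict):
        return [(conditions, tree)]
    rules = []
    for value, subtree in tree["branches"].items():
        rules.extend(to_rules(subtree, conditions + ((tree["attribute"], value),)))
    return rules
```

`to_rules(TREE)` yields three rules, for example "IF income = low AND has_guarantor = no THEN reject".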

What is the goal of predictive analytics?

Unlock data to move decision making from "sense & respond" to "predict and act"

Describe what types of analytics there are, their goals and an example.

Descriptive: summarize what happened


- Vast majority of analytics is descriptive (OLAP)


Predictive: make predictions about the future


- Utilize a variety of statistical, modeling, data mining, and machine learning techniques to study recent and historical data (sent. analys.)


Prescriptive: Recommend one or more courses of action and show the most likely outcomes of each action

What is predictive analytics, two definitions?

"... the exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns and rules"


"... the process of discovering meaningful new correlations, patterns and trends by sifting through large amounts of data stored in repositories, using pattern recognition techniques as well as statistical and mathematical techniques."

What is the definition of text mining?

"Application of data mining to non-structured or less structured text files. It entails the generation of meaningful numerical indices from the unstructured text and then processing these indices using various data mining algorithms" (Turban et al., 2007)

Why is textual data very different from database-like data?

- is unstructured (text) or semi-structured (e.g. markup with HTML, XML)


- may be interlinked (e.g. hyperlinks)


- is very heterogeneous (different languages, spelling errors)

Briefly: text classification and text clustering

- "Text categorization is the activity of labeling natural language texts with thematic categories from a predefined set." (vector representation of the text document + standard classification algorithms)


- "...the partitioning of texts into previously unseen categories.." (vector -> standard clustering algorithms)
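
A sketch of the "vector" step both cards rely on, assuming a plain bag-of-words term count with whitespace tokenization (real pipelines typically add stemming, stop-word removal and weighting). Function names and sample texts are hypothetical:

```python
from collections import Counter


def build_vocabulary(documents):
    """Sorted list of distinct lower-cased tokens across all documents."""
    return sorted({token for doc in documents for token in doc.lower().split()})


def vectorize(document, vocabulary):
    """Term-frequency vector of `document` over `vocabulary`."""
    counts = Counter(document.lower().split())
    return [counts[term] for term in vocabulary]


DOCUMENTS = ["big data big value", "data mining"]
VOCABULARY = build_vocabulary(DOCUMENTS)
VECTORS = [vectorize(doc, VOCABULARY) for doc in DOCUMENTS]
```

The resulting numeric vectors can then be passed to a standard classifier (with labels) or a standard clustering algorithm (without labels).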

What is the definition of web mining?

"The discovery and analysis of interesting and useful information from the Web, about the Web, and usually through Web-based tools."

What is the definition of business performance management?

“Business Performance Management enables an organization to effectively monitor, control and manage the implementation of strategic initiatives”

What are the pros and cons of spreadsheets regarding business planning? (64% of planning tools used)

+ Small businesses


+ Extremely individual requirements


+ Short-term need


- Process control


- Access protection


- Performance


- Complexity


- Errors


- Consolidation


- Seasonal trend model


- Organizational changes


- Growing company

Define a balanced scorecard

“A balanced scorecard is a comprehensive set of performance measures defined from four different measurement perspectives (financial, customer, internal, and learning and growth) that provides a framework for translating the business strategy into operational terms” (Kaplan and Norton, 1996)

What four dimensions are there of a balanced scorecard?

- Financial (should serve as a focal point for all objectives and measures in all other perspectives)


- Customer (enable companies to align their customer outcome measures - satisfact....)


- Internal (will focus their metrics on processes that will deliver the objectives for cust/shareh)


- Learning&Growth (develops objectives and measures to drive learning for other 3 perspect)

What is a strategy? Draw a strategy map.

"A strategy is a set of hypotheses about cause and effect"


map on slide 81 , lecture 5-6

What five steps does the integrated, closed-loop strategy to execution cycle include?

1. Develop, formulate and syndicate strategy


2. Translate and cascade strategy


3. Operationalize strategy


4. Monitor and optimize strategy execution


5. Validate and adapt strategy

Draw the "Big Picture"

Slide 87, lecture 5-6

What is the motivation for process intelligence?

- BI in its classical form rather looks at outcome-oriented, high-level KPIs, decoupled from the actual business processes.


- Process intelligence looks at the process level and focuses on operational performance to be transparent at all times

What is the definition of (Business) Process Intelligence?

"(Business) Process Intelligence (BPI) refers to the application of business intelligence techniques to business processes" (Grigori et al. 2004). Extension of Grigori et al.'s (2004) definition:


“BPI comprises a large range of application areas spanning from process monitoring and analysis to process discovery, conformance checking, prediction and optimization.”

Draw the Soh and Markus (1995) model

Slide 5 and 7, lecture 7

What is the definition of organizational adoption process?

Organizational Adoption (process) involves all actions of individuals in an organization that deal with creating awareness, selecting, evaluating, initiating and deciding for the implementation of new ES technology.

What is the definition of the conversion process?

Conversion (process) involves all actions of individuals (in an organization or across organizations) that deal with developing and implementing a new ES technology.

What is the definition of the use process?

Use (process) involves all actions of individuals in an organization that deal with using and changing ES technology or the respective work system to realize intended business value.

Draw the key performance indicators pyramid

Slide 11, lecture 7

What are some pre-adoption / organisational adoption activities?

- KPI alignment

- BI strategic alignment


- Governance


- BI vendor selection


- Organizational structure


- Controlling


Describe the BI project lifecycle

Justification


Planning


Business analysis


Design


Construction


Deployment

How does a BI strategy benefit IT?

- Help align with business partners, formalize business needs


- Create prioritized roadmap for the enterprise of short, medium and long term projects aligned with strategic business goals delivering measurable results


- Creating business justifications for an enterprise scope and end-to-end BI including data management

How does a BI strategy benefit a LOB?

- Have departmental spend go further and contribute to enterprise investments required.


- A departmental BI need often involves data from other groups. Solve the departmental pain points by removing the limits of a departmental focus through an enterprise-wide strategy


- An enterprise BI approach provides a unified approach across all departments => "speak the same language"

What can BI LOB-organization consist of?

- Central BI competence centers


- Decentralized BI groups / departments


- BI governance committees

Give some examples of why BI initiatives are complex endeavors

- Disparate business data must be integrated and integration goes beyond simply bridging systems


- It's about information consolidation and integrity as well as establishing an end-to-end view


- Alignment across organizations regarding master data and KPIs needs to happen


- New technology is introduced

What three factors can implementation be categorized into?

1. Organizational issues


2. Project issues


3. Technical issues

What issues/common beliefs.. are to be considered when building a BI system?

- ..data warehousing database design is the same as transactional database design


- Delivering data with overlapping and confusing definitions


- ..promises of performance, capacity and scalability (triangle of the three)


- .. that your problems are over when DWH is up and running


- .. that Big Data makes DWH obsolete

What are the most common failure factors in BI projects?

- Unclear business / information objectives


- Low levels of data summarizations : getting lost in detail


- Lack of (Top) Management Support


- Lack of clear BI Strategy


- Cultural issues being ignored


- Inappropriate architecture

Name some best practices for implementations

- Project must fit with corporate strategy and business objectives


- There must be a complete buy-in to the project by executives, managers and users


- It is important to manage user expectations about the completed project


- The DWH should be built incrementally


- Build in adaptability

How are the best practices implemented?

- The project must be managed by both IT and business professionals


- Do not overlook training requirements


- Be politically aware


- Only load data that has been cleansed and is of a quality understood by the organization


- Do not stop with technical system, but set up organizational support

What are the main issues pertaining to scalability and what does good scalability mean?

The main issues:


- The amount of data in the system


- How quickly the system is expected to grow


- The number of concurrent users


- The complexity of user queries


Good scalability means that queries and other data-access functions will grow linearly with the size of the system

What four main areas should effective security in a BI system focus on?

- Establishing effective corporate and security policies and procedures


- Implementing logical security procedures and techniques to restrict access


- Limiting physical access to the data center environment


- Establishing an effective internal control review process with an emphasis on security and privacy

Describe the six steps of engineering projects.

1. Justification (assess the business need)


2. Planning (develop strategic and tactical plans)


3. Business analysis (perform detailed analysis)


4. Design (conceive a product that solves the business problem or enables the business opportunity)


5. Construction (build the product)


6. Deployment (implement/sell the finished product and measure its effectiveness) => 1.

What should the development steps of a BI project look like?

See slide 34 and 37 of lecture 7

What are the characteristics of routinization?

- Repetitious work


- Perceived as a normal part of employees' work activities


- Standardized work


- Incorporated into employees' work processes


- Employees develop familiarity with the implemented IS

What are the characteristics of infusion?

- Realization of hidden value of an IS


- Extension of the IS (e.g., developing additional features)


- Infusion and routinization do not necessarily occur in sequence but rather occur in parallel.

What are the challenges for organizations regarding BI system post-adoption?

Having the coexistence of routine (standard reports and so forth) and innovative (further and new insights) use of a BI system.