65 Cards in this Set

  • Front
  • Back
Data
Factual information used as a basis for reasoning, discussing, or calculation
Qualitative/Categorical Data
-Nonnumeric
-Dummy or Indicator variables
-The resulting data are merely codes representing the various categories

EX:
-Male or Female
-On Sale or Not On Sale
-Small, Medium or Large
Quantitative/Measurement Data
-The resulting data are a set of numbers
.Height
.Weight
.Price
.Unit Sales
-Discrete or Continuous
Primary Data
Collected for the purpose of a specific research project
-Survey/focus group
-Experimentation
-Observation
-Conjectured behavior/action
Secondary Data
Already exist, collected for reasons other than the specific project of interest.
-Collected in an uncontrolled environment
-Actual behavior/action
Syndicated Data
Collected, cleaned, compiled and analyzed according to a standard procedure by a 3rd party and then sold to interested parties.

Ex:
-Store scanner data: POS data
-Household scanner data: Nielsen
Clickstream
The recording of what a computer user clicks on while Web browsing or using another software application.
-IP address of computer
-User name
-Time stamps
-Click path through the website
-Transfer volume
-Shopping cart decisions
-Purchase/No Purchase
Collaborative Filtering
The process of filtering for information or patterns using techniques involving collaboration among multiple agents, viewpoints, data sources, etc.

The method of making automatic predictions about the interests of a user by collecting taste information from many users
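
A minimal sketch of the idea, assuming each user's tastes are stored as a set of liked items and that similarity between users is measured by Jaccard overlap. The function name `recommend`, the similarity measure, and the sample data are illustrative assumptions, not part of the card set.

```python
def recommend(target, likes):
    """Suggest items for `target` based on what similar users like.

    likes: dict mapping user name -> set of liked items.
    Each other user votes for the items the target has not seen,
    weighted by how much their tastes overlap with the target's.
    """
    mine = likes[target]
    scores = {}
    for user, items in likes.items():
        if user == target:
            continue
        union = mine | items
        overlap = len(mine & items) / len(union) if union else 0.0
        if overlap == 0:
            continue  # users with no shared tastes contribute nothing
        for item in items - mine:
            scores[item] = scores.get(item, 0.0) + overlap
    # Highest score first; ties broken alphabetically
    return sorted(scores, key=lambda item: (-scores[item], item))


# Made-up taste data
likes = {
    "ann": {"A", "B", "C"},
    "bob": {"A", "B", "D"},
    "cat": {"E", "F"},
    "dan": {"B", "C", "D", "E"},
}
suggestions = recommend("ann", likes)
print(suggestions)
```

Here "cat" shares nothing with "ann" and is ignored, while "bob" and "dan" both push item D to the top of the list.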
Discrete Data
Data are discrete if there are only a finite number of possible values or if there is a space on the number line between each 2 possible values.

Ex. In order to obtain a taxi license in Las Vegas, a person must pass a written exam regarding different locations in the city. How many times it would take a person to pass this test is also an example of discrete data. A person could take it once, or twice, or 3 times, or 4 times, or… . So, the possible values are 1, 2, 3, … . There are infinitely many possible values, but if we were to put them on a number line, we would see a space between each pair of values.
Continuous Data
makes up the rest of numerical data. This is a type of data that is usually associated with some sort of physical measurement.

Ex. The height of trees at a nursery is an example of continuous data. Is it possible for a tree to be 76.2" tall? Sure. How about 76.29"? Yes. How about 76.2914563782"? You betcha!

The possibilities depend upon the accuracy of our measuring device.
One general way to tell if data is continuous is to ask yourself if it is possible for the data to take on values that are fractions or decimals. If your answer is yes, this is usually continuous data.
Histogram
A summary graph showing a count of the data points falling into various ranges.
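
A minimal sketch of the counting behind a histogram, assuming equal-width ranges. The function name `histogram_counts` and the sample heights are illustrative assumptions.

```python
def histogram_counts(values, low, width, n_bins):
    """Count how many values fall into each of n_bins equal-width ranges.

    Bin i covers [low + i*width, low + (i+1)*width).
    Values outside the overall range are ignored.
    """
    counts = [0] * n_bins
    for v in values:
        idx = int((v - low) // width)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


heights = [61, 64, 65, 67, 68, 68, 70, 72, 75]  # made-up sample data
counts = histogram_counts(heights, low=60, width=5, n_bins=4)

# Text version of the graph: one '#' per data point in each range
for i, c in enumerate(counts):
    start = 60 + 5 * i
    print(f"{start}-{start + 4}: {'#' * c}")
```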
Mean
average value for a variable, sum of all values divided by the number of observations for that variable
Median
middle observation when data are arranged from lowest to highest.
Mode
The most common value in a series of data
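
The three measures on the cards above can be compared on one small data set. This is a sketch using Python's standard `statistics` module; the sample values are made up.

```python
import statistics

sales = [3, 5, 5, 6, 8, 9, 13]  # made-up sample data

mean = sum(sales) / len(sales)     # sum of all values / number of observations
median = statistics.median(sales)  # middle observation of the sorted data
mode = statistics.mode(sales)      # most common value

print(mean, median, mode)
```

The single large value (13) pulls the mean above the median, which is why the two are often reported together.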
Weighted Average
each value is appropriately ‘weighted’ according to its relative importance – weighted values are then summed
Moving Average
used to track dynamic changes to average values (the mean value for a certain period of time)
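
A short sketch of both averages. It assumes the weights are divided by their total so they need not sum to 1, and that the moving average uses a fixed window of the most recent periods; the function names and numbers are illustrative.

```python
def weighted_average(values, weights):
    """Sum of value * weight, divided by the total weight."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def moving_average(series, window):
    """Mean of each run of `window` consecutive periods."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


# Exam scores weighted 5 : 3 : 2 (made-up data)
grade = weighted_average([80, 90, 70], [5, 3, 2])

# Three-period moving average of monthly sales (made-up data)
smoothed = moving_average([10, 12, 14, 16, 18], window=3)

print(grade, smoothed)
```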
Dispersion: Range
the spread between the minimum and maximum values (maximum minus minimum)
Standard Deviation
measures the amount of dispersion or variability in the data
How different “on average” will a single observation be from the mean of the sample?

To Find:
Average = Sum(x)/n
Variance = Sum of (Squared Deviations)/(n-1)
STD= Square root of Variance
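
The three "To Find" steps can be sketched directly in code. The function name `sample_std` and the data are illustrative; the n-1 divisor follows the card.

```python
import math


def sample_std(xs):
    """Sample standard deviation, following the three steps on the card."""
    n = len(xs)
    average = sum(xs) / n                                  # Average = Sum(x)/n
    variance = sum((x - average) ** 2 for x in xs) / (n - 1)  # squared deviations/(n-1)
    return math.sqrt(variance)                             # STD = square root of variance


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample data
std = sample_std(data)
print(round(std, 3))
```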
Empirical Rule
normal distribution (the 68-95-99.7 rule)
-There’s a 95% chance that a normally distributed variable will fall within 2 SDs (plus or minus) of its mean.
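
The 68-95-99.7 figures can be checked numerically. This sketch uses `statistics.NormalDist` from the Python standard library (3.8+); the helper name `share_within` is an assumption for illustration.

```python
from statistics import NormalDist


def share_within(k, mu=0.0, sigma=1.0):
    """Probability that a normal variable lies within k SDs of its mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```

The result does not depend on the particular mean or standard deviation, only on the number of SDs.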
Index Numbers
an index is simply a series of numbers where variation captures % changes with respect to some point of reference (base)
-typically a conversion of some sort of average
Broad/aggregate/general implications
-Can be used to differentiate between “real” and “nominal” values
-Can also be used to track general changes in preferences, outlooks, or performance
-When in cross tab it is the constant number things are being compared to.
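
A minimal sketch of building an index, assuming the first period is the base and the base is set to 100. The function name `to_index` and the price series are illustrative.

```python
def to_index(series, base_position=0, base_value=100.0):
    """Re-express a series relative to a base period.

    An index of 110 means the value is 10% above the base period.
    """
    base = series[base_position]
    return [base_value * v / base for v in series]


prices = [50, 55, 60, 45]  # made-up average prices by year
index = to_index(prices)
print(index)
```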
Correlation
A measure of the strength and direction of a linear association between two variables.

From -1 to +1

The closer the coefficient is to either −1 or 1, the stronger the correlation between the variables

Positive coefficient= Positive linear relationship
Negative coefficient = Negative Linear relationship

Doesn’t imply causation
-- Symmetrical, non-directional
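
A sketch of the Pearson correlation coefficient, the usual measure behind this card. The function name `pearson_r` and the data are illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Co-movement of x and y around their means
    co = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Spread of each variable around its own mean
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co / (spread_x * spread_y)


price = [1, 2, 3, 4, 5]         # made-up data
units = [90, 85, 70, 62, 50]    # made-up data
r = pearson_r(price, units)
print(round(r, 3))
```

Swapping the two arguments gives the same value, which is what "symmetrical, non-directional" means on the card.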
P Value
How believable the coefficient value is

The p-value is the probability of observing a t-statistic that large or larger in magnitude given the null hypothesis that the true coefficient value is zero.

If the p-value is greater than 0.05 this means that the coefficient may be only "accidentally" significant.

A variable is entered into the model if its associated significance level is less than the chosen P-value cutoff (here, 0.05).
Scatter Plots
The more scattered the pattern, the closer the correlation coefficient is to zero.
Correlation Coefficient
Value between -1 and 1
Regression
Attempts to describe the dependence of a variable on one or more explanatory (independent ) variables

Implicitly assumes that there is a one-way causal effect from the independent variables to the dependent variable
Y=a+b*x
y=dependent variable
x=independent variable
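
A sketch of estimating a and b in Y=a+b*x by ordinary least squares with one independent variable. The function name `fit_line` and the data are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the line y = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Slope: co-movement of x and y divided by the variation in x
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    # Intercept: forces the line through the point of means
    a = mean_y - b * mean_x
    return a, b


x = [1, 2, 3, 4, 5]          # made-up independent variable
y = [5, 7, 9, 11, 13]        # made-up dependent variable (exactly 3 + 2x)
a, b = fit_line(x, y)
print(a, b)
```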
Parameters (Marginal Effect)
The unit change in the dependent variable due to a one unit change in the independent variable.
R-Squared
measures the proportion of variation in the dependent variable that is explained by variation in the independent variables
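
A sketch of R-squared for a one-variable regression, computed as 1 minus the unexplained variation over the total variation. The function name `r_squared` and the data are illustrative.

```python
def r_squared(xs, ys):
    """Share of the variation in y explained by a least-squares line on x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = mean_y - b * mean_x
    # Variation left over after fitting the line
    unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation of y around its mean
    total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - unexplained / total


r2 = r_squared([1, 2, 3, 4], [2, 4, 5, 8])  # made-up data
print(round(r2, 3))
```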
Slope
The change in Y (dependent variable) for each one-unit change in X (independent variable)

On regression CCount B
Intercept
elevation

On regression Constant B
P-Value
probability of rejecting a true null hypothesis
– This value must be really small for our parameter estimates to be statistically significant

sig. on regression model
Difference between correlation & regression
With correlation you don't have to think about cause and effect. You simply quantify how well two variables relate to each other. With regression, you do have to think about cause and effect as the regression line is determined as the best way to predict Y from X.
Elasticity
Measures the % change in “A” due to a 1% change in “B”

Is the ratio of the % change in “A” to the % change in “B”
Own Price Elasticity of Demand
The % change in demand for “A” due to a 1% change in the price of “A” (all other factors held constant)

Slope*Average Price/Average sales
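
The formula on this card, Slope*Average Price/Average sales, can be sketched as a one-line function. The name `own_price_elasticity` and the numbers are illustrative; the slope is assumed to come from a regression of sales on price.

```python
def own_price_elasticity(slope, avg_price, avg_sales):
    """Elasticity at the means: slope * average price / average sales."""
    return slope * avg_price / avg_sales


# Made-up example: each $1 price increase loses 50 units,
# at an average price of $4 and average sales of 400 units.
elasticity = own_price_elasticity(slope=-50, avg_price=4, avg_sales=400)
print(elasticity)
```

A value of -0.5 reads as: a 1% price increase is associated with a 0.5% drop in quantity demanded.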
Cross Price Elasticity of Demand
The % change in demand for “A” due to a 1% change in the price of “B” (all other factors held constant)
Price Elastic
x > 1 (in absolute value)
quantity changes faster than price 
when the price of a good goes up, consumers will buy a great deal less
when the price of that good goes down, consumers will buy a great deal more
Price Inelastic
x < 1 (in absolute value)
quantity changes slower than price.
changes in price have little influence on demand
e.g., water
Unit Elastic
x = 1 (in absolute value)
quantity changes at the same rate as price
Substitute Goods Cross-Price Elasticity
CPE>0
Complementary Goods Cross-Price Elasticity
CPE<0
Independent Goods Cross-Price Elasticity
CPE=0
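
The three cross-price cases above can be sketched as a small classifier. The function names, the tolerance used to treat tiny values as zero, and the numbers are illustrative assumptions.

```python
def cross_price_elasticity(pct_change_demand_a, pct_change_price_b):
    """% change in demand for A divided by % change in the price of B."""
    return pct_change_demand_a / pct_change_price_b


def classify_goods(cpe, tolerance=1e-9):
    """Label the relationship between two goods from the sign of the CPE."""
    if cpe > tolerance:
        return "substitutes"     # CPE > 0
    if cpe < -tolerance:
        return "complements"     # CPE < 0
    return "independent"         # CPE = 0


# Made-up example: price of B rises 10%, demand for A rises 5%
cpe = cross_price_elasticity(5.0, 10.0)
print(cpe, classify_goods(cpe))
```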
Cannibalization
competition (demand substitution) between products produced by the same company
Optimization
Maximizing or minimizing an objective function within a given set of constraints and a given set of resources.
Objective Function
Mathematical statement that relates variables of interest to some objective or goal we are interested in
Constraints
Limitations on the range of the values that certain variables can take

Positive values: demand and supply must be positive
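
A small sketch tying the three cards together: an objective function (profit), constraints (price and demand may not be negative), and a search for the best value. The linear demand curve, the candidate price grid, and the function name `best_price` are illustrative assumptions.

```python
def best_price(intercept, slope, unit_cost, candidate_prices):
    """Pick the candidate price that maximizes profit.

    Demand is assumed linear: demand = intercept + slope * price.
    Returns (price, profit), or None if no candidate is feasible.
    """
    best = None
    for price in candidate_prices:
        demand = intercept + slope * price
        # Constraints: price and demand must not be negative
        if price < 0 or demand < 0:
            continue
        profit = (price - unit_cost) * demand  # objective function
        if best is None or profit > best[1]:
            best = (price, profit)
    return best


# Made-up example: demand = 100 - 10 * price, unit cost of 2
result = best_price(intercept=100, slope=-10, unit_cost=2,
                    candidate_prices=range(0, 16))
print(result)
```

Prices above 10 are skipped because they would imply negative demand, which is the constraint doing its job.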
Systems Approach for Optimization
Find Problems
-Include all stakeholders
-Use nondirective interviewing

Agree on problems
-Include all stakeholders
-Brainstorm problem statements
-Break into manageable pieces

Generate Alternative Solutions

Gain Commitment

Obtain Feedback
Analog Computer
Sets up a system that is similar to the system of interest
• Numbers are represented as a continuous range of voltages
• Operates as a simulator conducting many operations simultaneously
Abacus

-Smooth
Digital Computer
Works with discrete values (typically these are 0 and 1)
• Performs one arithmetic operation at a time
• Relies upon a program capable of executing sequences of instructions and conditional

-Stair Stepper
Binary Signals
have only two states
ex: on or off
Packet
a formatted block of data carried by a computer network

Three Parts
-Header
-Payload
-Trailer
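
A sketch of the three parts in code. The byte layout here is hypothetical and does not follow any real protocol: the header is assumed to hold a source ID, a destination ID, and the payload length, and the trailer is assumed to hold a CRC-32 checksum for error detection.

```python
import struct
import zlib

HEADER_FORMAT = "!HHH"   # source, destination, payload length (hypothetical layout)
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)
TRAILER_FORMAT = "!I"    # CRC-32 checksum
TRAILER_SIZE = struct.calcsize(TRAILER_FORMAT)


def build_packet(source, destination, payload):
    """Assemble header + payload + trailer into one block of bytes."""
    header = struct.pack(HEADER_FORMAT, source, destination, len(payload))
    trailer = struct.pack(TRAILER_FORMAT, zlib.crc32(header + payload))
    return header + payload + trailer


def parse_packet(packet):
    """Split a packet into its parts and verify the trailer checksum."""
    source, destination, length = struct.unpack(
        HEADER_FORMAT, packet[:HEADER_SIZE]
    )
    body_end = HEADER_SIZE + length
    payload = packet[HEADER_SIZE:body_end]
    (checksum,) = struct.unpack(
        TRAILER_FORMAT, packet[body_end:body_end + TRAILER_SIZE]
    )
    if checksum != zlib.crc32(packet[:body_end]):
        raise ValueError("checksum mismatch: packet was corrupted")
    return source, destination, payload


packet = build_packet(1, 2, b"hello")
print(len(packet), parse_packet(packet))
```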
Packet Technologies
convert voice, video, and data into packets that can be transmitted together over a single, high-speed network.
Information System
A set of interrelated components that collects, processes, stores, analyzes, and disseminates data and information for a specific purpose.
Parts of an IS
-Hardware
-Software
-Network
-Procedures
-People
Data Mining
a process that uses a variety of data analysis tools to discover patterns and relationships in data that may be used to make valid predictions.
Data Mining is different from OLAP
OLAP helps to verify patterns not discover them
Data Mart
sections of data extracted from a data warehouse
-logical rather than physical subset of data
Data Warehouse
A database that collects business information from many sources in the enterprise
Covers all aspects of the company’s processes, products, and customers
Multi‐dimensional view
Metadata
Data about data
Data cube
a multidimensional matrix that lets users explore and analyze a collection of data from many different perspectives.
Types of Data Mining Tasks
Description
-Find human‐interpretable patterns that describe the data
– Means, standard deviations, crosstabs, etc.

Prediction
– Use some variables to predict unknown or future values of other variables
Clustering
A way to segment data into groups that are not previously defined
•Given a set of data observations (e.g., customers), find clusters such that:
– Observations in one cluster are more similar to one another
– Observations in separate clusters are less similar to one another

EX: Types of shoppers at a store
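
One common way to find such groups is k-means. The sketch below is a one-dimensional version under simple assumptions: the number of clusters and the starting centers are supplied by hand, and shoppers are described only by their (made-up) spend per visit. The card set does not name a specific algorithm.

```python
def kmeans_1d(points, centers, iterations=10):
    """Group numbers into clusters around the nearest center.

    Repeats two steps: assign each point to its nearest center,
    then move each center to the mean of its assigned points.
    """
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return centers, clusters


spend = [12, 15, 18, 95, 100, 110]  # made-up spend per visit
centers, clusters = kmeans_1d(spend, centers=[0, 50])
print(centers, clusters)
```

The groups are not defined in advance; they emerge from which shoppers sit close together.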
Association Rule Discovery
-Market basket analysis: develop association rule between product purchases
-Shows how typical customers navigate through contents
-Suggests tie‐in “tricks,” e.g., run sale on diapers and raise the price of beer
-Helps with inventory management
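
A sketch of how a rule such as "diapers, then beer" can be scored from basket data. Support and confidence are the standard measures for association rules; they are not named on the card, and the function names and baskets are made up for illustration.

```python
def support(baskets, items):
    """Share of baskets that contain every item in `items`."""
    items = set(items)
    return sum(1 for b in baskets if items <= b) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Among baskets with the antecedent, the share that also hold the consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [b for b in baskets if antecedent <= b]
    if not with_antecedent:
        return 0.0
    both = sum(1 for b in with_antecedent if consequent <= b)
    return both / len(with_antecedent)


# Made-up market baskets
baskets = [
    {"diapers", "beer"},
    {"diapers", "beer", "milk"},
    {"diapers", "milk"},
    {"beer"},
    {"milk", "bread"},
]
rule_support = support(baskets, {"diapers", "beer"})
rule_confidence = confidence(baskets, {"diapers"}, {"beer"})
print(rule_support, round(rule_confidence, 3))
```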
Sequence Discovery
Find rules that predict strong sequential dependencies among different events (buy tennis racquet >>> buy tennis balls)
-Rules are formed by first discovering patterns.
-Event occurrences in the pattern are governed by timing constraints

-example: 75% of customers who buy a tennis racket will buy at least one tube of tennis balls within the month
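
A sketch of checking one sequential rule with a timing constraint. It assumes purchases are recorded as (customer, item, day) tuples; the function name `follow_up_rate` and the purchase log are made up to mirror the tennis example.

```python
def follow_up_rate(purchases, first, then, within_days):
    """Share of customers who buy `then` within `within_days` of first buying `first`."""
    # Day of each customer's earliest purchase of the first item
    first_day = {}
    for customer, item, day in purchases:
        if item == first:
            first_day[customer] = min(day, first_day.get(customer, day))
    # Customers whose follow-up purchase falls inside the time window
    followed = set()
    for customer, item, day in purchases:
        if item == then and customer in first_day:
            if 0 <= day - first_day[customer] <= within_days:
                followed.add(customer)
    return len(followed) / len(first_day) if first_day else 0.0


# Made-up purchase log: (customer, item, day)
purchases = [
    (1, "racket", 0), (1, "balls", 5),
    (2, "racket", 3), (2, "balls", 20),
    (3, "racket", 10), (3, "balls", 60),   # too late to count
    (4, "racket", 1), (4, "balls", 1),
    (5, "balls", 2),                        # never bought a racket
]
rate = follow_up_rate(purchases, "racket", "balls", within_days=30)
print(rate)
```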
Classification
-Identify groups that have common characteristics
-Based on some qualitatively defined attribute that is previously determined

Applications
-Target Marketing
-Fraud Detection
-New product demand prediction
Harrah: middle‐aged and senior adults, former teachers, doctors, bankers, and machinists
Regression (predictive)
Derives the relationship between a single continuous variable and a host of predictors
EX: Price and quantity of sales data