65 Cards in this Set
- Front
- Back
Data
|
Factual information used as a basis for reasoning, discussing, or calculation
|
|
Qualitative/Categorical Data
|
-Nonnumeric
-Dummy or Indicator variables -The resulting data are merely codes representing the various categories EX: -Male or Female -On Sale or Not On Sale -Small, Medium or Large |
|
Quantitative/Measurement Data
|
-The resulting data are a set of numbers
.Height .Weight .Price .Unit Sales -Discrete or Continuous |
|
Primary Data
|
Collected for the purpose of a specific research project
-Survey/focus group -Experimentation -Observation -Conjectured behavior/action |
|
Secondary Data
|
Already exist, collected for reasons other than the specific project of interest.
-Collected in an uncontrolled environment -Actual behavior/action |
|
Syndicated Data
|
Collected, cleaned, compiled and analyzed according to a standard procedure by a 3rd party and then sold to interested parties.
Ex: -Store scanner data: POS data -Household scanner data: Nielsen |
|
Clickstream
|
The recording of what a computer user clicks on while Web browsing or using another software application.
-IP address of computer -User name -Time stamps -Click path through the website -Transfer volume -Shopping cart decisions -Purchase/No Purchase |
|
Collaborative Filtering
|
The process of filtering for information or patterns using techniques involving collaboration among multiple agents, viewpoints, data sources, etc.
The method of making automatic predictions about the interests of a user by collecting taste information from many users |
|
Discrete Data
|
if there are only a finite number of values possible or if there is a space on the number line between each 2 possible values.
Ex. In order to obtain a taxi license in Las Vegas, a person must pass a written exam regarding different locations in the city. How many times it would take a person to pass this test is also an example of discrete data. A person could take it once, or twice, or 3 times, or 4 times, or… . So, the possible values are 1, 2, 3, … . There are infinitely many possible values, but if we were to put them on a number line, we would see a space between each pair of values. |
|
Continuous Data
|
makes up the rest of numerical data. This is a type of data that is usually associated with some sort of physical measurement.
Ex. The height of trees at a nursery is an example of continuous data. Is it possible for a tree to be 76.2" tall? Sure. How about 76.29"? Yes. How about 76.2914563782"? You betcha! The possibilities depend upon the accuracy of our measuring device. One general way to tell if data is continuous is to ask yourself if it is possible for the data to take on values that are fractions or decimals. If your answer is yes, this is usually continuous data. |
|
Histogram
|
A summary graph showing a count of the data points falling into various ranges.
|
|
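A minimal sketch of the counting behind a histogram, assuming equal-width ranges. The function name `histogram_counts` and the height data are made up for illustration.

```python
from collections import Counter

def histogram_counts(values, bin_width):
    """Count how many values fall into each range [start, start + bin_width)."""
    # Map every value to the lower edge of its range, then tally the edges.
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

heights = [61, 64, 65, 67, 68, 68, 70, 71, 74]  # made-up data, in inches

# Print one row of '#' marks per range, a text version of the summary graph.
for start, count in histogram_counts(heights, 5).items():
    print(f"{start}-{start + 4}: {'#' * count}")
```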
Mean
|
average value for a variable, sum of all values divided by the number of observations for that variable
|
|
Median
|
middle observation when data are arranged from lowest to highest.
|
|
Mode
|
The most common value in a series of data
|
|
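Mean, median, and mode can be compared on one small data set. This sketch uses Python's standard `statistics` module; the sales figures are made up.

```python
import statistics

unit_sales = [3, 5, 5, 8, 9]  # made-up data, already sorted low to high

mean = statistics.mean(unit_sales)      # sum of all values / number of observations
median = statistics.median(unit_sales)  # middle observation of the sorted data
mode = statistics.mode(unit_sales)      # most common value

print(mean, median, mode)
```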
Weighted Average
|
each value is appropriately ‘weighted’ according to its relative importance – weighted values are then summed
|
|
Moving Average
|
used to track dynamic changes to average values (the mean value for a certain period of time)
|
|
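A short sketch of both averages. The function names and data are illustrative assumptions; the weighted average divides by the sum of the weights so that weights do not have to add up to 1.

```python
def weighted_average(values, weights):
    """Weight each value by its relative importance, then sum."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def moving_average(series, window):
    """Mean of the most recent `window` values, recomputed at each period."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]

# Exam counts 75%, homework counts 25% (made-up weights).
grade = weighted_average([80, 90], [0.25, 0.75])

# Two-period moving average of made-up monthly sales.
trend = moving_average([10, 20, 30, 40], window=2)

print(grade, trend)
```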
Dispersion: Range
|
the spread between the minimum and maximum values (maximum minus minimum)
|
|
Standard Deviation
|
measures the amount of dispersion or variability in the data
How different “on average” will a single observation be from the mean of the sample? To Find: -Average = Sum(x)/n -Variance = Sum of (Squared Deviations)/(n-1) -STD = Square root of Variance |
|
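The three steps on the Standard Deviation card (average, variance with n-1 in the denominator, square root) can be sketched as follows. The function name `sample_std` and the data are made up.

```python
import math

def sample_std(xs):
    """Sample standard deviation, following the three steps on the card."""
    n = len(xs)
    average = sum(xs) / n                                   # Average = Sum(x)/n
    variance = sum((x - average) ** 2 for x in xs) / (n - 1)  # squared deviations / (n-1)
    return math.sqrt(variance)                              # STD = square root of variance

data = [1, 2, 3, 4, 5]  # made-up data: average 3, squared deviations sum to 10
print(sample_std(data))
```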
Empirical Rule
|
normal distribution (the 68-95-99.7 rule)
-There’s a 95% chance that a normally distributed variable will fall within 2 SDs (plus or minus) of its mean. |
|
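The 68-95-99.7 figures can be checked numerically. This is a sketch using `statistics.NormalDist` from the Python standard library (Python 3.8+); the helper name `share_within` is made up.

```python
from statistics import NormalDist

def share_within(k):
    """Probability that a normal variable falls within k SDs of its mean."""
    standard_normal = NormalDist(mu=0, sigma=1)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

# Roughly 68%, 95%, and 99.7% for 1, 2, and 3 SDs.
for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```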
Index Numbers
|
an index is simply a series of numbers where variation captures % changes with respect to some point of reference (base)
-Typically a conversion of some sort of average -Broad/aggregate/general implications -Can be used to differentiate between “real” and “nominal” values -Can also be used to track general changes in preferences, outlooks, or performance -In a cross tab, it is the constant number that things are being compared to. |
|
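A minimal sketch of converting a series into index numbers, assuming the common convention that the base period is set to 100. The function name `to_index` and the prices are made up.

```python
def to_index(series, base_position=0, base_value=100.0):
    """Express every value relative to the base period (base = 100 by assumption)."""
    base = series[base_position]
    return [base_value * value / base for value in series]

prices = [50, 55, 60, 45]  # made-up average prices for four periods
index = to_index(prices)

# An index of 110 means 10% above the base period; 90 means 10% below.
print(index)
```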
Correlation
|
A measure of the strength and direction of a linear association between two variables.
-Ranges from -1 to +1 -The closer the coefficient is to either −1 or 1, the stronger the correlation between the variables -Positive coefficient = positive linear relationship -Negative coefficient = negative linear relationship -Doesn’t imply causation -Symmetrical, non-directional |
|
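A sketch of the Pearson correlation coefficient, which is the usual measure meant by "correlation" here (an assumption, since the card does not name it). The function name and data are made up.

```python
import math

def pearson_r(xs, ys):
    """Strength and direction of the linear association between xs and ys."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_deviation / (spread_x * spread_y)

price = [1, 2, 3, 4]
sales = [9, 7, 5, 3]  # made-up data: sales fall as price rises

# Symmetrical: swapping the two variables gives the same coefficient.
print(pearson_r(price, sales), pearson_r(sales, price))
```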
P Value
|
How believable the coefficient value is
The p-value is the probability of observing a t-statistic that large or larger in magnitude given the null hypothesis that the true coefficient value is zero. If the p-value is greater than 0.05, the coefficient is not statistically significant at the 5% level: an estimate that large could plausibly arise by chance. In stepwise regression, a variable is entered into the model if its associated significance level is less than the chosen P-value cutoff. |
|
Scatter Plots
|
The more scattered the pattern, the closer the correlation coefficient is to 0.
|
|
Correlation Coefficient
|
Value between -1 and 1
|
|
Regression
|
Attempts to describe the dependence of a variable on one or more explanatory (independent ) variables
Implicitly assumes that there is a one-way causal effect from the independent variables to the dependent variable |
|
Y=a+b*x
|
y=dependent variable
x=independent variable |
|
Parameters (Marginal Effect)
|
The unit change in the dependent variable due to a one unit change in the independent variable.
|
|
R-Squared
|
measures the proportion of variation in the dependent variable that is explained by variation in the independent variables
|
|
Slope
|
The change in Y (dependent variable) for each one-unit change in X (independent variable)
On regression CCount B |
|
Intercept
|
elevation of the regression line: the value of Y when X = 0
On regression Constant B |
|
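The regression cards (Y=a+b*x, slope, intercept, R-squared) fit together as in this sketch. It assumes ordinary least squares with a single independent variable; the function names and the price/sales data are made up.

```python
def fit_line(xs, ys):
    """Least-squares estimates of intercept a and slope b in Y = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = mean_y - b * mean_x  # the line passes through the point of means
    return a, b

def r_squared(xs, ys, a, b):
    """Proportion of the variation in Y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - unexplained / total

price = [1, 2, 3, 4]
sales = [9, 7, 5, 3]  # made-up data lying exactly on a line

intercept, slope = fit_line(price, sales)
print(intercept, slope, r_squared(price, sales, intercept, slope))
```

With this data each one-unit increase in price lowers sales by two units, and the line explains all of the variation because the points were chosen to fall exactly on it.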
P-Value
|
probability of rejecting a true null hypothesis (that the true coefficient is zero)
-This value must be really small for our parameter estimates to be statistically significant -Shown as "Sig." on the regression model output |
|
Difference between correlation & regression
|
With correlation you don't have to think about cause and effect. You simply quantify how well two variables relate to each other. With regression, you do have to think about cause and effect as the regression line is determined as the best way to predict Y from X.
|
|
Elasticity
|
Measures the % change in “A” due to a 1% change in “B”
Is the ratio of the % change in “A” to the % change in “B” |
|
Own Price Elasticity of Demand
|
The % change in demand for “A” due to a 1% change in the price of “A” (all other factors held constant)
Slope*Average Price/Average sales |
|
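The formula on the Own Price Elasticity card (Slope*Average Price/Average sales) can be sketched directly. The function name and the numbers are made up; the slope is assumed to come from a regression of sales on price.

```python
def own_price_elasticity(slope, average_price, average_sales):
    """% change in demand for a 1% change in own price, evaluated at the averages."""
    return slope * average_price / average_sales

# Made-up inputs: each $1 price increase lowers sales by 2 units.
elasticity = own_price_elasticity(slope=-2.0, average_price=2.5, average_sales=6.0)

# About -0.83: a 1% price increase goes with roughly a 0.83% drop in sales.
print(round(elasticity, 2))
```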
Cross Price Elasticity of Demand
|
The % change in demand for “A” due to a 1% change in the price of “B” (all other factors held constant)
|
|
Price Elastic
|
x > 1
-Quantity changes faster than price -When the price of a good goes up, consumers will buy a great deal less -When the price of that good goes down, consumers will buy a great deal more |
|
Price Inelastic
|
x < 1
-Quantity changes slower than price -Changes in price have little influence on demand, e.g., water |
|
Unit Elastic
|
x = 1
quantity changes at the same rate as price |
|
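A sketch of the three price-response categories. It assumes x on the cards means the size (absolute value) of the own price elasticity, since that elasticity is normally negative. The function name is made up.

```python
import math

def classify_price_response(elasticity):
    """Label demand by the magnitude of its own price elasticity."""
    magnitude = abs(elasticity)  # assumption: the cards compare the size, not the sign
    if math.isclose(magnitude, 1.0):
        return "unit elastic"
    return "price elastic" if magnitude > 1 else "price inelastic"

for value in (-2.5, -0.2, -1.0):
    print(value, classify_price_response(value))
```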
Substitute Goods Cross-Price Elasticity
|
CPE>0
|
|
Complementary Goods Cross-Price Elasticity
|
CPE<0
|
|
Independent Goods Cross-Price Elasticity
|
CPE=0
|
|
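The sign rules on the three cross-price cards can be sketched as one function. The names and the small tolerance used to treat values near zero as zero are illustrative assumptions.

```python
def cross_price_elasticity(pct_change_demand_a, pct_change_price_b):
    """Ratio of the % change in demand for A to the % change in the price of B."""
    return pct_change_demand_a / pct_change_price_b

def classify_goods(cpe, tolerance=1e-9):
    """Substitutes if CPE > 0, complements if CPE < 0, independent if CPE = 0."""
    if cpe > tolerance:
        return "substitutes"
    if cpe < -tolerance:
        return "complements"
    return "independent"

# Made-up example: price of B rises 10%, demand for A rises 5%.
cpe = cross_price_elasticity(5.0, 10.0)
print(cpe, classify_goods(cpe))
```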
Cannibalization
|
competition (demand substitution) between products produced by the same company
|
|
Optimization
|
Maximizing or minimizing an objective function within a given set of constraints and a given set of resources.
|
|
Objective Function
|
Mathematical statement that relates variables of interest to some objective or goal we are interested in
|
|
Constraints
|
Limitations on the range of the values that certain variables can take
Positive values: demand and supply must be positive |
|
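A toy sketch that ties together optimization, objective function, and constraints. The demand line, the candidate prices, and all names are made up, and a simple search over candidates stands in for a real solver.

```python
def demand(price):
    """Hypothetical demand line: units sold at a given price."""
    return 11 - 2 * price

def revenue(price):
    """Objective function: the quantity we want to maximize."""
    return price * demand(price)

def is_feasible(price):
    """Constraints: price and demand must both be non-negative."""
    return price >= 0 and demand(price) >= 0

# Candidate prices from 0.00 to 5.50 in steps of 0.25.
candidates = [step * 0.25 for step in range(23)]
feasible = [p for p in candidates if is_feasible(p)]

best_price = max(feasible, key=revenue)
best_revenue = revenue(best_price)
print(best_price, best_revenue)
```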
Systems Approach for Optimization
|
Find Problems
-Include all stakeholders -Use nondirective interviewing
Agree on Problems
-Include all stakeholders -Brainstorm problem statements -Break into manageable pieces
Generate Alternative Solutions
Gain Commitment
Obtain Feedback |
|
Analog Computer
|
Sets up a system that is similar to the system of interest
• Numbers are represented as a continuous range of voltages • Operates as a simulator conducting many AAbacus operations simultaneously -Smooth |
|
Digital Computer
|
Works with discrete values (typically these are 0 and 1)
• Performs one arithmetic operation at a time • Relies upon a program capable of executing sequences of instructions and conditional -Stair Stepper |
|
Binary Signals
|
have only two states
ex: on or off |
|
Packet
|
a formatted block of data carried by a computer network
Three Parts -Header -Payload -Trailer |
|
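A sketch of the three parts of a packet as a small data structure. The header fields and the use of a CRC32 checksum as the trailer are assumptions for illustration; real network protocols define their own layouts.

```python
import zlib
from dataclasses import dataclass

@dataclass
class Packet:
    header: dict    # addressing information (hypothetical fields)
    payload: bytes  # the data being carried
    trailer: int    # error check; CRC32 is an assumption here

def make_packet(source, destination, payload):
    """Wrap a payload with a header in front and a checksum trailer behind."""
    header = {"source": source, "destination": destination, "length": len(payload)}
    return Packet(header=header, payload=payload, trailer=zlib.crc32(payload))

def is_intact(packet):
    """Recompute the checksum and compare it with the trailer."""
    return zlib.crc32(packet.payload) == packet.trailer

packet = make_packet("10.0.0.1", "10.0.0.2", b"hello")
print(packet.header, is_intact(packet))
```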
Packet Technologies
|
convert voice, video, and data into packets that can be transmitted together over a single, high-speed network.
|
|
Information System
|
A set of interrelated components that collects, processes, stores, analyzes, and disseminates data and information for a specific purpose.
|
|
Parts of an IS
|
-Hardware
-Software -Network -Procedures -People |
|
Data Mining
|
a process that uses a variety of data analysis tools to discover patterns and relationships in data that may be used to make valid predictions.
|
|
Data Mining is different from OLAP
|
OLAP helps to verify patterns not discover them
|
|
Data Mart
|
sections of data extracted from a data warehouse
-logical rather than physical subset of data |
|
Data Warehouse
|
A database that collects business information from many sources in the enterprise
-Covers all aspects of the company’s processes, products, and customers -Multi-dimensional view |
|
Metadata
|
Data about data
|
|
Data cube
|
a multidimensional matrix that lets users explore and analyze a collection of data from many different perspectives.
|
|
Types of Data Mining Tasks
|
Description
-Find human-interpretable patterns that describe the data -Means, standard deviations, crosstabs, etc.
Prediction
-Use some variables to predict unknown or future values of other variables |
|
Clustering
|
A way to segment data into groups that are not previously defined
-Given a set of data observations (e.g., customers), find clusters such that: -Observations in one cluster are more similar to one another -Observations in separate clusters are less similar to one another EX: Types of shoppers at a store |
|
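One common way to find such groups is k-means, sketched here on a single variable. The card does not name a method, so k-means, the starting centers, and the spending data are all assumptions.

```python
def kmeans_1d(values, centers, iterations=10):
    """Repeatedly assign each value to its nearest center, then re-center."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            clusters[nearest].append(value)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            sum(cluster) / len(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers, clusters

weekly_spend = [5, 7, 6, 50, 55, 52]  # made-up data: two types of shoppers
centers, clusters = kmeans_1d(weekly_spend, centers=[0, 60])
print(centers, clusters)
```

The groups are not defined in advance; they emerge from how close the observations are to one another.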
Association Rule Discovery
|
-Market basket analysis: develop association rule between product purchases
-Shows how typical customers navigate through contents -Suggests tie-in “tricks,” e.g., run sale on diapers and raise the price of beer -Helps with inventory management |
|
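Market basket rules are usually scored with support and confidence. These two measures are standard in association rule mining but are not named on the card, so treat them as an assumption; the baskets are made up.

```python
baskets = [
    {"diapers", "beer"},
    {"diapers", "beer", "milk"},
    {"diapers", "milk"},
    {"beer"},
    {"milk"},
]  # made-up purchase data

def count_containing(items):
    """Number of baskets that contain every item in `items`."""
    return sum(1 for basket in baskets if items <= basket)

def support(items):
    """Share of all baskets that contain the items."""
    return count_containing(items) / len(baskets)

def confidence(if_bought, then_bought):
    """Of the baskets with `if_bought`, the share that also have `then_bought`."""
    return count_containing(if_bought | then_bought) / count_containing(if_bought)

# Rule: diapers -> beer
print(support({"diapers", "beer"}), confidence({"diapers"}, {"beer"}))
```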
Sequence Discovery
|
Find rules that predict strong sequential dependencies among different events (buy tennis racquet >>> buy tennis balls)
-Rules are formed by first discovering patterns. -Event occurrences in the pattern are governed by timing constraints -Example: 75% of customers who buy a tennis racket will buy at least one tube of tennis balls within the month |
|
Classification
|
-Identify groups that have common characteristics
-Based on some qualitatively defined attribute that is previously determined Applications -Target Marketing -Fraud Detection -New product demand production Harrah: middle‐aged and senior adults, former teachers, doctors, bankers, and machinists |
|
Regression (predictive)
|
Derives the relationship between a single continuous variable and a host of predictors
EX: Price and quantity of sales data |