30 Cards in this Set
- Front
- Back
Datum
|
an item of information
|
|
Data Warehouse
|
a large database of information collected by a company
|
|
Data mining
|
analyzing a large body of data to find patterns that support predictions or decisions
|
|
Metadata- contains the information about the data (its context)
|
Who- the specific cases the data describe
What- what was recorded or measured about each case
When- the time the data were collected
Why- the reason for examining the data
Where- the actual location |
|
Rows- cases
Columns- variables |
Respondents- individuals who answer a survey
Subjects/participants- people in an experiment
Experimental units- the cases in an experiment when they are not people |
|
Relational database
|
two or more tables linked together so that information can be merged across them.
- adds clarity
- keeps track of transactions better than one huge data table with many columns for just one customer |
|
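As a rough illustration of linked tables, the sketch below uses Python's built-in sqlite3 module. The table names, column names, and rows are made up for this example; the card itself does not specify any.

```python
import sqlite3

# Hypothetical tables for illustration: customers and their transactions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE transactions (
        transaction_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO transactions VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# Merge information across the two tables through the shared customer_id.
rows = conn.execute("""
    SELECT c.name, COUNT(t.transaction_id), SUM(t.amount)
    FROM customers AS c
    JOIN transactions AS t ON t.customer_id = c.customer_id
    GROUP BY c.customer_id
    ORDER BY c.customer_id
""").fetchall()

for name, n_transactions, total in rows:
    print(name, n_transactions, total)

conn.close()
```

Each customer is stored once, and any number of transactions can point back to that customer, which is the alternative to one wide table with many columns per customer.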
Categorical Vs. Quantitative
|
Categorical- values are categories; arithmetic on them is not meaningful and they have no numerical units.
Nominal- categorical with no intrinsic order
Ordinal- categorical with an intrinsic order, such as Freshman, Sophomore, Junior, Senior
Quantitative- numerical, measured in UNITS; percentages are included |
|
Identifier variable
|
a special type of categorical variable with a unique value assigned to each individual.
Example- Social Security number, ID number |
|
Time Series VS Cross Sectional
|
Time series- variables measured at regular intervals of time. Ex- every week, month, or year
Cross sectional- several variables measured at roughly the same point in time. |
|
3 Rules of Sampling
|
1) Make a sample- examine a part of the whole.
2) Randomize- random selection gives every member of the population a fair chance of being chosen, which protects against bias.
3) Sample size is what matters, not the size of the population. |
|
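Rule 3 can be checked numerically. The sketch below is not from the card: it assumes a simple random sample and uses the standard formula for the standard error of a sample proportion, sqrt(p(1-p)/n), with the finite population correction sqrt((N-n)/(N-1)) when a population size N is supplied. The function name and the numbers are illustrative only.

```python
import math

def standard_error(p, n, population_size=None):
    """Standard error of a sample proportion p from a sample of size n.

    If population_size is given, apply the finite population correction
    for sampling without replacement.
    """
    se = math.sqrt(p * (1 - p) / n)
    if population_size is not None:
        se *= math.sqrt((population_size - n) / (population_size - 1))
    return se

n = 1000
# Same sample size, very different population sizes.
for population_size in (100_000, 10_000_000, 300_000_000):
    print(population_size, standard_error(0.5, n, population_size))

# Changing the sample size is what moves the standard error.
print("n = 1000:", standard_error(0.5, 1000))
print("n = 4000:", standard_error(0.5, 4000))
```

Under these assumptions the population size barely changes the result, while quadrupling the sample size halves the standard error.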
Population
|
the entire group of individuals we hope to learn about
|
|
Population parameter
|
a numerical value that describes the whole population, such as its mean or a proportion; usually unknown and estimated from a sample
|
|
Sampling frame
|
the list of individuals from which the sample is drawn
|
|
Sample
|
the subset of the population that is actually examined and whose data are used to learn about the population
The size of the sample is what matters, not the size of the population (as long as the sample is representative) |
|
Voluntary Response bias
|
when individuals can choose on their own whether to participate in the sample.
- People who choose to participate are more likely to hold strong opinions on the issue. |
|
Undercoverage bias
|
when some portion of the population is not sampled at all or is underrepresented in the sample
|
|
Nonresponse bias
|
when a large fraction of those sampled fail to respond
|
|
Response Bias
|
when a survey design influences responses
|
|
Sampling error/ variability
Measurement error |
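Sampling error- differences in results from one random sample to the next; this is natural variability, not bias
Measurement error- inaccuracy built into how responses are measured or recorded |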
|
Convenience sampling
|
sample consisting of individuals who are conveniently available
|
|
Simple Random Sample
|
a sample drawn so that every possible sample of the size we plan to draw has an equal chance of being selected.
|
|
Stratified Random Sampling
|
Divide the population into homogeneous groups (strata), then use random sampling within each stratum
- ensures the sample represents the different groups in the population |
|
Cluster Sampling
|
Divide the population into clusters, select some clusters at random, and perform a census within each selected cluster
- Useful when you don't have one big list to choose from, ex- getting polls from counties
- more practical
- saves money |
|
Systematic Sampling
|
selecting individuals at a regular interval from the sampling frame (ex- every 10th person), starting from a randomly chosen individual
|
|
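The four sampling designs on the cards above can be compared side by side in a short sketch. Everything here is an assumption made for illustration: the 100-person sampling frame, the class-year strata, the clusters of 10, and the function names. It uses only Python's standard random module.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame: 100 people, each tagged with a class year.
years = ["Freshman", "Sophomore", "Junior", "Senior"]
frame = [{"id": i, "year": years[i % 4]} for i in range(100)]

def simple_random_sample(frame, n):
    # Draws without replacement; every subset of size n is equally likely.
    return rng.sample(frame, n)

def stratified_sample(frame, key, n_per_stratum):
    # Group the frame into strata, then sample at random within each stratum.
    strata = {}
    for person in frame:
        strata.setdefault(person[key], []).append(person)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters):
    # Choose whole clusters at random, then take everyone in them (a census).
    chosen = rng.sample(clusters, n_clusters)
    return [person for cluster in chosen for person in cluster]

def systematic_sample(frame, step):
    # Random starting point, then every step-th individual in the frame.
    start = rng.randrange(step)
    return frame[start::step]

srs = simple_random_sample(frame, 12)
strat = stratified_sample(frame, "year", 3)
clusters = [frame[i:i + 10] for i in range(0, 100, 10)]  # hypothetical groups of 10
clus = cluster_sample(clusters, 2)
syst = systematic_sample(frame, 10)

print(len(srs), len(strat), len(clus), len(syst))
```

The stratified sample always contains the same number of people from each class year, while the simple random sample only does so on average.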
Multistage Sampling
|
a more complicated form of cluster sampling in which larger clusters are further subdivided into smaller, more targeted groupings for the purposes of surveying.
|
|
Frequency Vs Relative Frequency table
|
Frequency- shows the count of cases in each category of a variable
Relative- shows the percentage of cases in each category relative to the whole |
|
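A small sketch of both tables, using Python's collections.Counter. The list of responses is invented for the example.

```python
from collections import Counter

# Hypothetical categorical data: class year of eight respondents.
responses = [
    "Freshman", "Senior", "Junior", "Freshman",
    "Sophomore", "Freshman", "Senior", "Junior",
]

# Frequency table: count of cases in each category.
frequency = Counter(responses)

# Relative frequency table: each count divided by the total number of cases.
total = sum(frequency.values())
relative_frequency = {category: count / total for category, count in frequency.items()}

for category, count in frequency.items():
    print(f"{category:<10}{count:>3}{relative_frequency[category]:>8.1%}")
```

The relative frequencies always sum to 100%, which makes tables built from samples of different sizes comparable.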
Area Principle
|
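the area a part of a graph occupies should correspond to the size of the value it represents; don't let a value grow in 2 dimensions at once.
- must fix height or width and change only the other |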
|
Contingency table
|
a table that shows how individuals are distributed along one variable, contingent on the value of another variable
|
|
Independent
|
no relationship between 2 variables: the distribution of one is the same for every category of the other
|
|
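One way to build a contingency table and look for independence is to compare the conditional distribution of one variable within each category of the other. The survey data, variable names, and function name below are hypothetical.

```python
from collections import Counter

# Hypothetical survey: (class year, owns a car) for each respondent.
cases = (
    [("Freshman", "yes")] * 10 + [("Freshman", "no")] * 30
    + [("Senior", "yes")] * 24 + [("Senior", "no")] * 16
)

# Contingency table: cell counts keyed by (row category, column category).
table = Counter(cases)
row_labels = sorted({row for row, _ in table})
col_labels = sorted({col for _, col in table})

def conditional_distribution(row):
    # Distribution of the column variable among cases in the given row.
    row_total = sum(table[(row, col)] for col in col_labels)
    return {col: table[(row, col)] / row_total for col in col_labels}

for row in row_labels:
    print(row, conditional_distribution(row))
```

If the two variables were independent, the printed distributions would be roughly the same in every row; in this made-up data they differ, so class year and car ownership are related.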
Simpson's Paradox
|
when percentages within groups contradict the overall percentages.
- Group A) 90/100 = 90% and 10/20 = 50%; Total: 100/120 = 83%
- Group B) 19/20 = 95% and 75/100 = 75%; Total: 94/120 = 78%
- B has the higher percentage in each part, yet A has the higher total. |
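The arithmetic on this card can be verified with a few lines of Python. The counts are the ones on the card; the labels "part 1" and "part 2" are placeholders, since the card does not name the two parts.

```python
# (successes, trials) from the card; "part 1" / "part 2" are placeholder labels.
results = {
    "A": {"part 1": (90, 100), "part 2": (10, 20)},
    "B": {"part 1": (19, 20), "part 2": (75, 100)},
}

overall = {}
for group, parts in results.items():
    for label, (successes, trials) in parts.items():
        print(f"Group {group}, {label}: {successes}/{trials} = {successes / trials:.0%}")
    # Pool the counts, not the percentages, to get the overall rate.
    total_successes = sum(s for s, _ in parts.values())
    total_trials = sum(t for _, t in parts.values())
    overall[group] = total_successes / total_trials
    print(f"Group {group}, total: {total_successes}/{total_trials} = {overall[group]:.0%}")
```

The reversal comes from the unequal part sizes: most of A's trials fall in the part where percentages are high for both groups, while most of B's fall in the part where they are lower.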