44 Cards in this Set
- Front
- Back
Population |
a group of items or events that are of interest to a study or experiment |
|
Sample |
A subset of the population chosen to represent the population |
|
Parameter |
an unknown fixed quantity; a fixed value that indexes a distribution and is estimated from a sample |
|
Variable |
random quantity, has a distribution |
|
probability |
chance of seeing a certain event; its relative frequency over many repetitions of an experiment |
|
Distribution |
relative occurrence or frequency of the values a random variable can take |
|
statistic |
a function of data, categorized as test statistic or estimator |
|
statistical inference |
process of drawing conclusions from data subject to random variation |
|
Estimate vs. Estimator |
estimate: the numerical value (point or interval) obtained by applying an estimator to the sample data. estimator: a rule, i.e. a function of the sample, either point or interval, that is used to make inferences about the population, ex. sample mean, sample median, confidence intervals |
|
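A small sketch of the estimate vs. estimator distinction. The sample values are made up for illustration: the estimators are the rules (functions), and the estimates are the numbers they return for this particular sample.

```python
from statistics import mean, median

# Hypothetical sample of body weights in lbs (made-up numbers).
sample = [150, 152, 149, 151, 158]

# Estimators: rules that turn any sample into a value.
estimators = {"sample mean": mean, "sample median": median}

# Estimates: the values those rules give for this one sample.
estimates = {name: rule(sample) for name, rule in estimators.items()}

print(estimates)
```

A different sample would give different estimates from the same estimators.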
Quantitative |
concept of amount; numerical, e.g. continuous (no gaps, ex. BMI, BWT) or discrete (gaps, ex. integers, ER visits) |
|
Qualitative |
concept of attribute or category, e.g. binary, multinomial, ordinal |
|
correlation |
measures Direction and Strength of relationship |
|
regression methods |
measures direction and form of relationship |
|
correlation coefficient |
measure of linear association between two random variables Y <---> X. ρ = -1: perfect negative linear association; ρ = 0: no linear association; ρ = 1: perfect positive linear association |
|
covariance |
measure of how much two random variables change together |
|
sample correlation coefficient 1. Name of test? 2. What does this estimate? 3. Ranges for strength of association |
1. Pearson correlation coefficient 2. point estimator of the population correlation coefficient 3. by absolute value of r: 0-0.25 poor to none; 0.25-0.5 fair; 0.5-0.75 fair to good; 0.75-1 strong to excellent |
|
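A minimal sketch of the sample covariance and the Pearson correlation coefficient, written from the textbook formulas. The weight and blood-pressure numbers are invented for illustration. The last lines check that r does not change when weight is converted from lbs to kg, while the covariance does.

```python
from math import sqrt

def sample_covariance(x, y):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def pearson_r(x, y):
    """Pearson r: covariance scaled by both standard deviations."""
    return sample_covariance(x, y) / sqrt(
        sample_covariance(x, x) * sample_covariance(y, y)
    )

# Hypothetical data (made up): weight in lbs, systolic blood pressure in mmHg.
weight_lbs = [150, 155, 160, 165, 170]
bp = [118, 121, 125, 124, 130]

cov = sample_covariance(weight_lbs, bp)
r = pearson_r(weight_lbs, bp)

# Changing units rescales the covariance but leaves r unchanged.
weight_kg = [w * 0.45359237 for w in weight_lbs]
r_kg = pearson_r(weight_kg, bp)

print(round(cov, 2), round(r, 4), round(r_kg, 4))
```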
what is 'r'? What does it measure? |
'r' represents the sample correlation coefficient. It measures the strength and direction of the linear relationship between two random variables |
|
What factor(s) influence 'r' value |
outliers |
|
Does 'r' have units? |
No, it is independent of measurement units; any two units can be compared |
|
Do the variables need to be hierarchically related? |
No, they do not need to be a 'predictor' variable and a corresponding 'response' variable |
|
Is 'r' adjusted for other variables? |
No, not for the simple Pearson correlation coefficient |
|
Inference on ρ |
need to ask questions on page 11-12 |
|
Spearman Rank Correlation coefficient When is this test used? |
A non-parametric test that does not assume normality. Use for: 1. ordinal variables 2. non-normally distributed variables 3. data with outliers |
|
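A sketch of why the Spearman coefficient suits data with outliers, assuming the usual definition (Pearson correlation computed on ranks, with ties given their average rank). The data are made up: y always increases with x, but the last value is an extreme outlier.

```python
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def average_ranks(values):
    """Rank from 1..n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: monotone increasing, last y is an outlier.
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 5, 7, 8, 100]

pearson_value = pearson_r(x, y)
spearman_value = spearman_rho(x, y)
print(round(pearson_value, 3), round(spearman_value, 3))
```

The outlier pulls the Pearson value well below 1, while the rank-based value still reports a perfect monotone relationship.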
Regression |
statistical technique for modeling the relationship between variables. Identifies the relationship between the MEAN value of a random variable and the corresponding values of one or more independent variables. Mean/avg Y <--> X |
|
Hill's criteria for causality |
1. strength of association 2. Dose response relationship 3. Temporality 4. Specificity 5. Consistency 6. Repeatability/Coherence 7. Biological plausibility |
|
1. How do you choose which regression method to use? 2. When should each of these regression methods be used? a. linear b. logistic c. Poisson / log-linear d. Cox regression |
1. Type of outcome variable (Y), e.g. continuous, binary, ordinal, etc. 2. a. continuous variables b. binary variables c. counts (discrete variables) d. time to event (time -> clocks -> Cox) |
|
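The rule on this card can be written as a simple lookup. This is only a memory aid: the dictionary, its key labels, and the function name are hypothetical and come from no library.

```python
# Hypothetical lookup mirroring the card; key labels are invented here.
REGRESSION_BY_OUTCOME = {
    "continuous": "linear regression",
    "binary": "logistic regression",
    "count": "Poisson / log-linear regression",
    "time-to-event": "Cox regression",
}

def suggest_regression(outcome_type):
    """Return the method the card pairs with this type of outcome (Y)."""
    try:
        return REGRESSION_BY_OUTCOME[outcome_type]
    except KeyError:
        raise ValueError(
            f"no rule on the card for outcome type {outcome_type!r}"
        ) from None

print(suggest_regression("binary"))
```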
What is the primary purpose for regression methods? |
1. Inference: to study the association between Y and X 2. Prediction: to predict Y for a given X |
|
What other purposes does regression serve? |
1. To adjust for confounders 2. Evaluate interactions between different X variables regarding change in mean response |
|
Inference 1. What kinds of inferences can be made? 2. Describe the functions of each type. |
1. estimation and hypothesis testing 2. Estimation: point/interval; *interpretation of regression coefficients is important. Hypothesis testing (HT): answers questions about the population; *model parameters must be interpretable for inference to be meaningful |
|
Prediction |
- predict response for unobserved samples - accuracy and precision take precedence over interpretability |
|
What is simple linear regression (SLR)? |
one response variable (Y) vs. one explanatory variable (Covariate, X) to answer: Is Y linearly associated with X? |
|
What is the model for SLR? |
Yi = β0 + β1xi + εi where: Yi is the response for observation i, β0 is the intercept, β1 is the slope, xi is the covariate value for observation i, εi is the random error |
|
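A minimal sketch of fitting the SLR model to data. The cards do not name a fitting method, so ordinary least squares is assumed here; the data are made up for illustration.

```python
def fit_slr(x, y):
    """Ordinary least squares for one covariate.

    slope     = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
    intercept = ybar - slope * xbar
    """
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope

# Hypothetical data (made up): weight in lbs, blood pressure in mmHg.
weight = [150, 155, 160, 165, 170]
bp = [118, 121, 125, 124, 130]

intercept, slope = fit_slr(weight, bp)

# Residuals are the sample stand-ins for the random error term.
residuals = [b - (intercept + slope * w) for w, b in zip(weight, bp)]

print(round(intercept, 2), round(slope, 2))
```

With an intercept in the model, the least-squares residuals always sum to zero (up to rounding).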
What are the conditions for e or 'error'? |
normal distribution with mean zero and constant variance |
|
What is Multiple Linear Regression? |
regression analysis for multiple covariates |
|
What is the model for multiple linear regression? |
Yi = β0 + β1xi1 + ... + βkxik + εi where: one response is modeled with k covariates |
|
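A sketch of fitting a multiple linear regression with two covariates, assuming ordinary least squares solved through the normal equations (X'X)b = X'y. It is pure Python for illustration only; a numerical library would normally do this. The data are constructed without noise from intercept 1 and slopes 2 and 3, so the fit should recover those values.

```python
def solve_linear_system(a, b):
    """Solve a * beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        known = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - known) / m[r][r]
    return beta

def fit_mlr(rows, y):
    """Return [intercept, slope_1, ..., slope_k] by least squares."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 is the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)

# Hypothetical noise-free data: y = 1 + 2*x1 + 3*x2.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

coefficients = fit_mlr(rows, y)
print([round(c, 6) for c in coefficients])
```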
What are the three assumptions made when comparing two populations? |
1. Independence between populations 2. Normality: the response is normally distributed in each population 3. Homoscedasticity: equal variance between populations |
|
Ex. Simple linear regression
Comparison of mean blood pressure between two populations defined by weight: population 1 at x = 150 lbs and population 2 at x = 151 lbs. 1. What are the response variable and the covariate? 2. How would you represent the normal distribution of blood pressure for each population? |
1. response: blood pressure; covariate: weight 2. Y1 ~ N(μy|x=150, σ^2), Y2 ~ N(μy|x=151, σ^2) |
|
How would you represent the population regression line if we were able to collect data from all Rutgers students, obtaining the true mean blood pressure at targeted weights? |
μy|x = β0 + β1x where: μy|x is the mean of y at a given x-value (ex. mean blood pressure at 153 lbs), β0 is the intercept, β1 is the slope, x is the covariate |
|
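A short worked step, assuming the population regression line holds at both weights in the earlier example (150 lbs and 151 lbs): subtracting the two means shows that the slope is the difference in mean blood pressure between populations one pound apart.

```latex
\mu_{y \mid x=151} - \mu_{y \mid x=150}
  = (\beta_0 + 151\,\beta_1) - (\beta_0 + 150\,\beta_1)
  = \beta_1
```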
Since we are unable to obtain data from the entire Rutgers community, what is the resultant model for estimating mean blood pressure? |
Yi = μy|x + εi where: Yi is the observed blood pressure of sampled individual i, μy|x is the population mean blood pressure at that individual's weight, εi is the unknown random error |
|
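A simulation sketch of the sampling model on this card: each observed value is the population mean at that weight plus a normal random error. The "true" intercept, slope, and error standard deviation are invented for illustration, and ordinary least squares is assumed for the fit. The estimates from the sample come out close to, but not exactly equal to, the population values.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Hypothetical population values (assumptions, not from the cards).
TRUE_INTERCEPT, TRUE_SLOPE, SIGMA = 37.2, 0.54, 5.0

# Sample 2000 weights, then blood pressure = mean at that weight + error.
weights = [random.uniform(120, 200) for _ in range(2000)]
pressures = [
    TRUE_INTERCEPT + TRUE_SLOPE * x + random.gauss(0, SIGMA) for x in weights
]

def fit_line(x, y):
    """Ordinary least squares estimates of (intercept, slope)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

est_intercept, est_slope = fit_line(weights, pressures)
print(round(est_intercept, 2), round(est_slope, 3))
```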
How would you word results indicating a good relationship between the data and the linear model? |
The mean of the response variable has a linear relationship with the explanatory variable x. |
|
What is the primary assumption about your data when you use linear regression? |
That the relationship between your response and explanatory variable is linear. |
|
What are the assumptions for errors? |
1. Normal distribution with mean 0 and variance σ^2, i.e. εi ~ N(0, σ^2) 2. Errors are independent 3. Same variance for every observation |
|
What are the assumptions for a response variable? |
1. Normal distribution with mean μy|x and variance σ^2 2. Same variance (homoscedasticity) 3. Independence |
|
Are covariates (Xk) fixed or random? |
Both! We are under the assumption that they are fixed for the purposes of simplifying the model. BUT... they may, in fact, be random, due to causes such as... measurement error and natural fluctuations |