36 Cards in this Set

ANOVA
ANOVA is a broad and flexible set of methods for testing hypotheses about means.

The simplest form is one-way ANOVA to compare two or more means. It analyzes the significance of a relationship between an interval variable (Y) and a categorical variable (X).

The general logic of ANOVA involves breaking the total variation in the dependent variable into two parts: the variation that occurs within each independent-variable group and the variation that occurs between independent-variable groups. The ratio of the between-groups to within-groups variance approximately follows an F distribution.

One-way ANOVA examines how the means of a measurement (interval) variable (or Y) vary across the categories of a second variable (or X).

Are the differences in means between groups significant? Do the sample differences imply different population means? --> ANOVA

Uses the F-statistic.
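
A minimal sketch of the F-test in Python, assuming scipy is installed; the three groups below are hypothetical:

```python
from scipy import stats

# Hypothetical interval-level Y measured in three categorical X groups
group_a = [23, 25, 28, 30, 26]
group_b = [31, 33, 29, 35, 32]
group_c = [22, 20, 24, 25, 23]

# One-way ANOVA: the F-statistic is the ratio of between-groups
# to within-groups variance
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```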
standardized slope (b*)
We may not compare unstandardized slopes in a single multivariate equation unless their X variables are based on the same metric; when the metrics differ, the slopes are not directly comparable, which is what the standardized slope addresses.

The value of b*i estimates the partial effect on Y of Xi in standard deviation units.
For every one standard deviation change in Xi, there is b*i amount of standard deviation change in Y, holding constant the effects on Y of the other Xi‘s.
Properties of b*
1. Unlike the Pearson correlation coefficient, r, the standardized regression coefficient, b*i, does not necessarily fall between -1.0 and +1.0. Be on the lookout for cases where |b*i| > 1; this usually signifies one sort of problem or another (high multicollinearity is a common culprit).
2. Since a standardized regression coefficient, b*i, is a multiple of the unstandardized regression coefficient bi, one of the two will equal 0 whenever the other one does. Hence, the t-test of H0: β*i = 0 is equivalent to the t-test of H0: βi = 0. It is thus unnecessary to study, or to present, separate inference procedures for the sample b*i coefficients, because the inferences for them are the same as those for the sample bi coefficients. In research reports, don't report significance results for both; report them only for the bi coefficients.
3. Don’t compare values of b*i across different samples. This is because in estimating the same equation in different samples, the value of b*i, unlike the value of the unstandardized b slope, can change merely because the variance of X changes.
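
A sketch of how b* is computed from the unstandardized slopes, using only numpy and the relation b*_i = b_i × sd(X_i)/sd(Y) given above; the data and variable names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical predictors measured on very different metrics
x1 = rng.normal(50, 10, 200)      # e.g., age in years
x2 = rng.normal(0, 1000, 200)     # e.g., income in dollars
y = 2.0 * x1 + 0.01 * x2 + rng.normal(0, 5, 200)

# Unstandardized OLS estimates (intercept first)
X = np.column_stack([np.ones_like(x1), x1, x2])
b = np.linalg.lstsq(X, y, rcond=None)[0]

# b*_i = b_i * sd(X_i) / sd(Y): partial effects in standard-deviation units,
# comparable across predictors even though the raw metrics differ
b_star = b[1:] * np.array([x1.std(), x2.std()]) / y.std()
print(b_star)
```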
Interpretation of Standardized Slope (b*)
For every one standard deviation change in Xi, there is b*i amount of standard deviation change in Y, holding constant the effects on Y of the other Xi‘s
How to detect Multicollinearity
One common approach is to examine the bivariate correlations among each pair of independent variables, looking for correlations of around .6 or .7 or higher (remember that when r = .7, r² = .49). Scatterplots of one independent variable on another are also helpful. If no intercorrelation is found, one might be tempted to conclude that multicollinearity is not a problem.

But even this approach is not fully satisfactory, because it fails to take into account the relationship of one independent variable with all the other independent variables. It is possible, for instance, to find no "large" bivariate correlations among pairs of the independent variables even though one of the independent variables is a nearly perfect linear combination of the remaining independent variables. In such a case, multicollinearity would still be a problem.

So a preferred method of assessing multicollinearity is to regress each independent variable on all the other independent variables. When any of these R²'s is very high, perhaps even near 1.0, there is high multicollinearity. In fact, the largest of these R²'s serves as an indicator of the amount of multicollinearity that exists. A sketch of this auxiliary-regression check appears below.
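
A sketch of the auxiliary-regression check, assuming statsmodels is installed; the third predictor is deliberately constructed as a near-linear combination of the other two:

```python
import numpy as np
import statsmodels.api as sm

def aux_r2(X):
    """Regress each column of X on all the others; return each R-squared."""
    out = {}
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        fit = sm.OLS(X[:, j], sm.add_constant(others)).fit()
        out[j] = fit.rsquared        # near 1.0 signals high multicollinearity
    return out

rng = np.random.default_rng(1)
x1, x2 = rng.normal(size=(2, 300))
x3 = x1 + x2 + rng.normal(scale=0.01, size=300)   # nearly redundant predictor
print(aux_r2(np.column_stack([x1, x2, x3])))
# The familiar VIF is 1 / (1 - R^2) computed from these same regressions.
```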
Multicollinearity
When independent variables are highly correlated with one another (perfectly, in the extreme case).

The more intercorrelated the independent variables, the more likely strong multicollinearity becomes; when strong multicollinearity becomes extreme, serious estimation problems can occur and the parameter estimates become unreliable.

High multicollinearity creates estimation problems because it produces large variances for the slope estimates and consequently large standard errors.
Symptoms of Multicollinearity
 1. A substantial R2 with statistically insignificant coefficients.
 2. Regression coefficients which change greatly in value when independent variables are dropped or added to the equation.
 3. A third, less sure, symptom involves one’s “suspicion” (based on theory and hunches and prior applications, etc.) about the magnitudes of the coefficients.
 4. A coefficient with the “wrong” sign. This last symptom, however, is a little feeble, because often our knowledge of what the “right” sign should be may be lacking.
Bivariate Regression
statistical technique that involves fitting a line to a scatter of points; it is the simplest form of regression with one dependent variable and one predictor.
Robust Regression
Robust regression refers to a family of techniques used to estimate the parameters (i.e., the slopes and intercepts) of multiple regression equations. In statistics, the use of the adjective “robust” refers to accuracy, even when the underlying assumptions are violated.

An estimator, such as a slope or intercept, is said to be robust if it is not sensitive to unusual data points, that is, outliers. For instance, I have mentioned in an earlier lecture that the median is said to be robust because it is much less sensitive to unusual data points than is the mean. The mean is not deemed to be as robust because it is indeed sensitive to outlying points.
OLS vs Robust
Although OLS is an efficient estimator given normally distributed errors, it loses efficiency when error distributions have heavier than normal tails. This is a common occurrence in the social sciences.

Hence we often face situations in which OLS performs poorly. Alternative methods are hence needed.

Robust regression is an alternative method to Least Squares Regression. When OLS assumptions about the distribution of the residuals are satisfied, robust techniques yield estimates which are unbiased (like OLS estimates) and which are only slightly less efficient than OLS estimates.

When the distribution of the residuals includes outliers, and/or is nonnormal with heavy tails, robust techniques yield estimates which are more efficient than OLS estimates.

OLS minimizes errors that have been subjected to a nonlinear transformation (the errors are squared).

Robust Regression minimizes errors that have been subjected to a different transformation than squaring (OLS).

When running a regression, run both OLS and robust versions. If the coefficients are basically the same, stick with the OLS model. If there are serious differences between the coefficients and standard errors of the OLS and robust models, use the robust results.

Discrepancies between the OLS and robust estimates of the coefficients and standard errors suggest the effects of outliers on OLS and warn you that OLS may be untrustworthy.

OLS and robust regression complement each other. OLS is simpler and preferable if both methods produce the same result; the larger the discrepancy, the more you ought to lean toward the robust results.

Rule of thumb (Hamilton): check whether any of the OLS coefficients are more than one (robust) standard error away from the corresponding robust coefficients, as in the sketch below.
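
A sketch of the run-both strategy, assuming statsmodels is installed; RLM with Huber weighting stands in for "robust regression" here, and the data (with one planted outlier) are hypothetical:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + rng.normal(size=100)
y[0] += 30                                   # plant a single outlier

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()
rob = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()

# Hamilton's rule of thumb: flag OLS coefficients that sit more than
# one (robust) standard error away from the robust coefficients
gap = np.abs(ols.params - rob.params) / rob.bse
print("OLS:", ols.params, "robust:", rob.params, "gap in robust SEs:", gap)
```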
Logistic Regression
DICHOTOMOUS DEPENDENT VARIABLE

OLS is not feasible for a dichotomous variable for several reasons:

One, it can be shown that the variance of the errors will not be constant; the model will thus be heteroscedastic.

Two, it can also be shown that the errors will not be normally distributed.

Three, since the OLS model may predict values of Y that are negative and/or greater than 1, it can produce nonsensical predictions.

Since p is a probability, it is restricted to taking on values between 0 and 1.

Hence, modeling p with the logistic function is equivalent to fitting a linear regression model where the continuous outcome (i.e., dependent variable), Y, has been replaced by the logarithm of the odds of success, i.e., the logit.

Instead of assuming that the relationship between p and Xi is linear, we now assume instead that the relationship between the logarithm of the odds of success, i.e., ln[p/(1-p)], and Xi is linear.

This kind of model is fitted with the technique known as logistic regression.
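
A minimal sketch of fitting the logit model described above, assuming statsmodels is installed; the data are simulated so the true relationship is known:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.normal(size=500)
p = 1 / (1 + np.exp(-(-0.5 + 1.5 * x)))   # p via the logistic function
y = rng.binomial(1, p)                    # dichotomous dependent variable

# ln[p/(1-p)] is modeled as linear in X; predictions stay between 0 and 1
fit = sm.Logit(y, sm.add_constant(x)).fit(disp=False)
print(fit.params)                 # coefficients on the log-odds scale
print(np.exp(fit.params[1]))      # odds ratio for a one-unit change in X
```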
Multinomial Logistic Regression
NOMINAL DEPENDENT VARIABLE
categorical, but not ordered
[polytomous logistic regression]

If the dependent categorical variable is not ordered, but is nominal, we can still use logistic regression. In the multinomial logit model (MNLM), logits are formed from contrasts of non-redundant category pairs of the dependent variable. Each logit is then modeled in a separate equation.

In the multinomial logit model, one estimates a set of logit coefficients for each of the outcomes of the dependent variable.

When you estimate an MNLM you need to decide which of the dependent variable outcomes you want to use as a base (i.e., contrast) category. Once you make this decision, you then set this category to zero. (If you don't tell Stata which one to use, Stata will decide.)
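
The deck mentions Stata's mlogit; a rough Python analogue, assuming statsmodels is installed, with hypothetical data. statsmodels' MNLogit treats the lowest-coded outcome as the base category, so you choose the base by how you code the outcome:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.normal(size=(600, 2))
y = rng.integers(0, 3, size=600)   # nominal outcome: categories 0, 1, 2

# One set of logit coefficients per non-base outcome: each column of
# params contrasts one outcome (1 or 2) against the base category (0)
fit = sm.MNLogit(y, sm.add_constant(x)).fit(disp=False)
print(fit.params)
```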
Hazard Rate
refers to the rate at which the event of interest occurs; the term comes from the biomedical sciences' use of survival analysis (surviving the hazard of death)
Event History Analysis (EVA)
Concerned with the patterns and correlates of the occurrences of events over time.

By definition, the occurrence of an event assumes a preceding time interval that represents the nonoccurrence of the event. Specifically, a certain time period or duration of nonoccurrence must exist in order for an occurrence to be recognized as an "event."

Event history analysis (EVA) is really the analysis of duration data, which represent the period of nonoccurrence of a given event. Thus the term "analysis of duration data" refers to a broad range of techniques, including the approaches used in EVA.

Given this distinction between the risk and non-risk periods, EVA can be defined either as the analysis of the duration for the nonoccurrence of an event during the risk period, or as the analysis of rates of the occurrence of the event during the risk period.

The rate typically varies with time and among groups; and it is the rate that we wish to model in EVA, i.e., treat as the dependent variable. The rate when attached to a particular moment in time is referred to as a hazard rate, or as a transition rate. Hence, we frequently use the term hazard analysis synonymously with EVA.
EVA Advantages
1: capacity to deal with certain types of censored observations

2: capacity to handle in the model both time-dependent (time-varying) covariates and time-independent covariates

[covariate = independent variable]
Direct Standardization
The adjustment of a summary rate, such as the crude death rate (CDR), for a population in question.

Found by computing a weighted average of group-specific rates for the population in question, where the weights consist of the proportions in each group (i.e., the proportion in each age group) found in a "standard" population.

The directly standardized rate is the rate the population would have if its age structure were the same as that of the standard population. A sketch of the computation appears below.
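
A sketch of the computation in plain Python; all counts and rates below are hypothetical:

```python
# Directly standardized rate: weighted average of the study population's
# group-specific rates, weighted by the standard population's age structure
age_specific_rates = [0.002, 0.005, 0.015, 0.060]   # study population
standard_pop       = [25000, 30000, 25000, 20000]   # standard population counts

total = sum(standard_pop)
weights = [n / total for n in standard_pop]          # standard age proportions
dsr = sum(r * w for r, w in zip(age_specific_rates, weights))
print(f"Directly standardized rate: {dsr:.4f}")
```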
dissimilarity index
An index ranging from 0 to 1 that became an accepted measure of segregation following Duncan's work in 1955.

Measures the distributional equality of two variables (e.g., the geographic distribution of two populations, or the distribution of income across a population). Formula: D = (1/2) Σ |a_i/A - b_i/B|, where a_i and b_i are the two groups' counts in areal unit i and A and B are their citywide totals.
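
A sketch of the index in plain Python, with hypothetical tract counts:

```python
# Duncan's dissimilarity index: D = (1/2) * sum_i |a_i/A - b_i/B|
group_a = [120, 300, 50, 30]      # group A counts by areal unit (tract)
group_b = [80, 100, 400, 420]     # group B counts by areal unit

A, B = sum(group_a), sum(group_b)
D = 0.5 * sum(abs(a / A - b / B) for a, b in zip(group_a, group_b))
print(f"D = {D:.3f}")   # 0 = complete evenness, 1 = complete segregation
```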
Hypersegregation
Term developed by Massey and Denton to refer to a situation where a population experiences high levels of segregation on several dimensions of segregation simultaneously
Net Migration
In-migration minus out-migration in a given area over a given period of time.

The rate is net migration / mid-year population × k (k = 1,000 or 100).
Population Projection
Refers to the determination of counts of a population or subpopulation for a period in the future.
Proximate Determinants of Fertility
Developed by Bongaarts

The behavioral and biological variables that directly influence fertility and are distinct from other types of variables because all the others (family planning, SES, attitudinal) operate through them to influence fertility

Four major proximate determinants are:
1. Proportion of women married
2. Contraceptive use
3. Induced abortion
4. Postpartum infecundability

The observed level of fertility in a population depends on the net balance of ALL the PDs.

If they had no limiting effect, fertility would rise to an average limit of 15.3 births per woman, known as total fecundity (TF).

The four principal proximate determinants are considered inhibitors of fertility.

By multiplying TF by each of the indexes of the PDs, the TFR is obtained:
TFR = Cm × Cc × Ca × Ci × TF
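
A hypothetical worked example of the decomposition; the index values below are invented for illustration, and each runs from 0 (full inhibition) to 1 (no inhibition):

```python
TF = 15.3    # total fecundity, births per woman
Cm = 0.70    # index of marriage
Cc = 0.60    # index of contraception
Ca = 0.95    # index of induced abortion
Ci = 0.80    # index of postpartum infecundability

TFR = Cm * Cc * Ca * Ci * TF
print(f"TFR = {TFR:.2f}")   # about 4.88 births per woman
```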
Proximate Determinants of Fertility (3 secondary determinants)
1. Fecundability: Frequency of intercourse
2. Spontaneous intrauterine mortality: the risk reflects the fact that many conceptions end in miscarriages, spontaneous abortions, or stillbirths
3. Prevalence of permanent sterility: reflects that couples may become sterile before the woman reaches menopause
“West” model Life Tables
One of the four sets of Coale and Demeny model life tables that represent the age patterns of mortality of four different 'regions' of the world.

Based on residual tables

Recommended as the first choice to represent mortality in countries where lack of evidence prevents a more appropriate choice of model
Princeton Fertility Indexes
1) Index of Fertility
The If index compares the number of live births of all women in the population with the fertility schedules of the Hutterites.

2) Index of Marital Fertility
Ig compares the number of live births of married women with the fertility schedules of the Hutterites.

3) The Index of Nonmarital Fertility
The Ih index compares the number of live births to unmarried women with the fertility schedules of the Hutterites.

4) The Index of Marriage
The Im index is a measure of the proportion married; it is the ratio of the number of births currently married women would experience if subject to Hutterite fertility, to the number of births all women would experience if subject to Hutterite fertility.

Additional indexes have been created by others:
the Id index (of divorce), the Is index (of singlehood), and the Iw index (of widowhood).
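
A sketch of the If computation in Python. The Hutterite schedule shown uses commonly cited age-specific rates, but treat all numbers here as illustrative:

```python
# If = observed births / births all women would have at Hutterite rates
hutterite_rates = [0.300, 0.550, 0.502, 0.447, 0.406, 0.222, 0.061]  # ages 15-49, 5-yr groups
women_by_age    = [5000, 4800, 4500, 4200, 4000, 3800, 3600]         # hypothetical counts
observed_births = 3200

expected = sum(f * w for f, w in zip(hutterite_rates, women_by_age))
I_f = observed_births / expected
print(f"If = {I_f:.3f}")   # 1.0 would mean fertility at the Hutterite maximum
```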
Residential Segregation
is the degree to which two or more groups live separately from one another in different parts of the urban environment.

Evenness refers to the differential distribution of two social groups among areal units in a city.

Residential exposure refers to the degree of potential contact, or the possibility of interaction, between minority and majority group members within geographic areas of a city.

Concentration refers to relative amounts of physical space occupied by a minority group in the urban environment. Groups that occupy a small share of the total area in a city are said to be residentially concentrated.

Centralization is the degree to which a group is spatially located near the center of an urban area.

Clustering refers to the extent to which areal units inhabited by minority members adjoin one another, or cluster, in space.
N-way ANOVA
N-way ANOVA generalizes the approach of one-way ANOVA to deal with two or more categorical X variables.

The same logic is followed: the total variation in the dependent variable is broken into the variation that occurs within each independent-variable group and the variation that occurs between independent-variable groups.

Except in N-way ANOVA, two or more categorical variables are used.
ANOCOVA
ANOCOVA extends N-way ANOVA to encompass a mix of categorical and continuous X variables. We accomplish this in Stata via the anova command, but we must specify which variables are continuous.
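
The deck names Stata's anova command; a rough Python analogue using statsmodels formulas (data hypothetical), where C() marks a variable as categorical and untagged terms enter as continuous covariates:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
df = pd.DataFrame({
    "group": rng.choice(["a", "b", "c"], size=150),   # categorical X
    "age": rng.normal(40, 10, 150),                   # continuous X
})
df["y"] = 0.5 * df["age"] + 3 * (df["group"] == "b") + rng.normal(size=150)

fit = smf.ols("y ~ C(group) + age", data=df).fit()
print(sm.stats.anova_lm(fit, typ=2))   # F tests for the mixed ANOCOVA model
```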
Multiple Regression
Allowing for more than one independent (predictor) variable
IRLS (iteratively reweighted least squares) [Robust Regression]
In the IRLS approach, parameter estimates (i.e., the a and the b’s) are first estimated by OLS regression. Any observations so influential that they have Cook’s D values greater than 1 are automatically withdrawn from the sample after this first step.

The regression equation is then re-estimated using weighted least squares (WLS) regression. In WLS, cases are weighted according to "case weights" based on the size of their residuals: cases with large residuals in the first equation receive smaller case weights and thus tend to exert less influence on the parameter estimates obtained by the weighted regression.

New residuals are then computed based on the new regression results and the process is repeated, i.e., reiterated. These "iterations" continue until the "case weights" and parameter estimates stabilize.

There are many kinds of robust regression. Each stands as an alternative to OLS regression. All regression techniques minimize errors of prediction that have been subjected to a transformation.

OLS minimizes errors that have been subjected to a nonlinear transformation (i.e., the errors are squared).

Robust regression techniques minimize errors that have been subjected to a different transformation than squaring (which is the one OLS uses).
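
A toy IRLS loop in numpy with Huber-style case weights, to make the iterate-until-stable idea concrete. It omits the initial Cook's D screening step described above, and the tuning constants are conventional choices, not anything prescribed by the deck:

```python
import numpy as np

def irls_huber(x, y, c=1.345, iters=20):
    """Illustrative IRLS: reweight cases by residual size and re-fit."""
    X = np.column_stack([np.ones(len(y)), x])
    w = np.ones(len(y))                          # start with equal weights (OLS)
    for _ in range(iters):
        sw = np.sqrt(w)                          # weighted least squares step
        b = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        resid = y - X @ b
        s = np.median(np.abs(resid)) / 0.6745    # robust scale estimate
        u = np.abs(resid) / s
        w = np.where(u <= c, 1.0, c / u)         # big residuals -> small weights
    return b

rng = np.random.default_rng(6)
x = rng.normal(size=100)
y = 1 + 2 * x + rng.normal(size=100)
y[:5] += 20                                      # contaminate with outliers
print(irls_huber(x, y))                          # near [1, 2] despite outliers
```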
Ordered Logistic Regression
ORDERED CATEGORICAL DEPENDENT VARIABLE
ordinal dependent variable
[polytomous logistic regression]

Ordinal logistic regression, also known as ordered logit, is used to estimate relationships between an ordinal dependent variable and a set of X variables. An ordinal variable is a variable that is categorical and ordered; for instance, “poor,” “good,” and “excellent” might be answers to a question about one’s current health status, or about the repair status of one’s car, or about one’s GPA.

“ordinal variables are often coded as consecutive integers from 1 to the number of categories. Perhaps because of this coding, it is tempting to analyze ordinal outcomes with the linear regression model (OLS). However, an ordinal dependent variable violates the assumptions of the ... (OLS model), which can lead to incorrect conclusions ... Accordingly, with ordinal outcomes it is much better to use models that avoid the assumption that the distances between categories are equal.”
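
A sketch assuming a recent statsmodels version that provides OrderedModel; the three ordered categories are hypothetical:

```python
import numpy as np
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(7)
x = rng.normal(size=(400, 2))
latent = x @ np.array([1.0, -0.5]) + rng.logistic(size=400)
y = np.digitize(latent, [-1.0, 1.0])   # 0 = poor, 1 = good, 2 = excellent

# No constant in exog: ordered models estimate thresholds instead,
# avoiding the OLS assumption of equal distances between categories
fit = OrderedModel(y, x, distr="logit").fit(method="bfgs", disp=False)
print(fit.params)   # slopes followed by the estimated category thresholds
```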
Poisson Regression
The number of events (the dependent variable) is a nonnegative integer; it has a Poisson distribution with a conditional mean that depends on the characteristics (the X variables) of the individuals.

The Poisson regression model is a nonlinear model, predicting for each individual the number of times, μ, that the event has occurred. The X variables are related to μ nonlinearly. This is an important point to remember.
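
A minimal sketch, assuming statsmodels is installed; the simulated mean is exp(a + bx), which is where the nonlinearity noted above comes from:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
x = rng.normal(size=500)
mu = np.exp(0.2 + 0.5 * x)     # conditional mean: nonnegative, nonlinear in X
y = rng.poisson(mu)            # nonnegative integer counts

fit = sm.Poisson(y, sm.add_constant(x)).fit(disp=False)
print(fit.params)              # coefficients act on log(mu)
```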
Negative Binomial Regression
The Poisson regression model rarely fits in practice, since in most applications the variance of the count data is greater than the mean (overdispersion).

Often, if there is overdispersion, the Poisson estimates will be consistent but inefficient. The standard errors will be biased downward, resulting in spuriously large z-values. Thus, if there is overdispersion in the scientific productivity data analyzed, the z-tests will tend to overestimate the significance of the X variables.

Statisticians thus recommend that when there is overdispersion in the count data that the dependent variable be estimated with Negative Binomial regression; this alternate approach is used when the count event that is being estimated has extra-Poisson variation, that is, overdispersion.
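
A sketch of checking for overdispersion and switching to negative binomial, assuming statsmodels is installed; the data are simulated to be overdispersed:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
x = rng.normal(size=500)
mu = np.exp(0.2 + 0.5 * x)
y = rng.negative_binomial(n=2, p=2 / (2 + mu))   # counts with mean mu, var > mu

print(y.mean(), y.var())     # variance well above the mean: overdispersion
nb = sm.NegativeBinomial(y, sm.add_constant(x)).fit(disp=False)
print(nb.params)             # includes alpha, the overdispersion parameter
```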
Zero-inflated Count Models
Sometimes, there will be a large number of zeros in the count data; and, sometimes, you may be able to argue that the zeros are not all the same.

The Poisson and Negative binomial regression models may not account satisfactorily for the excess zeros. Zero-inflated count models respond to the failure of the PRM and the NBRM to account for the excess zeros in the data; zero-inflated count models change “the mean structure to allow zeros to be generated by two distinct processes”

Ex: Say, you wish to model the number of fish each visitor to a national recreation park catches. A large number of visitors may catch zero fish because they do not fish, as opposed to visitors who fish but are unsuccessful and catch zero fish. You thus want to model whether or not a person fishes, with a number of X-variables related to fishing activity; and you also want to model how many fish a person catches depending on a number of X-variables having to do with success at fishing. So, there are two groups of people who catch 0 fish.
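
A sketch of the fish example, assuming a statsmodels version that provides ZeroInflatedPoisson; whether a visitor fishes at all and how many fish they catch are simulated as two distinct processes:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedPoisson

rng = np.random.default_rng(10)
n = 800
x = rng.normal(size=n)
fishes = rng.random(n) < 0.6                            # process 1: fish at all?
counts = rng.poisson(np.exp(0.3 + 0.5 * x)) * fishes    # process 2: catch count

X = sm.add_constant(x)
fit = ZeroInflatedPoisson(counts, X, exog_infl=X, inflation="logit").fit(disp=False)
print(fit.params)   # one set of params for the zero process, one for the counts
```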
Zero-truncated Count Models
A “second type of problem with zeros occurs when observations with outcomes equal to zero are missing from the sample because of the way the data were gathered”.

Say you fill out a warranty survey when you buy a television set, and it asks how many television sets you own. Data gathered this way would lead to every respondent having a count of at least one on the various items.

L/F tell us that the zero-truncated model starts with the Poisson model. But since the counts are truncated at zero, we want to compute the probability "for each positive outcome given that we know that the outcome is greater than zero".

Our major interest thus is with predicting the count for persons who have at least one count, that is, for those persons for whom the count is positive. L/F also tell us that “with truncated data, ... the adverse effects of overdispersion are worse” than when the sample is not truncated.
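
The key computation, as a small Python function using scipy: rescale the ordinary Poisson probability by the probability of a positive outcome:

```python
from scipy.stats import poisson

def zt_poisson_pmf(y, mu):
    """P(Y = y | Y > 0): ordinary Poisson pmf divided by P(Y > 0)."""
    return poisson.pmf(y, mu) / (1 - poisson.pmf(0, mu))

# Each positive count's probability is scaled up relative to plain Poisson
print(poisson.pmf(1, 0.8), zt_poisson_pmf(1, 0.8))
```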
Cox Models
A partial likelihood (PL) estimation method that is probably the most popular EVA method used in demography and sociology: the Cox proportional hazards method.

The main distinction is that ML (maximum likelihood) is a product of the likelihoods for all individuals in the sample, whereas PL is a product of the likelihoods for all events that are observed to occur. So you may have 10 persons in the sample, but if the event occurs for only five of them, with the other five being censored, then in the case of PL only 5 likelihoods are constructed.
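
A minimal sketch using the lifelines package (an assumption; the deck itself doesn't name software here), with hypothetical duration data:

```python
import pandas as pd
from lifelines import CoxPHFitter

# Hypothetical durations, censoring indicators, and one covariate
df = pd.DataFrame({
    "duration": [5, 8, 12, 3, 9, 15, 7, 11, 4, 10],
    "event":    [1, 0, 1, 1, 0, 1, 0, 1, 1, 0],   # 0 = censored observation
    "x":        [0, 1, 0, 1, 1, 0, 0, 1, 1, 0],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="duration", event_col="event")
# Partial likelihood: only the 6 observed events contribute likelihood terms;
# censored cases enter only through the risk sets
cph.print_summary()
```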
Multilevel Modeling
In the social sciences, our concepts and data structures are often hierarchical. By this I mean that the dependent variables are intended to describe the behavior of individuals.

But the individuals themselves are grouped into larger units, such as families or neighborhoods, and so forth. If our theories state, and/or if we believe, that the outcome behavior will be influenced by both the person’s characteristics and those of the context, then the independent variables we should be interested in employing should refer to the characteristics of both the individuals and the higher order units.
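
A sketch of a random-intercept model, assuming statsmodels is installed; individuals are nested within hypothetical neighborhoods:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(11)
n_groups, per_group = 30, 20
group = np.repeat(np.arange(n_groups), per_group)
group_effect = rng.normal(0, 2, n_groups)[group]   # neighborhood-level variation
x = rng.normal(size=n_groups * per_group)          # individual-level predictor
y = 1 + 0.5 * x + group_effect + rng.normal(size=n_groups * per_group)

df = pd.DataFrame({"y": y, "x": x, "neighborhood": group})
# Fixed effect for the individual-level x, random intercept per neighborhood
fit = smf.mixedlm("y ~ x", data=df, groups=df["neighborhood"]).fit()
print(fit.summary())
```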