Quant

by huong.t.le9, Sep. 2011

Subjects: concepts quant

78 Cards in this Set

Front	Back

	Normal Distribution	3 properties: 1. Symmetrical. 2. Unimodal (mean, median, and mode are all at the same place, the center of the distribution). 3. Asymptotic (the upper and lower tails approach but never touch the baseline). Sometimes referred to as a Gaussian distribution.
	Population Distribution	the distribution of scores in a population ex: distribution of IQ scores for everyone in a country
	Distribution of a Sample	the distribution of scores in a sample of a given size ex: IQ scores of the students in class, as a sample of the TAMU population
	Sampling Distribution	the distribution of some statistic (ex: the mean, or a slope coefficient) across all possible samples of a given size
	Central Limit Theorem	CLT states that, as sample size (n) becomes large: a) the sampling distribution of the mean becomes approximately normal, regardless of the shape of the variable’s frequency distribution; b) the sampling distribution is centered on the variable’s population mean; and c) the standard deviation of this sampling distribution, called its standard error, equals the variable’s population standard deviation divided by the square root of the sample size (σ/√n).
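
The CLT card can be illustrated with a small simulation. This is a sketch using only Python's standard library; the exponential population (mean 1, standard deviation 1, strongly right-skewed), the sample size n = 50, the 5,000 replications, and the seed are arbitrary choices made for illustration, not values from the cards.

```python
import random
import statistics

def sample_means(draw, n, reps, seed=0):
    """Draw `reps` samples of size n with draw(rng) and return each sample's mean."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]

# Hypothetical skewed population: exponential with mean 1 and SD 1.
mu, sigma, n = 1.0, 1.0, 50
means = sample_means(lambda rng: rng.expovariate(1.0), n=n, reps=5000)

print(round(statistics.fmean(means), 3))   # close to mu
print(round(statistics.stdev(means), 3))   # close to sigma / sqrt(n)
print(round(sigma / n ** 0.5, 3))          # 0.141
```

Even though the population is skewed, the simulated means cluster around μ with a spread close to σ/√n; a histogram of `means` would show the approximately normal shape.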
	SSE	Sum of the squared errors: $SSE = \sum_i (Y_i - \hat{Y}_i)^2$, i.e., each individual prediction error that makes up the TPE is squared before summing.
	R-squared	Coefficient of determination, $R^2 = ESS/TSS$. Indicates the explanatory power of the regression model: the proportion of variation in the dependent variable that is explained or accounted for by the independent variable(s). Ranges from 0 to +1.
	TSS	Total sum of squared deviations, $TSS = \sum_i (Y_i - \bar{Y})^2$. The TSS is the total variation in the dependent variable that we want to explain. It can be divided into two parts: the part accounted for or explained by the regression equation (ESS), and the part the regression equation cannot account for, i.e., the residual part (RSS).
	ESS	Explained sum of squared deviations, $ESS = \sum_i (\hat{Y}_i - \bar{Y})^2$
	RSS	Error or residual (unexplained) sum of squared deviations, $RSS = \sum_i (Y_i - \hat{Y}_i)^2$ (the same quantity as the SSE)
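
The sums-of-squares cards can be tied together in a few lines. The sketch below is an illustration, not a prescribed method: it fits a bivariate OLS line to a made-up five-point data set and computes TSS, ESS, RSS, and R² = ESS/TSS directly from the definitions.

```python
def sums_of_squares(x, y):
    """Fit Y-hat = a + bX by OLS and return TSS, ESS, RSS and R-squared."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    y_hat = [a + b * xi for xi in x]
    tss = sum((yi - y_bar) ** 2 for yi in y)                 # total variation
    ess = sum((yh - y_bar) ** 2 for yh in y_hat)             # explained part
    rss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # residual part
    return tss, ess, rss, ess / tss

x = [1, 2, 3, 4, 5]   # made-up data
y = [2, 4, 5, 4, 5]
tss, ess, rss, r2 = sums_of_squares(x, y)
print(tss, round(ess, 2), round(rss, 2), round(r2, 2))   # 6.0 3.6 2.4 0.6
```

For an OLS line with an intercept, TSS = ESS + RSS, so R² can equally be computed as 1 - RSS/TSS.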
	r (correlation coefficient)	The Pearson correlation coefficient, r, is a standardized version of the slope: unlike b, its value does not depend on the units of measurement. It is the value the slope would take if the two variables were measured in units that make their standard deviations equal: $r = b \, (s_X / s_Y)$.
	properties of r	1. r is only valid when a straight line is a reasonable model for the relationship; it measures the strength of the linear association between X and Y. 2. Unlike the slope b, r must fall between -1.0 and +1.0. 3. r has the same sign as b: r equals b multiplied by the ratio of two (positive) standard deviations, so the sign is preserved. 4. r = ±1.0 when all the sample points fall exactly on the prediction line, corresponding to perfect positive and negative linear associations. 5. The larger the absolute value of r, the stronger the linear association. 6. The value of r does not depend on the variables’ units of measurement. In the earlier example of murder and poverty rates among the 50 states, r = .63 whether Y is measured as murders per 1,000,000 population or as murders per 100,000 population. 7. Unlike the slope b, r treats the two variables symmetrically: the equation using Y to predict X has the same correlation as the equation using X to predict Y. If the murder rate is used to predict the poverty rate, rather than the poverty rate predicting the murder rate as in the example, r = .63 in both cases (though the two b's would most likely differ).
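
As a sketch of properties 3, 6, and 7, the functions below compute r as the slope rescaled by s_X/s_Y on made-up five-point data; `slope` and `pearson_r` are illustrative helper names, not from any library.

```python
import statistics

def slope(x, y):
    """OLS slope of y on x: sum of cross-products over sum of squares of x."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    return sxy / sum((a - x_bar) ** 2 for a in x)

def pearson_r(x, y):
    """Pearson r as the slope rescaled by the ratio of standard deviations."""
    return slope(x, y) * statistics.stdev(x) / statistics.stdev(y)

x = [1, 2, 3, 4, 5]                 # made-up data
y = [2, 4, 5, 4, 5]
y_per_100 = [v * 100 for v in y]    # same data, Y in different units

print(round(pearson_r(x, y), 4))            # 0.7746
print(round(pearson_r(x, y_per_100), 4))    # unchanged by the units of Y
print(round(pearson_r(y, x), 4))            # symmetric in X and Y
print(slope(x, y), slope(y, x))             # 0.6 1.0: the two slopes differ
```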
	t-test	$t = b / SE_b$. We want a t value with a low probability, meaning it is unlikely that the sample came from a population where H0 is true. For a two-tailed test at the 0.05 level with a large sample, |t| should exceed 1.96.
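
A minimal sketch of the t-test card, assuming a large sample so that the standard normal distribution can stand in for the t distribution; the coefficient and standard error are hypothetical. `math.erfc` supplies the two-tailed normal tail area.

```python
import math

def t_test(b, se_b):
    """t = b / SE_b with a two-tailed p-value from the standard normal.

    The normal approximation assumes a large sample; with few degrees of
    freedom (n - K - 1) the t distribution should be used instead.
    """
    t = b / se_b
    p_two_tailed = math.erfc(abs(t) / math.sqrt(2))   # P(|Z| > |t|)
    return t, p_two_tailed

t, p = t_test(b=0.42, se_b=0.15)   # hypothetical coefficient and standard error
print(round(t, 2), round(p, 4), abs(t) > 1.96)
```

As a check, `t_test(1.96, 1.0)` returns a p-value of about 0.05, the cutoff quoted on the card.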
	Adjusted R-squared	R²(adj.) adjusts the value of R² to obtain a less biased estimate of the population R². Statisticians have shown that R² for any sample drawn from a larger population tends to be biased slightly upward, i.e., it tends to overestimate the R² of the population from which the sample was drawn. The population R² is written Ρ², which is not the letter P but the upper-case Greek letter rho. R² is biased upward because the sample data fall closer to the sample prediction equation than to the true population regression equation. The bias is greater when n (the sample size) is small and/or K (the number of predictors, i.e., independent variables) is large. R²(adj) approaches R² as n increases. If R² is quite low, R²(adj) may become negative.
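
The usual adjustment is R²(adj) = 1 - (1 - R²)(n - 1)/(n - K - 1). A sketch with hypothetical values shows the three behaviors described on the card: a large penalty when n is small relative to K, convergence toward R² as n grows, and negative values when R² is low.

```python
def adjusted_r2(r2, n, k):
    """R2(adj) = 1 - (1 - R2)(n - 1) / (n - K - 1) for n cases and K predictors."""
    if n - k - 1 <= 0:
        raise ValueError("need n > K + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(round(adjusted_r2(0.30, n=20, k=5), 3))     # 0.05: small n, heavy penalty
print(round(adjusted_r2(0.30, n=2000, k=5), 3))   # 0.298: close to R2
print(round(adjusted_r2(0.05, n=20, k=5), 3))     # -0.289: negative
```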
	Multiple Regression Coefficients	The interpretation of the intercept a is an easy extension of the bivariate case: a is the predicted value of Y when every independent variable is zero. The interpretation of b requires more attention. The slope $b_k$ equals the average change in Y associated with a one-unit change in $X_k$ when the other independent variable(s) are held constant. Therefore, the slope in the multivariate case is sometimes called a partial slope or a partial regression coefficient.
	Multiple Regression & Spurious Relationships	A major contribution of multiple regression is that it lets us test whether a bivariate relationship may be spurious, that is, whether the relationship is not a real one but arises because both variables in the bivariate equation are caused by a third variable that was not included. Consider the demonstrated bivariate association between height and mathematics scores: taller children perform better in math than shorter children. But if you run a multiple regression with math scores as Y and both height and age as X variables, the slope of height on math scores usually becomes insignificant.
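
The height/math example can be reproduced with simulated data. In this sketch everything is hypothetical: age is drawn uniformly between 6 and 12, and both height and math score depend on age plus noise, while height has no direct effect on math. The two-predictor slopes come from solving the OLS normal equations on deviation scores (OLS with an intercept).

```python
import random

def _centered(v):
    m = sum(v) / len(v)
    return [x - m for x in v]

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def bivariate_slope(x, y):
    """OLS slope of y on x alone."""
    xc, yc = _centered(x), _centered(y)
    return _dot(xc, yc) / _dot(xc, xc)

def two_predictor_slopes(x1, x2, y):
    """Partial slopes (b1, b2) for y on x1 and x2, OLS with an intercept."""
    a, b, c = _centered(x1), _centered(x2), _centered(y)
    s11, s22, s12 = _dot(a, a), _dot(b, b), _dot(a, b)
    s1y, s2y = _dot(a, c), _dot(b, c)
    det = s11 * s22 - s12 ** 2
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det

# Hypothetical data: age drives both height and math score; height itself does not.
rng = random.Random(1)
age = [rng.uniform(6, 12) for _ in range(2000)]
height = [80 + 6 * a + rng.gauss(0, 5) for a in age]
math_score = [20 + 5 * a + rng.gauss(0, 5) for a in age]

print(round(bivariate_slope(height, math_score), 2))   # positive "effect" of height
b_height, b_age = two_predictor_slopes(height, age, math_score)
print(round(b_height, 2), round(b_age, 2))             # height slope near zero
```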
	F-test	1. The F-test evaluates the complete regression; it is a global test of the H0 that all the slope coefficients in the population are zero. The significance level of the F-test is also the significance of R². To determine whether a specific b coefficient is significant, use the t-test for that coefficient. 2. The F-test can also compare two regression models, one of which is more complex than the other (it adds predictors to the simpler one).
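
A sketch of both uses of the F-test, with hypothetical numbers. The global F is computed from R² as F = (R²/K) / ((1 - R²)/(n - K - 1)); the model comparison uses the residual sums of squares of a restricted and a full model, where q is the number of predictors dropped. The function names are illustrative.

```python
def global_f(r2, n, k):
    """F = (R2 / K) / ((1 - R2) / (n - K - 1)), df = (K, n - K - 1).
    Tests H0: all K slope coefficients are zero in the population."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

def nested_f(rss_restricted, rss_full, q, n, k_full):
    """Compare a restricted model (q fewer predictors) with the full model.
    df = (q, n - k_full - 1)."""
    return ((rss_restricted - rss_full) / q) / (rss_full / (n - k_full - 1))

print(round(global_f(0.30, n=50, k=3), 2))                                         # 6.57
print(round(nested_f(rss_restricted=10.0, rss_full=7.0, q=3, n=50, k_full=3), 2))  # 6.57
```

With TSS = 10 and RSS = 7 (so R² = .30), comparing the full model against an intercept-only model (whose RSS is the TSS) gives the same F as the global test.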
	Tolerance	Regress a particular X variable on the other X variables and subtract the resulting R² from 1; this value is the “tolerance” (or independent variation) of that X variable. If the R² of an X variable regressed on the other X variables is .29, the X variable has a tolerance of .71, i.e., 71% of its variation is independent of the other X variables in the model. The higher the tolerance of an X variable, the less likely it is that collinearity is a problem. Watch out for low tolerances, say < .40 or < .35.
	VIF	Variance inflation factor, VIF = 1/tolerance. The larger the VIF (the smaller the tolerance), the greater the multicollinearity in the model caused by that X variable. The square root of an X variable's VIF tells you how much larger the standard error of its b coefficient is than it would be if that X variable were uncorrelated with the other X variables in the regression equation.
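
Tolerance and VIF follow directly from the auxiliary regressions described on these cards. This is a sketch in pure Python (no statistics package assumed): `r_squared` solves the OLS normal equations by Gauss-Jordan elimination, which is adequate for a handful of well-conditioned predictors but is not a numerically robust general solver. The data are simulated, with x2 deliberately built from x1.

```python
import random

def _center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def r_squared(y, xs):
    """R-squared of y regressed (OLS with intercept) on the predictors in xs."""
    yc = _center(y)
    cols = [_center(x) for x in xs]
    k = len(cols)
    # Augmented normal equations on centered data: [X'X | X'y].
    m = [[_dot(cols[i], cols[j]) for j in range(k)] + [_dot(cols[i], yc)]
         for i in range(k)]
    for i in range(k):                       # Gauss-Jordan elimination, no pivoting
        pivot = m[i][i]
        m[i] = [v / pivot for v in m[i]]
        for r in range(k):
            if r != i:
                factor = m[r][i]
                m[r] = [a - factor * b for a, b in zip(m[r], m[i])]
    coef = [m[i][k] for i in range(k)]
    fitted = [sum(coef[j] * cols[j][t] for j in range(k)) for t in range(len(yc))]
    rss = sum((a - f) ** 2 for a, f in zip(yc, fitted))
    return 1 - rss / _dot(yc, yc)

def tolerance_and_vif(xs):
    """For each predictor: tolerance = 1 - R^2 on the others, VIF = 1 / tolerance."""
    results = []
    for j, x in enumerate(xs):
        tol = 1 - r_squared(x, xs[:j] + xs[j + 1:])
        results.append((tol, 1 / tol))
    return results

# Simulated predictors: x2 is built from x1, x3 is unrelated to both.
rng = random.Random(2)
x1 = [rng.gauss(0, 1) for _ in range(500)]
x2 = [a + rng.gauss(0, 0.5) for a in x1]
x3 = [rng.gauss(0, 1) for _ in range(500)]
results = tolerance_and_vif([x1, x2, x3])
for name, (tol, vif) in zip(("x1", "x2", "x3"), results):
    # Last column: sqrt(VIF), the factor by which that b's standard error is inflated.
    print(name, round(tol, 2), round(vif, 2), round(vif ** 0.5, 2))
```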
	Regression Assumptions	I.1. There is no specification error: I.1a. The relationship between Xi and Yi is linear. I.1b. No relevant independent (X) variables have been excluded. I.1c. No irrelevant independent (X) variables have been included. I.2a. The Y variable is quantitative, continuous, and unbounded; the X variables are quantitative or dichotomous; all variables are measured without error. I.2b. All X variables have nonzero variance, that is, each independent variable has some variation in value. I.2c. There is no perfect collinearity (i.e., no exact linear relationship) between two or more of the X variables. I.2d. For each observation, the expected value of the error term is zero. I.2e. The variance of the error term is constant for all values of Xi; put another way, the error term is homoscedastic if, across each set of values for the k independent variables, its variance is constant at a value σ². I.2f. The error terms are uncorrelated, that is, there is no autocorrelation. I.2g. Each independent variable is uncorrelated with the error term. I.2h. The error term, Ei, is normally distributed.
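	BLUE	Best Linear Unbiased Estimator: when the regression assumptions hold, OLS is the most efficient (smallest-variance) estimator among all linear unbiased estimators.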
	Normality Assumption	The normality assumption does not mean that all the variables in the regression equation must be normally distributed. The only “variable” assumed to have a normal distribution is the error term, which we cannot observe directly. Still, nonnormal error distributions often result from badly skewed Y and/or X distributions, so here are some ways to appraise whether the X and Y variables have normal distributions. Compare the mean and median; in a normal distribution they are the same. Look at the skewness and kurtosis values, which equal 0 and 3, respectively, in a normal distribution (many programs report excess kurtosis, i.e., kurtosis minus 3, which is 0 for a normal distribution). Also look at graphs of the distributions of the variables in the model.
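
The checks suggested on the normality card can be computed directly. The sketch below uses moment-based skewness and (non-excess) kurtosis, one of several common definitions (statistical packages may apply small-sample corrections); the normal and exponential samples are simulated for illustration.

```python
import random
import statistics

def skew_kurt(data):
    """Moment-based skewness and kurtosis (kurtosis here is NOT excess kurtosis).

    For a normal distribution: skewness 0, kurtosis 3 (excess kurtosis 0).
    """
    n = len(data)
    m = statistics.fmean(data)
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

rng = random.Random(3)
normal = [rng.gauss(0, 1) for _ in range(20_000)]
skewed = [rng.expovariate(1.0) for _ in range(20_000)]   # right-skewed
for label, d in (("normal", normal), ("skewed", skewed)):
    s, k = skew_kurt(d)
    print(label, "mean", round(statistics.fmean(d), 2), "median",
          round(statistics.median(d), 2), "skew", round(s, 2), "kurt", round(k, 2))
```

For the skewed sample the mean sits noticeably above the median and the skewness is far from 0, the same warning signs the card describes.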
	Tukey's Ladder	For a non-normal distribution, transform the variable with a power: powers greater than 1 shift weight toward the upper tail of the distribution and thereby reduce negative skew; the higher the power, the stronger this effect. Powers less than 1 pull in the upper tail and may therefore reduce positive skew; the lower the power, the stronger this effect. The natural log (the 0 rung of the ladder) brings in positive outliers by compressing the higher values, pulling in the upper tail of the distribution.
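
A sketch of moving along Tukey's ladder, assuming strictly positive data (logs and fractional powers require it). The right-skewed "income" values are simulated from a lognormal distribution purely for illustration, so the log rung should remove nearly all of the skew.

```python
import math
import random

def skewness(data):
    """Moment-based sample skewness."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

def ladder(data, power):
    """Tukey ladder-of-powers transform; power 0 stands for the natural log.
    Assumes every value is strictly positive."""
    if power == 0:
        return [math.log(x) for x in data]
    return [x ** power for x in data]

rng = random.Random(4)
income = [rng.lognormvariate(10, 0.8) for _ in range(5000)]   # hypothetical, right-skewed
for p in (2, 1, 0.5, 0):
    print(p, round(skewness(ladder(income, p)), 2))
```

Because these data are positively skewed, moving up the ladder (power 2) makes the skew worse, while moving down (0.5, then the log) pulls in the upper tail.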
	Standard Error	The standard error is the standard deviation of a sampling distribution. For the mean, $SE_{\bar{X}} = \sigma / \sqrt{N}$, where σ (lowercase sigma) is the population standard deviation; the standard error of the mean is symbolized $\sigma_{\bar{X}}$. A small standard error indicates little sample-to-sample variation, so that most b's and a's are close to the population values β (beta) and α (alpha). A large SE indicates the opposite.
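	TPE (total prediction error)	$TPE = \sum_i (Y_i - \hat{Y}_i)$, the sum of the differences between observed and predicted values. For an OLS line with an intercept this sum is always zero, which is why the errors are squared (SSE).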
	Slope (b)	average change in Y associated with one unit change in X
	Intercept (a)	The point where the regression line crosses (“intercepts”) the Y-axis, i.e., the predicted value of Y when X = 0
	Covariance (sxy)	A useful summary statistic is the covariance (i.e., co-vary), designated $s_{xy}$, the covariance of X and Y. For each observation, we subtract the mean of Y (Y-bar) from its Y value and the mean of X (X-bar) from its X value, multiply the two differences together, sum these products over all the observations, and divide the sum by the number of observations minus 1: $s_{xy} = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{N - 1}$. The covariance is really useful only for indicating the sign (+ or -) of a relationship. It tells us nothing about the strength of a relationship, because it produces a raw number that depends on the units of measurement and has no theoretical upper bound.
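
The covariance, slope, and intercept cards connect through b = s_xy / s_x² and a = Y-bar - b·X-bar. A sketch on made-up data; the helper names are illustrative. The last lines also confirm that the raw prediction errors (the TPE) of an OLS line with an intercept sum to zero.

```python
def covariance(x, y):
    """Sample covariance: sum((Xi - X-bar)(Yi - Y-bar)) / (N - 1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)

def fit_line(x, y):
    """OLS intercept and slope: b = s_xy / s_x^2, a = Y-bar - b * X-bar."""
    b = covariance(x, y) / covariance(x, x)   # cov(X, X) is the variance of X
    a = sum(y) / len(y) - b * sum(x) / len(x)
    return a, b

x = [1, 2, 3, 4, 5]          # made-up data
y = [2, 4, 5, 4, 5]
print(covariance(x, y))                        # 1.5
print(covariance(x, [100 * v for v in y]))     # 150.0: same relationship, new units
a, b = fit_line(x, y)
tpe = sum(yi - (a + b * xi) for xi, yi in zip(x, y))   # total prediction error
print(round(a, 2), round(b, 2), abs(tpe) < 1e-9)       # 2.2 0.6 True
```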
	Type I Error	rejecting the null hypothesis when in fact it is true.
	Type II Error	failing to reject H0 when it is in fact false
	Partial Correlation	The correlation coefficient between two variables holding other variables constant (the ordinary Pearson r does not do this)
	Interaction	When the effect of one X variable depends on the values of other X variables. The most common approach to modeling interaction introduces cross-product terms of the explanatory variables into the multiple regression model. Ex: the higher the SES, the smaller the effect of life events on mental impairment.
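
With a cross-product term, the slope of X is no longer a single number: for Ŷ = a + b_x·X + b_z·Z + b_xz·(X·Z), the effect of X at a given Z is b_x + b_xz·Z. A sketch with hypothetical coefficients matching the SES example (a negative cross-product coefficient means life events matter less at higher SES); `cross_product` and `effect_of_x` are illustrative names.

```python
def cross_product(x, z):
    """The X*Z column added to the design matrix to model interaction."""
    return [a * b for a, b in zip(x, z)]

def effect_of_x(b_x, b_xz, z):
    """Slope of Y on X at a given Z for Y-hat = a + b_x*X + b_z*Z + b_xz*(X*Z)."""
    return b_x + b_xz * z

# Hypothetical coefficients: life events (X) raise impairment, less so at higher SES (Z).
b_events, b_events_by_ses = 0.50, -0.05
for ses in (0, 5, 10):
    print("SES", ses, "effect of life events",
          round(effect_of_x(b_events, b_events_by_ses, ses), 2))
print(cross_product([1, 2, 3], [10, 20, 30]))   # [10, 40, 90]
```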
	homo/hetero-scedasticity	The variance of the error term is constant for all values of Xi; this is the assumption of homoscedasticity. The word homoscedastic comes from the Greek skedastos, “able to be scattered,” which itself comes from skedannunai, “to scatter”; it literally means having equal scatter or variation, i.e., having equal variances. The opposite of homoscedasticity (which is what we strive for) is heteroscedasticity (which is what we want to avoid). This is a serious and important regression assumption: it assumes that the spread of the prediction errors is not related to the values of the independent variable. Violating it does not bias the OLS estimate of the slope, but it does bias the estimate of the slope's standard error.
	Resistant vs Robust Estimators	An estimator is resistant if its value is not much affected by small changes in the sample data (e.g., a few outlying values). A robust estimator performs well even when there are small violations of the assumptions about the underlying population (e.g., an error term that is not really Gaussian).
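
A quick illustration of resistance, using hypothetical data: replacing a single value with an extreme one drags the mean far away, while the median does not move.

```python
import statistics

data = [12, 14, 15, 15, 16, 18, 19]    # hypothetical measurements
with_outlier = data[:-1] + [190]       # the last value mistyped as 190

for label, d in (("original", data), ("with outlier", with_outlier)):
    print(label, "mean", round(statistics.mean(d), 2), "median", statistics.median(d))
```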
	Probability	Probability is the likelihood that a given event will occur. In gambling, probability takes the form of a specific mathematical expression: the number of ways a given outcome can occur divided by the total number of possible outcomes, assuming all outcomes are equally likely.
	Odds	The likelihood of a given event occurring compared with the likelihood of the same event not occurring: odds = p / (1 - p).
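
Probability and odds convert into each other with odds = p/(1 - p) and p = odds/(1 + odds). A sketch using exact fractions, with a fair die as the example; the function names are illustrative.

```python
from fractions import Fraction

def probability(favorable, possible):
    """Classical probability, assuming all outcomes are equally likely."""
    return Fraction(favorable, possible)

def odds(p):
    """Odds in favor of an event: P(event) / P(not event)."""
    return p / (1 - p)

def prob_from_odds(o):
    """Inverse of odds(): p = odds / (1 + odds)."""
    return o / (1 + o)

p_six = probability(1, 6)      # rolling a six with a fair die
print(p_six, odds(p_six))      # 1/6 1/5
```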
	Ordinal Variable	A variable that is categorical and ordered, ex: “poor,” “good,” “excellent”; or very liberal, slightly liberal, moderate, etc.
	Nominal Variable	a variable that is categorical but not ordered
	Censoring	In event history analysis, censoring exists when only incomplete information about the duration of the risk period is available because the observation period is limited. A case is right censored if, by the end of the observation period, the event has not occurred to it. A subject can also be left censored if data are not available for the beginning of the risk period.
	crude death rate	The total number of deaths per year per 1000 people; equivalently, the sum of the ASDRs, each weighted by its age group's share of the population. CDR = (Deaths / Midyear Population) * 1000. Crude because not all of the population is at equal risk of death; the risk varies by many characteristics.
	age-specific death rate (ASDR)	The total number of deaths per year per 1000 people of a given age. Age-specific death rates, not crude death rates, should be used to compare the mortality experiences of countries with known differences in age composition. Ex: U.S. & Venezuela. The U.S. is an “older” country than Venezuela, that is, it has proportionately more older people; in contrast, Venezuela is a much “younger” country than the U.S. Because younger people die at lower rates than older people, many (but not all) “young” countries have lower CDRs than “old” countries. ${}_nM_x = \frac{\text{deaths to persons aged } x \text{ to } x+n}{\text{mid-year population aged } x \text{ to } x+n} \times 1000$, where n is the width of the age group and x is the initial age of the age group. Not a crude rate, because it takes into account differential mortality by age group.
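
The CDR card's "sum of the weighted ASDRs" and the U.S./Venezuela comparison can be shown together. In this sketch the three broad age groups, their ASDRs, and both age structures are hypothetical: the two populations share identical age-specific rates, yet the older one has the higher CDR.

```python
def crude_death_rate(asdr_per_1000, population):
    """CDR per 1000 = sum of ASDRs weighted by each age group's population share."""
    total = sum(population)
    return sum(m * p / total for m, p in zip(asdr_per_1000, population))

# Hypothetical ASDRs (deaths per 1000) for ages 0-14, 15-64, 65+,
# identical in both populations.
asdr = [2.0, 4.0, 50.0]
older = [18_000, 65_000, 17_000]     # an "older" age structure
younger = [40_000, 55_000, 5_000]    # a "younger" age structure
print(round(crude_death_rate(asdr, older), 2))     # 11.46
print(round(crude_death_rate(asdr, younger), 2))   # 5.5
```

Computing deaths directly (ASDR × population / 1000 in each group) and dividing by the total population gives the same CDR, which is the equivalence stated on the card.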
	Menarche	The age at which a female experiences her first menstrual cycle. Typically occurs in the early teens, though sometimes younger, a shift attributed to improved nutrition and, possibly, hormones in food.
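	age-specific fertility rate (ASFR)	The annual number of births to women in a particular age group per 1000 women in that age group. Data are obtained from civil registration systems or censuses. Formula: ${}_nF_x = \frac{\text{births to women aged } x \text{ to } x+n}{\text{mid-year female population aged } x \text{ to } x+n} \times 1000$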
	American Community Survey (ACS)	An annual survey conducted by the Census Bureau (CB), instituted to replace the long form of the census. It takes a representative sample of the American people on typical census topics: economic, housing, demographic, and social variables. Important because it is updated yearly.
	crude birth rate	CBR = (Births / Midyear Population) * 1000. Crude because the denominator includes the total population, not just the population at risk. Problematic because age structure can have substantial effects on crude rates, e.g., a developing country with many young people can have a high CBR. Standardization techniques are needed to refine comparisons.
	Current Population Survey	A sample survey conducted monthly by the Census Bureau (CB), designed to represent the civilian non-institutionalized population. It obtains a wide range of socioeconomic and demographic data such as employment, unemployment, earnings, hours of work, age, sex, race, occupation, and industry.
	Definitions of Death (underlying cause vs. pattern of failure)	Underlying cause: every death is represented in just one d column of the life table, so the causes are mutually exclusive and additive. Pattern of failure: the number of persons who leave the population for each type of chronic disease includes everyone who had that disease listed on their death certificate.
	Demographic transition	Population shift from high fertility and mortality to low fertility and mortality
	Diffusion Effect (in Fertility Transition)	An attempt to identify a mechanism that leads to the cumulative adoption of some behavior by more and more individuals, even while their social position and the resources associated with it remain largely unchanged. Fertility declines take place under a wide variety of economic and mortality conditions and tend to be influenced by ethnic, linguistic, and religious boundaries.
	Fecundity	The physiological capacity of a woman, man, couple, or group to reproduce. Infecund persons are also described as sterile. Women are most fecund during their 20s. For females, fecundity ranges from 0 to 30 children. Bongaarts' maximum fecundity is about 15 children per woman, the theoretical maximum if women engaged in natural fertility from age 12 to age 50.
	Fertility	Actual birth performance. One’s fertility is limited by one’s fecundity and is usually far below it.
	General Fertility Rate	GFR = (Births in a given year / midyear female population of childbearing age) x 1000. Improves on the CBR because it includes only the population at risk, but masks differences in the rate of childbearing at different ages throughout the reproductive years.
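	Gross Reproduction Rate	The sum of the ASFRs that include only live female births in the numerators; with rates per 1000 women in age groups of width n, $GRR = n \sum_x {}_nF^{f}_x / 1000$ daughters per woman. Used to determine whether the population will grow, replace itself, or decline. Interpretation: the number of daughters expected to be born alive to a hypothetical cohort of 1000 women (multiply the per-woman value by 1000) who all survive through their childbearing years. GRR measures daughters per woman; TFR measures children.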
	Infant Mortality Rate	IMR = (Deaths to children born alive, from birth to exact age one year / number of live births) x 1000. Equal to NMR + PNMR (the neonatal plus the postneonatal mortality rate). The best-known and most widely available measure of mortality in early life, and a key indicator of demographic development and health conditions in different countries. Problems: migration may affect the numerator but not the denominator, and some deaths are to children born at the end of the previous year (still under age 1).
	IPUMS	Integrated Public Use Microdata Series. Consists of microdata samples from U.S. and international census records, converted and made available to researchers through a web system. Based at the Minnesota Population Center. Provides consistent variable names, coding schemes, and documentation across all samples.
	Life Expectancy	The average number of years of life remaining to a group of persons who have reached a given age
	Life Span	The maximum age that humans as a species could reach under optimum conditions; almost entirely biological. The longest verified life span is that of Jeanne Calment of France (122 years, 5 months); a later claim that Tuti Yusupova of Uzbekistan had reached 128 was never verified.
	Life Table	A statistical model built from a combination of the age-specific mortality rates of a given population. Single decrement: has only one way of leaving (mortality). Unabridged: mortality information for single years of age. Most tables are abridged, giving information by age group.
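
A minimal abridged life table sketch. The age groups, the nqx values, and the open-interval death rate are hypothetical, and nLx is approximated by assuming deaths are spread evenly within each closed interval; published tables use more careful adjustments, especially for infancy. Life expectancy is computed as ex = Tx / lx.

```python
def life_table(ages, qx, m_open, radix=100_000):
    """Abridged life table from probabilities of dying (hypothetical inputs).

    ages   -- start age of each interval; the last one is the open-ended group
    qx     -- probability of dying in each closed interval (len(ages) - 1 values)
    m_open -- death rate in the open-ended group, so its nLx is lx / m_open
    Assumes deaths are spread evenly within each closed interval.
    """
    lx = [float(radix)]
    dx, Lx = [], []
    for i, q in enumerate(qx):
        n = ages[i + 1] - ages[i]
        deaths = lx[i] * q
        dx.append(deaths)
        lx.append(lx[i] - deaths)
        Lx.append(n * (lx[i] + lx[i + 1]) / 2)   # person-years lived in the interval
    dx.append(lx[-1])                             # everyone in the open group eventually dies
    Lx.append(lx[-1] / m_open)
    Tx = [sum(Lx[i:]) for i in range(len(Lx))]    # person-years remaining above age x
    ex = [T / alive if alive > 0 else 0.0 for T, alive in zip(Tx, lx)]
    return lx, dx, Lx, Tx, ex

# Hypothetical mortality schedule
ages = [0, 1, 5, 15, 45, 65, 85]
qx = [0.02, 0.005, 0.01, 0.05, 0.20, 0.70]
lx, dx, Lx, Tx, ex = life_table(ages, qx, m_open=0.25)
for age, survivors, e in zip(ages, lx, ex):
    print(age, round(survivors), round(e, 1))
```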
	Life Table Transient State vs. Absorbing State	Conventional life tables concern two states (life and death), and multiple decrement tables concern multiple states. Multistate models allow for movements between life (active state) and death (absorbing state), but also for possible movements among various types of active states. Absorbing states only permit entries; transient states allow entries and exits.
	Longevity	The ability to resist death. Has both biological and social components and varies according to these characteristics.
	Multiple Decrement Life Table	An extension of the standard life table that takes into account multiple transitions between states (such as more than one cause of death or way to leave a population)
	National Survey of Family Growth	An ongoing series of sample surveys designed to provide current information about childbearing, contraception, and related aspects of maternal and child health in the U.S. Used by the U.S. Department of Health and Human Services to plan health services and health education programs.
	Natural Fertility	The level of fertility in a population in which deliberate control of childbearing is not practiced. Characteristic of most populations prior to the onset of the demographic transition. Achieved by the Hutterites.
	Net Reproduction Rate	The average number of daughters born per woman (or per 1000 women) by the end of her childbearing years, for women subject to the ASBRs (age-specific birth rates) and survival rates of a given year. NRR = 1: exact replacement; NRR < 1: below replacement; NRR > 1: above replacement.
	Replacement Fertility	The level of fertility needed for a population to replace itself: an average of about 2.1 children per woman (in low-mortality populations)
	Reproduction	Production of female births
	Sex Ratio at Birth (SRB)	Usually defined as the number of boys born for every 100 girls: SRB = (# male births / # female births) x 100. Most societies have SRBs between 104 and 106; China has a high SRB (120).
	Taeuber Paradox	Attributed to Conrad Taeuber by Keyfitz. Essentially states that eliminating a particular death risk (e.g., cancer) has little impact on cohort life span, because it in effect exposes the people who would have died to a whole new set of risks.
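	Total Fertility Rate (TFR)	The average number of children a woman would have if she survived through her childbearing years and experienced the current age-specific fertility rates at each age (often expressed per 1000 women, i.e., for a hypothetical cohort of 1000 women). With ASFRs per 1000 women in age groups of width n, $TFR = n \sum_x {}_nF_x / 1000$ children per woman.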
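
The TFR, GRR, and NRR cards differ only in what is summed. A sketch assuming 5-year age groups 15-19 through 45-49 with ASFRs per 1000 women, a sex ratio at birth of 105 (so the share of births that are female is 100/205), and a hypothetical set of female nLx values from a life table with radix 100,000; all numbers are invented for illustration.

```python
def tfr(asfr_per_1000, width=5):
    """Total fertility rate, in children per woman: width * sum(ASFR) / 1000."""
    return width * sum(asfr_per_1000) / 1000

def grr(asfr_per_1000, prop_female, width=5):
    """Gross reproduction rate: the TFR counting only daughters (no mortality)."""
    return tfr(asfr_per_1000, width) * prop_female

def nrr(asfr_per_1000, prop_female, female_nLx, radix=100_000):
    """Net reproduction rate: daughters per woman, weighting each age group's
    rate by the person-years (nLx / l0) a newborn girl would live in it."""
    return sum(f / 1000 * prop_female * L / radix
               for f, L in zip(asfr_per_1000, female_nLx))

# Hypothetical inputs for ages 15-19, 20-24, ..., 45-49.
asfr = [50, 150, 130, 80, 40, 10, 2]
prop_female = 100 / (100 + 105)                 # from an assumed SRB of 105
female_nLx = [480_000, 478_000, 476_000, 473_000, 470_000, 466_000, 460_000]

print(round(tfr(asfr), 2))                      # 2.31
print(round(grr(asfr, prop_female), 2))         # 1.13
print(round(nrr(asfr, prop_female, female_nLx), 2))
```

If no woman died before age 50, each nLx would equal 5 × 100,000 and the NRR would equal the GRR; mortality is what pulls the NRR below it.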
	Family Limitation & Birth Control	Family limitation (FL) depends on the number of children already born and refers specifically to behavior designed to stop childbearing altogether. BIRTH CONTROL (BC) encompasses both behavior intended to stop births and deliberate attempts to space births, and may also apply to behavior outside the family context, that is, in "illegitimate" relations. This distinction is theoretically important because it was the introduction and spread of stopping behavior (i.e., FL), and not spacing per se (which would involve BC), that was the key to the onset of the Demographic Transition (DT). In other words, the DT resulted from the shift from natural fertility (NF) to FL.
	Singulate Mean Age at Marriage (SMAM)	Mean age at first marriage for a cohort of women or men who marry by age 50. Computed from information on current marital status in a single census or survey; essentially the mean number of years lived in the single state, as implied by a schedule of age-specific percentages single. Analogous to the TFR because it is cross-sectional.
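
SMAM is usually computed with Hajnal's method from proportions never married by age group. A sketch assuming five-year groups from 15-19 through 50-54, nobody marrying before 15, and the common convention of taking the proportion single at 50 as the average of the 45-49 and 50-54 values; the input proportions are hypothetical.

```python
def smam(prop_single):
    """Hajnal's singulate mean age at marriage.

    prop_single -- proportions never married at ages 15-19, 20-24, ..., 45-49,
                   50-54 (eight values, as fractions).
    """
    if len(prop_single) != 8:
        raise ValueError("expected eight five-year age groups, 15-19 to 50-54")
    s50 = (prop_single[6] + prop_single[7]) / 2      # proportion still single at 50
    years_single = 15 + 5 * sum(prop_single[:7])     # person-years single before 50
    # Remove the 50 years contributed by those never marrying, then average
    # over those who did marry.
    return (years_single - 50 * s50) / (1 - s50)

print(smam([1, 1, 0, 0, 0, 0, 0, 0]))   # everyone marries at exactly 25: 25.0
print(round(smam([0.95, 0.60, 0.30, 0.15, 0.10, 0.08, 0.06, 0.05]), 1))
```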
	Stable Population	A model or hypothetical population, closed to migration, with an unchanging relative age composition and a constant rate of change in its total size. It results from constant fertility and mortality rates over an extended period. Contrasting age distributions will evolve to identical stable age distributions if their fertility and mortality levels are the same.
	Stationary Population (LT interpretation)	In a life table, the nLx entry may be interpreted either as 1) the number of person-years lived during the age interval x to x+n by the lx individuals alive at the beginning of the interval, or 2) the stationary population within the interval, which requires the assumption that 100,000 individuals are born each year. After a sufficient number of years with no migration, a stationary population results: after more than 100 years, the 100,000 entering the population each year at birth would be exactly balanced by 100,000 dying at all ages.
	Spatial mobility (3 types)	1. local movement, i.e., short-distance change of residence within the same community; 2. internal migration, i.e., change of residence from one community to another while remaining within the same national boundaries; 3. international migration, i.e., change of residence from one nation to another. All spatial mobility involves "permanent" changes of residence. The term "migration" is usually reserved for those changes of residence that involve a complete change and readjustment of the community affiliations of the mover. Thus local movement (#1 above) should not be referred to as migration. A "residential move" is a permanent change of residence. A "migration" is a permanent change of residence involving the crossing of a political boundary.
	Net Migration	In-migration minus out-migration for a given area over a given period of time. Rate = net migration / midyear population x k (k = 1000 or 100)
	Whipple’s Measure of Age Heaping	A popular way to determine whether age heaping is having an effect. Ranges from 0 (ages ending in 0 or 5 are never reported) through 100 (no preference for 0 or 5) to 500 (only ages ending in 0 or 5 are reported). Values less than 105 are considered accurate.
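
A sketch of Whipple's index in its usual form: the count at ages 25, 30, ..., 60 divided by one fifth of the total count at ages 23 through 62, times 100. The input is assumed to be a mapping from single year of age to population count; the example population is made up.

```python
def whipple_index(pop_by_age):
    """Whipple's index from a mapping of single year of age -> count.

    100 = no preference for ages ending in 0 or 5; 500 = only such ages reported.
    """
    heaped = sum(pop_by_age.get(a, 0) for a in range(25, 61, 5))   # 25, 30, ..., 60
    total = sum(pop_by_age.get(a, 0) for a in range(23, 63))       # 23 through 62
    return 100 * heaped / (total / 5)

# Hypothetical census counts: 1,000 per age plus 300 extra at ages ending in 0 or 5.
reported = {age: 1000 + (300 if age % 5 == 0 else 0) for age in range(0, 90)}
print(round(whipple_index(reported), 1))   # 122.6
```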
	Age Heaping	The practice of reporting years of life so that the terminal digit reflects cultural preference or ease of reporting (such as 0 or 5)
	Parity	The number of live births a woman has had
