110 Cards in this Set


chi-square test for independence (definition)

Uses the frequency data from a sample to evaluate the relationship between two variables in the population. Each individual in the sample is classified on both variables, creating a two-dimensional frequency-distribution matrix. The frequency distribution for the sample is then used to test hypotheses about the corresponding frequency distribution for the population.

chi-square test for independence (null hypothesis)

- The two variables being measured are independent: for each individual, the value obtained on one variable is not related to, or influenced by, the value on the second variable.

Version 1 > Ho: There is no relationship between the two variables

H1: There is a relationship

Version 2 > Ho: The two distributions have the same shape (same proportions)

H1: The distributions have different proportions

*No relationship between the two variables = the distributions have equal proportions

Phi-Coefficient

Measures the strength of the relationship, rather than its significance, and thus provides a measure of effect size.


Chi- Square Test

Nonparametric techniques that test hypotheses about the form of the entire frequency distribution.

1. Goodness of Fit

2. Test for Independence

*Positively skewed, beginning at zero; the exact shape is determined by df

*Data consist of frequencies: the number of individuals located in each category

*The chi-square statistic is distorted when fe values are very small; the test should not be performed when the fe of any cell is < 5

Goodness of Fit Test

Compares the frequency distribution for a sample to the population distribution that is predicted by Ho.

- Determines how well the observed frequencies (sample data) fit the expected frequencies (predicted by Ho); see the sketch below

*fe = pn

*df = C-1
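
A minimal Python sketch of the goodness-of-fit computation, assuming made-up observed counts and null proportions; the fe = pn and df = C-1 formulas come from the card above:

```python
# Goodness-of-fit chi-square, computed from the card's formulas.
# The observed counts and null proportions are made-up example values.

observed = [30, 50, 20]        # fo: observed frequency in each category
null_props = [1/3, 1/3, 1/3]   # proportions specified by Ho
n = sum(observed)              # total number of individuals

expected = [p * n for p in null_props]  # fe = p * n for each category
chi_square = sum((fo - fe) ** 2 / fe
                 for fo, fe in zip(observed, expected))
df = len(observed) - 1                  # df = C - 1

print(f"chi-square = {chi_square:.2f}, df = {df}")
```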

Test for Independence

*fe = (fc × fr) / n

*df = (R-1)(C-1)

- A large chi-square value means there is a large discrepancy between the fo and fe values, rejecting Ho and providing support for a relationship between the two variables (H1).

- Similar to both a correlation and an independent-measures t test, because it can be used to evaluate either a relationship between variables or a difference between populations; a sketch of the computation follows
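
A minimal Python sketch of the test for independence on a 2×2 table, assuming made-up counts; it also computes the phi-coefficient from the earlier card using the standard 2×2 formula φ = √(χ²/n):

```python
import math

# Chi-square test for independence on a 2x2 table (made-up counts).
# fo[r][c] = observed frequency for row r, column c.
fo = [[25, 15],
      [10, 30]]

row_totals = [sum(row) for row in fo]        # fr
col_totals = [sum(col) for col in zip(*fo)]  # fc
n = sum(row_totals)

chi_square = 0.0
for r, fr in enumerate(row_totals):
    for c, fc in enumerate(col_totals):
        fe = fc * fr / n                     # fe = (fc * fr) / n
        chi_square += (fo[r][c] - fe) ** 2 / fe

df = (len(fo) - 1) * (len(fo[0]) - 1)        # df = (R-1)(C-1)
phi = math.sqrt(chi_square / n)              # phi-coefficient (2x2 effect size)

print(f"chi-square = {chi_square:.2f}, df = {df}, phi = {phi:.2f}")
```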



Parametric

- Numerical scores

- Ratio or interval scales

- Most likely to detect real differences/relationships

- Concern parameters / make assumptions about parameters

Non-Parametric "distribution-free test"

- Classified categorically (e.g., Democrat or Republican)

- Nominal or ordinal scales

- Non-numerical values

- The data are simply frequencies

Large Variances

- Can greatly reduce the likelihood that parametric tests will find significance

Goodness of Fit (Ho and H1)

Ho: The population distribution has the proportions specified for each category (e.g., no preference among categories)

H1: The population proportions are different from those specified by Ho

the numerical value for a correlation

can never be greater than 1.00 and can never be less than -1.00

A large value for chi-square indicates

discrepancy between sample data and null hypothesis (NOT a good fit)

The F-ratio for Two-factor analysis includes/excludes:

Includes: MSA, MSB, MSA×B, and MS within treatments

*MS within treatments forms the denominator of all three F-ratios

Excludes: MS between treatments

(One-Way ANOVA)


As differences between treatments increase

F-ratio increases

(One-Way ANOVA)

As variability within treatments increases

F-ratio decreases

The larger the mean differences

the larger the F-ratio

Larger Sample Variance

Smaller F- ratio

(One-Way ANOVA)


What is a factor?

An independent (or quasi-independent) variable

As Sample size increases

standard error decreases

(the standard deviation of the distribution of sample means decreases)

As correlation increases

Standard error of estimate becomes smaller

Analysis of Variance

is a hypothesis-testing procedure used to evaluate mean differences between two or more treatments (or populations)


Levels

The individual conditions or values that make up a factor

Statistical Hypothesis for Anova

Ho: u1 = u2 = u3


H1: There is at least one mean difference among the populations

Test Statistic for ANOVA

The F-ratio: F = variance between treatments / variance within treatments (F = MS between / MS within)

Test-wise alpha

is the risk of a Type I error: the alpha level for an individual hypothesis test


Between-Treatments Variance

*Numerator of the F-ratio

- Provides a measure of the overall differences between treatment conditions

- Measures the differences between sample means

Two possible explanations for the differences:

1. Not caused by any treatment effect; they occur naturally from random, unsystematic factors (chance)

2. Caused by treatment effects; systematic

Within-Treatments Variance

*denominator of F-ratio


-Variability within each sample



-Provides a measure of the variability inside each treatment condition



-Measures differences caused by random, unsystematic factors



-Provides a measure of how big the differences are when Ho is true

When Ho is true, what happens to the F-ratio?

Expected F-ratio = 1.00; the top and bottom of the ratio (numerator and denominator) are both measuring the same variance

What happens to the value of the F-ratio if differences between treatments are increased?

As differences between treatments increase, the F-ratio also increases

What happens to the F-ratio if variability inside treatments is increased?

As variability within treatments increases, the F-ratio decreases

F RATIO =

VARIANCE BETWEEN TREATMENTS / VARIANCE WITHIN TREATMENTS

*F = MS between / MS within

Within Treatments SS

SSwithin = ΣSS inside each treatment

Between Treatment SS

SSbetween = SStotal-SSwithin

df TOTAL =

N-1

df Within =

N-k, or Σdf inside each treatment

df Between =

k-1

df total =

df within + df between
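
For example (made-up numbers): with k = 3 treatments of n = 5 scores each, N = 15, so df total = 15 - 1 = 14, df between = 3 - 1 = 2, and df within = 15 - 3 = 12; the partition checks out, since 2 + 12 = 14.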

Variance

The mean of the squared deviations

If the Ho is false, F-ratio should be

Greater than 1.00

F-ratios *notes

- Computed from two variances (numerator/denominator), so F values are always positive



-Variance is always positive

Very large df values =

Nearly all F-ratios are clustered very near 1.00


Very small df values =

F distribution is more spread out

If the F-ratio is much greater than 1.00 (beyond the critical value)

- We reject Ho

Steps for ANOVA hypothesis testing:

Step 1: State hypotheses and alpha level

(Ho: u1 = u2 = u3)

(H1: At least one of the treatment means is different)

Step 2: Locate the critical region:

*df total

*df between

*df within

Step 3: Compute the F-ratio:

a. Analyze SS to obtain SS between and SS within

b. Using the SS and df values, calculate MS between and MS within

c. Using both variances, compute the F-ratio

Step 4: Decision (a worked sketch of Steps 3 and 4 follows)
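
A minimal Python sketch of Steps 3 and 4 on made-up scores; it follows the SS, df, and MS formulas from the cards above, and also reports the effect-size measure η² = SS between / SS total defined in the next card:

```python
# One-way ANOVA by hand, following Steps 3 and 4 above (made-up scores).
treatments = {
    "A": [1, 2, 3, 2, 2],
    "B": [4, 5, 4, 6, 6],
    "C": [7, 8, 9, 8, 8],
}

all_scores = [x for scores in treatments.values() for x in scores]
N, k = len(all_scores), len(treatments)
grand_mean = sum(all_scores) / N

def ss(scores):
    """Sum of squared deviations from the scores' own mean."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores)

ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
ss_within = sum(ss(scores) for scores in treatments.values())  # ΣSS inside each treatment
ss_between = ss_total - ss_within

df_between, df_within = k - 1, N - k
ms_between = ss_between / df_between
ms_within = ss_within / df_within
F = ms_between / ms_within

eta_squared = ss_between / ss_total  # % of variance accounted for
print(f"F({df_between}, {df_within}) = {F:.2f}, eta^2 = {eta_squared:.2f}")
# Step 4 (decision): compare F to the critical value from an F table at the chosen alpha.
```

With these made-up scores the treatment means are far apart relative to the variability inside each treatment, so F comes out large.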


Effect size for ANOVA:

η² (eta squared) = % of variance accounted for = SS between treatments / SS total

MS between (numerator of F-ratio) measures

How much difference exists between the treatment means

The bigger the mean differences between treatments

The bigger the F-ratio

MS within (denominator of F ratio)

Measures the variance of scores inside each treatment (the variance of the separate samples)


The larger the sample variance within each treatment

the smaller the F-ratio produced

Increasing Sample Size tends to...

Increase the likelihood of rejecting Ho

Changes in sample size have...

little or no effect on measures of effect size (i.e % of variance accounted for)

Post Hoc Test

Additional hypothesis tests done after an ANOVA to determine exactly which mean differences are significant and which are not



Conditions:


1. Reject Ho


2. There are three or more treatments

Tukey's Honestly Significant Difference (HSD) Test

Allows you to compute a single value that determines the minimum difference between treatment means necessary for significance

- The HSD is then used to compare any two treatments

- If a mean difference exceeds Tukey's HSD, there is a significant difference between those treatments; a sketch follows
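
A minimal sketch, assuming the usual equal-n HSD formula HSD = q·√(MS within / n), where q is looked up in a Studentized range table; the q, MS within, and n values below are made-up illustration values:

```python
import math

def tukey_hsd(q, ms_within, n):
    """Tukey's HSD: the minimum mean difference needed for significance.
    q is looked up in a Studentized range table for k treatments and
    df-within; n is the number of scores per treatment (equal n assumed)."""
    return q * math.sqrt(ms_within / n)

# Made-up values: q for k = 3 treatments, df-within = 12, alpha = .05
hsd = tukey_hsd(q=3.77, ms_within=0.67, n=5)
print(f"HSD = {hsd:.2f}")  # mean differences larger than this are significant
```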

Scheffe Test

One of the safest post hoc tests = smallest risk of a Type I error

As sample variability increases, the estimated standard error...

the estimated standard error increases

Nominal

NAMES



- Does not indicate the direction or size of a difference; simply allows us to determine that two individuals are different

(e.g., gender, race)

Ordinal

Ordered Sequence



- 1st, 2nd, 3rd

- Can determine the direction of a difference but not its size

(e.g., small, medium, large)


Interval

A series of ordered categories formed from intervals that are all exactly the same size

- Has an arbitrary zero (e.g., temperature)

Ratio

An interval scale that also has an absolute zero

- Ratios of numbers do reflect ratios of magnitude

(e.g., weight)

Positively Skewed

Tail on right, body left

Negatively Skewed

Tail on left, body on right

Discrete Variable

indivisible categories

Continuous Variable

consist of categories that are infinitely divisible

Sampling Error

The natural differences that exist between statistics and their parameters

For any set of data, the sum of the deviations

Will always equal 0

For extremely skewed distributions, the best measure of central tendency is

MEDIAN


What is the general relationship between standard deviation and variance?

Standard Deviation is the square root of variance

Locations near the population mean will have z-scores...

near 0


Adding a constant amount to every score in the population....

CHANGES THE VALUE OF THE MEAN

Type I error

Rejects a true Ho

- Rejects Ho when it should be retained: concludes there is an effect when there is none

Probability

number of outcomes classified as the event / total number of possible outcomes


Random Sample

requires that each individual in the population has an equal chance of being selected

Independent Random Sample

requires that each individual has an equal chance of being selected and that the probability of being selected stays constant from one selection to the next if more than one individual is selected

Random Sampling

requires sampling with replacement



*Every individual has an equal chance of being selected


*Probabilities must stay constant when more than one individual is being selected; this requires sampling with replacement

Percentile rank

is the percentage of individuals with scores at or below a particular X value



-always corresponds to the proportion to the left of the score in question

Percentile

an X value identified by its rank

sampling error

the natural discrepancy, or amount of error, between a sample statistic and its corresponding population parameter

Sampling Distribution

a distribution of statistics obtained by selecting all of the possible samples of a specific size from a population

General Characteristics of the Distribution of Sample Means

1. Sample means should pile up around the population mean


2. The pile of sample means should form a normal-shaped distribution (n > 30)

3. The larger the sample size, the closer the sample means should be to the population mean

(a large sample is more representative than a small one)

Standard Deviation

1. Describes the distribution by telling whether the individual scores are clustered close together or scattered over a wide range



2. Measures how well any individual score represents the population by providing a measure of how much distance is reasonable to expect between a score and the population mean


Standard error of M

The standard deviation of the distribution of sample means (symbol: σM)

1. Standard error describes the distribution of sample means: it provides a measure of how much distance is to be expected from one sample mean to another

2. Measures how well an individual sample represents the entire distribution: how much distance is reasonable to expect between a sample mean and the overall mean of the distribution of sample means

When standard error is small, then sample means are...

closer together and have similar values

When standard error is large, then sample means are...

scattered over a wide range and there are big differences from one sample to another

Standard Error of M

The standard deviation of the distribution of sample means.

Provides a measure of how much distance to expect, on average, between a sample mean (M) and the population mean (u)

Standard error is always less than or equal to the standard deviation

Standard error tells us how much error to expect if you are using a sample mean to represent a population mean (a sketch follows)
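
A minimal sketch using the standard formula for the standard error of M (σ/√n), which the next few cards also rely on; the σ value is made-up:

```python
import math

def standard_error(sigma, n):
    """Standard error of M: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 10  # made-up population standard deviation
for n in (1, 4, 25, 100):
    print(f"n = {n:>3}: standard error = {standard_error(sigma, n):.2f}")
# n = 1 reproduces the standard deviation itself; increasing n shrinks the error
```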

As sample size increases, the size of the standard error

decreases

When the sample consist of a single score (n=1) the standard error

is the same as the standard deviation

The larger the sample size...

*the more accurately it represents the population

*the smaller the standard error

*the larger the z-score

As sample size increases what happens to the expected value?

The expected value does not depend on sample size

Alpha level or Level of significance

is a probability value that is used to define the concept of "very unlikely" in a hypothesis test

Critical Region

is composed of extreme sample values that are very unlikely to be obtained if the null hypothesis is true


-If sample data falls in critical region = Reject Ho



-Boundaries of Critical Region are determined by alpha levels

A z-score near zero indicates...

that the data support the null hypothesis (Ho)

Type I error

occurs when the researcher rejects the null hypothesis when it is actually true



(researcher concludes that the treatment does have an effect (H1) when it actually has NO effect)



*more serious

Alpha Level for a hypothesis test...

is the probability of type I error

Type II error

occurs when a researcher fails to reject a null hypothesis that is really false



(researcher retains Ho even though a true treatment effect exists)



*likely to occur when treatment effect is very small*

Type I error

Rejects true Ho



*MORE SERIOUS*

Type II error

Failure to reject false Ho


A smaller standard deviation produces...

a smaller standard error, which leads to a larger z-score

Larger n = smaller standard error = bigger z-score

Smaller standard deviation = smaller standard error = larger z-score = the more likely you are to reject Ho

If sample data rejects Ho for one-tailed test will the same data also reject Ho for two tailed test?

No. A two-tailed test requires a larger mean difference, so it is possible for a sample to be significant for a one-tailed test but not significant for a two-tailed test

In most research situations the goal is

to determine whether a particular treatment has an effect on a population (reject Ho)

Effect Size and Cohens d

Intended to provide a measurement of the absolute magnitude of a treatment effect, independent of the size of the sample(s) being used

- Cohen's d = mean difference / standard deviation

*Provides an indication of the strength or magnitude of a treatment effect; a sketch follows
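
A minimal sketch of the card's formula with made-up values; the small/medium/large benchmarks in the comment are Cohen's conventional guidelines:

```python
# Cohen's d from the card's formula, with made-up values.
M, mu, s = 54.0, 50.0, 8.0  # sample mean, population mean, standard deviation

d = (M - mu) / s            # d = mean difference / standard deviation
print(f"Cohen's d = {d:.2f}")
# Cohen's conventional guidelines: ~0.2 small, ~0.5 medium, ~0.8 large
```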

Power

is the probability that the test will correctly reject a false null hypothesis

- is the probability that the test will identify a treatment effect, if one exists


Factors that influence Power:

1. Sample size: increasing sample size increases power

(a large sample has more power than a small sample)

2. Alpha level: increasing the alpha level increases power

3. One-tailed vs. two-tailed: a one-tailed test has more power

As the effect size increases, the probability of rejecting Ho

increases = the power of the test increases

As the size of the treatment effect increases, power increases

Effect size increases = probability of rejecting Ho increases = power of the test increases

Increasing sample size = increased power of the test

As power increases, what happens to the probability of type II error?

As power increases, the probability of TYPE II error decreases

Difference between Z and T statistic

- z: we KNOW the population mean and standard deviation; uses the standard error

- t: the population standard deviation and variance are UNKNOWN; uses the estimated standard error (sM); a sketch contrasting the two follows
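
A minimal sketch contrasting the two statistics with made-up numbers; the only difference is whether the denominator uses the known σ or the sample estimate s:

```python
import math

# z vs t for a single sample mean (made-up numbers).
M, mu, n = 53.0, 50.0, 16

# z: population standard deviation (sigma) is KNOWN -> true standard error
sigma = 8.0
z = (M - mu) / (sigma / math.sqrt(n))

# t: sigma is UNKNOWN -> estimate it from the sample (s) -> sM = s / sqrt(n)
s = 9.0
t = (M - mu) / (s / math.sqrt(n))

print(f"z = {z:.2f}, t = {t:.2f} (t is evaluated with df = n - 1 = {n - 1})")
```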

Estimated Standard Error

sM

How much difference is reasonable to expect between a sample mean (M) and the population mean (u)

*sM = s / √n (computed from the sample standard deviation)

*As sample size increases, the estimated standard error tends to decrease

Estimated Standard Error

- Used for the t statistic; computed from sample data rather than the actual population parameter (the value of the population standard deviation is unknown)

Distribution of t- statistics is...

Flatter and more spread out than the standard normal distribution