60 Cards in this Set

  • Front
  • Back

Variability

Is a measure of the dispersion or spread of scores in a distribution, and it ranges from 0 to infinity. Measures of variability describe how far scores are spread around the mean.

What is the simplest way to describe how dispersed scores are?

It is to identify the range of scores in a distribution.

The range

Is the difference between the largest value and the smallest value in a data set.



Range = L - S, where L is the largest value and S is the smallest value.

Why is the range the most informative for data sets without outliers?

Because the range can change dramatically with the presence of one outlier.
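
A minimal Python sketch of the point above, using made-up scores (the numbers and the helper name `value_range` are assumptions for illustration only). It shows how a single outlier changes the range:

```python
def value_range(scores):
    """Range = L - S: the largest value minus the smallest value."""
    return max(scores) - min(scores)


scores = [4, 7, 8, 10, 12]       # hypothetical scores without an outlier
with_outlier = scores + [95]     # the same scores plus one extreme value

print(value_range(scores))        # 8
print(value_range(with_outlier))  # 91
```

One extra score moves the range from 8 to 91, even though the other five scores did not change.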

Fractiles

Are measures that divide data sets into two or more equal parts. The median is one example of a fractile; others are quartiles (4 parts), deciles (10 parts) and percentiles (100 parts).

What are the 4 quartiles and what do they do?

They split a data set into 4 equal parts, with boundaries at the 25th percentile, 50th percentile, 75th percentile and 100th percentile.

The lower quartile

Is the median value of the lower half of a data set at the 25th percentile.

The median quartile

Is the median value of a data set at the 50th percentile of a distribution.

The upper quartile

Is the median value of the upper half of the data set at the 75th percentile of a distribution.

Interquartile range (IQR)

Is the range of scores in a distribution between Q1 and Q3. Thus, the IQR is the range of scores, minus the top and the bottom 25% of scores, in a distribution.

Semi-interquartile range

Is a measure of half the distance between the upper and lower quartiles of a distribution. You can think of it as the mean IQR: the IQR divided by 2.
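
The quartile, IQR and semi-interquartile range cards can be sketched in Python as follows. This assumes the "median of each half" method described in the cards; other quartile conventions exist and can give slightly different values. The data and the helper name `quartiles` are made up for illustration.

```python
from statistics import median


def quartiles(scores):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(scores)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]           # lower half of the data set
    upper = ordered[half + n % 2:]   # upper half (skips the middle score if n is odd)
    return median(lower), median(ordered), median(upper)


scores = [2, 4, 5, 7, 8, 10, 12, 15]   # hypothetical data set
q1, q2, q3 = quartiles(scores)
iqr = q3 - q1       # interquartile range: Q3 - Q1
siqr = iqr / 2      # semi-interquartile range: IQR divided by 2

print(q1, q2, q3)   # 4.5 7.5 11.0
print(iqr, siqr)    # 6.5 3.25
```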

Variance

Is a measure of variability for the average squared distance that scores deviate from their mean.

Population variance

Is a measure of variability for the average squared distance that scores in a population deviate from the mean. It is computed only when all scores in a given population are recorded. Population variance is represented by the symbol sigma squared (σ²). The formula for it is in Notion.

The sum of squares (SS)

Is the sum of the squared deviations of scores from their mean.

Sample variance

Is a measure of variability for the average squared distance that scores in a sample deviate from the mean. It is computed when only a portion or sample of data is measured in a population. The formula is in Notion.
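
As a rough sketch, the sum of squares and the two variances can be computed in Python like this. It assumes the usual textbook forms (SS divided by N for a population, SS divided by n - 1 for a sample), which may be written differently in the Notion notes. The scores are made up, and the results are checked against Python's `statistics` module.

```python
from statistics import pvariance, variance


def sum_of_squares(scores):
    """SS: the sum of the squared deviations of scores from their mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)


scores = [2, 4, 4, 6, 9]   # hypothetical scores
ss = sum_of_squares(scores)

population_variance = ss / len(scores)        # divide by N
sample_variance = ss / (len(scores) - 1)      # divide by n - 1 (degrees of freedom)

print(ss)                                     # 28.0
print(population_variance, pvariance(scores))
print(sample_variance, variance(scores))
```

The sample variance is larger than the population variance for the same scores because it divides the same SS by a smaller number.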

To compute variance, we square each deviation, why do we do this?

We need a positive value for variance that is not zero, and the unsquared deviations from the mean always sum to zero. Think of any solution that avoids this zero result as intentionally making an error. To minimize that error, the result should be the smallest possible positive value. Squaring each deviation provides this solution with minimal error, and we then correct for the squaring by taking the square root of the variance.
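
A small Python check of the zero result mentioned above, with made-up scores: the raw deviations cancel out, while the squared deviations do not.

```python
scores = [2, 4, 4, 6, 9]   # hypothetical scores
mean = sum(scores) / len(scores)

deviations = [x - mean for x in scores]            # distances from the mean
squared_deviations = [d ** 2 for d in deviations]  # squaring removes the signs

print(sum(deviations))          # 0.0
print(sum(squared_deviations))  # 28.0
```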

The degrees of freedom for sample variance

Are the number of scores in a sample that are free to vary. All scores except one are free to vary in a sample.

The standard deviation (SD)

Also called the root mean square deviation, is a measure of variability for the average distance that scores deviate from their mean. It is calculated by taking the square root of the variance.

Standard deviation formula

Write it down and check Notion.
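
For reference, the standard textbook forms are sketched below. The notation is an assumption (N for population size, n for sample size, SS for the sum of squares) and may differ from the version in the Notion notes.

```latex
\[
\sigma = \sqrt{\sigma^2} = \sqrt{\frac{SS}{N}}
\qquad \text{(population)}
\qquad\qquad
s = \sqrt{s^2} = \sqrt{\frac{SS}{n - 1}}
\qquad \text{(sample)}
\]
```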

How do we find the Standard deviation?

We first compute the variance and then take the square root of that answer. Remember that our objective is to find the average distance that scores deviate from their mean, not the average squared distance. Taking the square root of the variance allows us to reach this objective.
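
The two steps (variance first, then the square root) might look like this in Python. The scores are made up, and the results are compared with `pstdev` and `stdev` from Python's `statistics` module.

```python
import math
from statistics import pstdev, stdev

scores = [2, 4, 4, 6, 9]   # hypothetical scores
n = len(scores)
mean = sum(scores) / n
ss = sum((x - mean) ** 2 for x in scores)   # sum of squares

# Step 1: variance. Step 2: square root of the variance.
population_sd = math.sqrt(ss / n)
sample_sd = math.sqrt(ss / (n - 1))

print(round(population_sd, 3), round(pstdev(scores), 3))
print(round(sample_sd, 3), round(stdev(scores), 3))
```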

What does the Standard deviation tell us?

It is an estimate for the average distance that scores deviate from the mean. When scores are concentrated near the mean, the SD is small; when scores are scattered far from the mean, the SD is larger.

Which 3 statements can be made for normal distributions with any mean and any variance?

1. At least 68% of all scores lie within 1 SD of the mean.


2. At least 95% of all scores lie within 2 SD of the mean.


3. At least 99.7% of all scores lie within 3 SD of the mean.

The empirical rule

About 68% of scores lie within 1 SD of the mean, 95% within 2 SD of the mean, and 99.7% within 3 SD of the mean.



The name of this rule arises because so many of the behaviours researchers observe are approximately normally distributed. The rule is therefore an approximation: for observed data the percentages are close to correct rather than exact.
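
The three percentages can be checked against a theoretical normal distribution with Python's `statistics.NormalDist`. This is only a sketch; the helper name `share_within` is made up for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of a normal distribution within k SD of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, "SD:", round(share_within(k) * 100, 2), "%")
```

Because the proportions depend only on the number of SDs from the mean, the same result holds for a normal distribution with any mean and any variance.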

Characteristics of the SD

1. The SD is always positive or zero.


2. The SD is used to describe quantitative data.


3. The SD is most informative when reported with the mean.


4. The value for the SD is affected by the value of each score in a distribution.

The correlational method

Is to treat each factor like a dependent variable and measure the relationship between each pair of variables.

Correlation coefficient

The statistic we use to measure correlations. The value of a correlation ranges from -1.0 to +1.0. Values closer to -1.0 or +1.0 indicate stronger correlations.

What can you use a correlation for?

1. To describe the pattern of data points for the values of two factors.


2. To determine whether the pattern observed in a sample is also present in the population from which the sample was selected.

A correlation

A statistical procedure used to describe the strength and direction of the linear relationship between two factors.

A scatterplot

Is a common way to illustrate a correlation. The data points are plotted along the x- and y-axes of the graph to see if a pattern emerges. The pattern that emerges is described by the value and sign of a correlation. A scatterplot is also called a scattergram.

A positive correlation

Means that as the values of one factor increase, the values of the second factor also increase; as the values of one factor decrease, the values of the other factor also decrease. If two factors have values that change in the same direction, we can graph the correlation using a straight line. A perfect positive correlation occurs when each data point falls exactly on a straight line. A positive correlation indicates that the values of two factors change in the same direction.

A negative correlation

Means that as the values of one factor increase, the values of the second factor decrease. If two factors have values that change in opposite directions, we can graph the correlation using a straight line. A negative value of r indicates that the values of two factors change in different directions.

A regression line

Is the best-fitting straight line to a set of data points. A best-fitting line is the line that minimizes the distance that all data points fall from it. The closer a set of data points falls to a regression line, the stronger the correlation.
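
A sketch of fitting such a line in Python. It assumes the usual least-squares criterion (minimizing the squared vertical distances of the points from the line), which the card does not state explicitly. The data and the helper name `regression_line` are made up.

```python
def regression_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of the deviations, and sum of squares for x.
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    slope = sp / ss_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1, 2, 3, 4, 5]   # hypothetical values of the first factor
ys = [2, 4, 5, 4, 5]   # hypothetical values of the second factor
slope, intercept = regression_line(xs, ys)
print(round(slope, 2), round(intercept, 2))   # 0.6 2.2
```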

The Pearson correlation coefficient (r)

Also called Pearson product-moment correlation coefficient, is a measure of the direction and strength of the linear relationship of two factors in which the data for both factors are measured on an interval or ratio scale of measurement.

Covariance

Is the extent to which the values of two factors vary together. The closer data points fall to the regression line, the more that values of two factors vary together.
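
The link between covariance and the Pearson correlation coefficient can be sketched in Python as follows. It assumes the common textbook form r = SP / sqrt(SS_x * SS_y) and the sample covariance SP / (n - 1). The data and the helper name `covariance_and_r` are made up.

```python
import math


def covariance_and_r(xs, ys):
    """Return (sample covariance, Pearson r) for two factors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # SP: how much the two factors vary together.
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # SS_x and SS_y: how much each factor varies on its own.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    covariance = sp / (n - 1)
    r = sp / math.sqrt(ss_x * ss_y)
    return covariance, r


xs = [1, 2, 3, 4, 5]   # hypothetical values of the first factor
ys = [2, 4, 5, 4, 5]   # hypothetical values of the second factor
cov, r = covariance_and_r(xs, ys)
print(cov, round(r, 3))   # 1.5 0.775
```

Covariance depends on the units of measurement, while r rescales it to the fixed range from -1.0 to +1.0.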

Name the 4 steps of hypothesis testing

Step 1: state the hypotheses, the null and the alternative hypothesis.


Step 2: set the criteria for a decision.


Step 3: compute the test statistic.


Step 4: make a decision.

What are the 3 key assumptions we make to test for significance of a linear correlation?

1. Homoscedasticity.


2. Linearity.


3. Normality.

Homoscedasticity

Is the assumption of constant variance among data points. We assume that there is an equal variance or scatter of data points dispersed along the regression line.

Linearity

Linearity is the assumption that the best way to describe a pattern of data is using a straight line. In truth, we could fit just about any set of data points to a best-fitting straight line, but the data may actually conform better to other shapes, such as curvilinear shapes.

Normality

To test for linear correlations, we must assume that the data points are normally distributed. For a linear correlation between two factors, the assumption of normality requires that a population of X and Y scores for two factors forms a bivariate normal distribution. Violating the assumption of normality can distort or bias the value of the correlation coefficient.

Reverse causality

Is a problem that arises when the causality between two factors can be in either direction.

Outliers

Can obscure the relationship between two factors by altering the direction and the strength of an observed correlation. An outlier is a score that falls substantially above or below most other scores in a data set.
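
A made-up Python illustration of this effect: five data points with a perfect negative correlation, and the same points plus one extreme score. The helper name `pearson_r` and the data are assumptions for illustration only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: SP / sqrt(SS_x * SS_y)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


xs = [1, 2, 3, 4, 5]
ys = [5, 4, 3, 2, 1]                         # perfect negative pattern
r_without = pearson_r(xs, ys)
r_with = pearson_r(xs + [20], ys + [20])     # add a single outlier at (20, 20)

print(round(r_without, 2))   # -1.0
print(round(r_with, 2))      # 0.92
```

Here one outlier changes both the direction and the strength of the observed correlation.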

The restriction of range problem

This occurs when the range of data measured in a sample is restricted or smaller than the range of data in the general population. To avoid this problem, the direction and the strength of a significant correlation should only be generalized to a population within the limited range of measurements observed in the sample.

There are 4 measurements we can use to describe variability. What are they?

1. Range


2. Interquartile range


3. Variance


4. Standard deviation

Range

The easiest measurement: it is the distance between the lowest and highest scores. It can only be calculated for variables that are at least ordinal, because a nominal variable has no order and so you cannot calculate a range.

Characteristics of the range

1. It is easy to understand.


2. It is informative because it tells us what the lowest and highest scores are.


3. It is very responsive to outliers, because it simply looks at the lowest and highest scores. Adding one outlier can make the range much higher than it was before.

Interquartile range

It is based on the median: once you have the median, you can also calculate the 25th and 75th percentiles. The IQR is useful because it tells you something about the middle group of scores. You can visualize it with a boxplot, which shows both the range and the IQR. It can only be calculated for variables that are at least ordinal, because a nominal variable has no order.

The formula for variance

Write it down.
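
For reference, the standard textbook forms are sketched below, using the symbols defined in this set (mu, M, sigma, s, x). N for population size and n for sample size are assumed notation.

```latex
\[
\sigma^2 = \frac{SS}{N} = \frac{\sum (x - \mu)^2}{N}
\qquad \text{(population)}
\qquad\qquad
s^2 = \frac{SS}{n - 1} = \frac{\sum (x - M)^2}{n - 1}
\qquad \text{(sample)}
\]
```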

Variance

Is a calculation based on the sum of squares, which means it uses the squared distances from the mean. On its own, variance is a fairly meaningless value; it becomes meaningful in comparison. If you compare the variance of different countries, for example, you can see where there is more variance around the mean.

How do you calculate a variance in a population?

1. Calculate all distances from the mean.


2. Square all these individual numbers.


3. You add these numbers together.


4. Divide it by the number of people in the population.
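
The four steps above can be followed one by one in Python. The scores are a made-up population of four people.

```python
scores = [3, 5, 7, 9]                      # hypothetical population of 4 people
mean = sum(scores) / len(scores)           # 6.0

distances = [x - mean for x in scores]     # step 1: distances from the mean
squared = [d ** 2 for d in distances]      # step 2: square each distance
total = sum(squared)                       # step 3: add them together
population_variance = total / len(scores)  # step 4: divide by the population size

print(distances)             # [-3.0, -1.0, 1.0, 3.0]
print(squared)               # [9.0, 1.0, 1.0, 9.0]
print(total)                 # 20.0
print(population_variance)   # 5.0
```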

Standard deviation

This is the most complicated measure of variability to calculate. It is the average distance from the mean, computed as the square root of the variance. So if a distribution has a mean, then the SD is how much people on average differ from that mean.

A normal distribution

Is a distribution that is symmetrical. One value is the most frequently given answer, and in these distributions the mean, median and mode are all the same. In skewed distributions (ones that are not symmetrical), the mean, median and mode differ.

The mu sign

Is a population mean

sigma

Standard deviation in a population

M

Mean in a sample

s

SD in a sample

x

A specific score on a variable or a unit of analysis score on a variable.

Bivariate statistics

Statistics dealing with the correlation and relation between two variables.

When we are building models based on theory we distinguish between two types of variables. What are these?

Independent variables and dependent variables.

Independent variables

The variables we expect to influence the other variables in the model. Referred to as X.

Dependent variables

The variables we expect to be influenced by at least one independent variable in the model. Referred to as Y.